opensafely-core / ehrql

ehrQL: the electronic health record query language for OpenSAFELY
https://docs.opensafely.org/ehrql/
Other
7 stars 3 forks source link

Speed up MSSQL tests #922

Open evansd opened 1 year ago

evansd commented 1 year ago

The MSSQL tests are lightning fast compared to Spark, but still a bit sluggish when compared to the SQLite ones:

$ pytest tests/spec/ -k 'mssql and execute'
114 passed, 814 deselected in 12.87s

$ pytest tests/spec/ -k 'sqlite and execute'
114 passed, 814 deselected in 1.40s

There are probably things we can do here to speed things up. One would be to take Django's approach of running each test inside a transaction which is rolled backed rather than committed so as to avoid syncing to disk. Note that for [this test], and possibly others we'd need a way of opting out of the transaction handling similar to Django's TransactionTestCase.

Another, simpler, option might be finding some way of disabling fsync entirely for MSSQL (equivalent to setting fysnc=off in Postgres). There are no durability concerns at all with test data.

inglesp commented 1 year ago

Note that we can't directly apply Django's approach, because we don't know the set of tables + schemas ahead of time (unlike in Django where there's a single schema). But we could do this for the spec tests, setting up the tables at the start of the run, and running each test in a transaction.

evansd commented 1 year ago

Sadly there doesn't seem to be a simple config option for this in MSSQL: https://dba.stackexchange.com/questions/45337/sql-server-fsync-options

Maybe we could persuade Docker to mount the data directory using tmpfs or something like that.

evansd commented 2 weeks ago

Getting MSSQL to use tmpfs for its data directory ought to be as simple as adding this to the containers.run_bg() arguments:

tmpfs={
    "/var/opt/mssql": "",
},

However MSSQL doesn't currently support being run out of tmpfs:

There's a ridiculous loopback device workaround in that ticket which wouldn't be suitable for us in general but does at least let us estimate the benefits. Unfortunately they're not that great: running just the MSSQL spec tests using:

pytest tests/spec -k mssql

The runtime (with a warm container) drops from 37 seconds to 29 seconds, which doesn't seem particularly earth-shattering.