evansd opened 1 year ago
Note that we can't directly apply Django's approach, because we don't know the set of tables + schemas ahead of time (unlike in Django where there's a single schema). But we could do this for the spec tests, setting up the tables at the start of the run, and running each test in a transaction.
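The run-each-test-in-a-transaction idea can be sketched roughly like this (a hedged illustration only: it uses the stdlib `sqlite3` driver and an invented `rollback_transaction` helper, where the real version would wrap the MSSQL connection and be wired up as a pytest fixture, with a marker letting individual tests opt out):

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def rollback_transaction(connection):
    """Run a test body inside a transaction that is always rolled back.

    Nothing the test writes is ever committed, so nothing needs syncing
    to disk. In pytest this would live inside a fixture, with a marker
    (cf. Django's TransactionTestCase) letting tests opt out.
    """
    connection.execute("BEGIN")
    try:
        yield connection
    finally:
        connection.rollback()


# Tables are set up once, outside any per-test transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE patient (patient_id INTEGER)")

# A "test" runs inside the transaction...
with rollback_transaction(conn) as c:
    c.execute("INSERT INTO patient VALUES (1)")

# ...and afterwards its writes are gone: the table is empty again.
print(conn.execute("SELECT COUNT(*) FROM patient").fetchone()[0])  # prints 0
```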
Sadly there doesn't seem to be a simple config option for this in MSSQL: https://dba.stackexchange.com/questions/45337/sql-server-fsync-options
Maybe we could persuade Docker to mount the data directory using tmpfs, or something like that.
Getting MSSQL to use tmpfs for its data directory ought to be as simple as adding this to the `containers.run_bg()` arguments:

```python
tmpfs={
    "/var/opt/mssql": "",
},
```

However, MSSQL doesn't currently support being run out of tmpfs:
There's a ridiculous loopback device workaround in that ticket which wouldn't be suitable for us in general, but does at least let us estimate the benefits. Unfortunately they're not that great. Running just the MSSQL spec tests using:

```
pytest tests/spec -k mssql
```

the runtime (with a warm container) drops from 37 seconds to 29 seconds (roughly a 20% reduction), which doesn't seem particularly earth-shattering.
The MSSQL tests are lightning fast compared to Spark, but still a bit sluggish when compared to the SQLite ones:
There are probably things we can do here to speed things up. One would be to take Django's approach of running each test inside a transaction which is rolled back rather than committed, so as to avoid syncing to disk. Note that for [this test], and possibly others, we'd need a way of opting out of the transaction handling, similar to Django's `TransactionTestCase`.

Another, simpler, option might be finding some way of disabling fsync entirely for MSSQL (equivalent to setting `fsync=off` in Postgres). There are no durability concerns at all with test data.