Tolerance for diverging schemas from new vs upgraded databases is a trap waiting to spring

The database upgrade test allows new and upgraded database schemas to differ by the order of columns in tables (actually, it allows them to differ completely because the logic for allowing just this one kind of divergence would be too complicated, I guess). However, the order of columns affects the behavior of SQL statements. Allowing this divergence to exist and persist will probably eventually cause someone to write some database code that works with one kind of database but not the other (most likely it will work with new databases and fail when someone tries to upgrade their old database to a new release). This could be bad (some kind of type error that leads to the server to crash or otherwise become inoperable). Or it could be catastrophic (no type error, just the wrong values used in a way that leads to some user-facing misbehavior - most likely wormhole connections not being possible).

There could also be more subtle consequences such as differing query optimization results, leading to different performance on different deployments. The worst case for this would be that there's a bug in the query optimizer which causes it to produce incorrect results against one schema but not the other.

The simple fix would be to accept that wanting to support in-place upgrades means column order won't necessarily be "sensible" and make future versions of the schema strings put columns in the same order the upgrader will put them in. A more complicated fix would be to version tables themselves and copy data over to the new tables (so, no longer in-place). The new tables can then have whatever column order is desirable.

Either way, the assertion in the upgrade test can be re-enabled to ensure there is no schema divergence for these two cases.

I took a look at the DB calls, and I didn't find any that are dependent upon the order of columns (all of our Row objects are dereferenced with strings, not integers). So I guess my philosophy has been that our code should be insensitive to ordering, and upgraded DBs are allowed to have a different column order than "native" DBs.

I'd be happy with some coding standards that remind future authors to not depend upon column ordering.

I'm willing to believe that some SQLite internal performance behavior could change if the columns are ordered differently, but I think I'd consider it a bug: the index specifies column names, not integers.

I tried re-enabling the assertion at the end of wormhole.test.test_database.DB.test_upgrade, and it failed (as expected). The differences between upgraded and native were:

the new for_nameplate column was added to the end of the mailbox_usage table, rather than near the beginning
the upgraded schema omits a newline after the last column, the native schema includes that newline
the upgraded schema doesn't modify the comment in the version table (--contains one row, set to 2 vs 3)
the order of the CREATE INDEX statements has changed

Using self.assertEqual is a pretty sad way to compare the two schemas.. it'd be nicer to find some sort of SQL parser, and then compare the resulting syntax trees. This might let us ignore the comments, and the ordering differences in the CREATE INDEX statements, while still noticing the ordering differences in the table columns.

So.. what should we do?

Splitting out the server to a new repo (#240) will make it easier to split the usage data into a separate DB from the operational data (which includes all the active nameplates, mailboxes, and messages). With that complete, it might be easier to completely wipe the operational DB each time the server is upgraded, instead of trying to upgrade the DB in place, with the rule being that you must wait for the server to be idle before you upgrade it.

But, it'd be safer to enable upgrades. I can imagine using the second approach you described, so upgrade-v2-to-v3.sql would rename the old tables, create the new ones (with the same column order as v3.sql), then copy over all the rows, then delete the old ones. Does SQL provide enough control to do this?

magic-wormhole / magic-wormhole-mailbox-server

Tolerance for diverging schemas from new vs upgraded databases is a trap waiting to spring #10