jlongster / absurd-sql

sqlite3 in ur indexeddb (hopefully a better backend soon)
MIT License
4.15k stars · 101 forks

The change counter in the sqlite header isn't *always* updated #10

Open pkhuong opened 3 years ago

pkhuong commented 3 years ago

I see "Once we’ve grabbed a readwrite transaction, we can read some bytes from the file which represent SQLite’s “change counter”. If that counter is the same as what we had when we requested a write, it’s safe to write!" in the announcement post.
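For reference, the check described in the announcement is possible because the SQLite file format puts a 4-byte big-endian "file change counter" at offset 24 of the 100-byte database header. A minimal sketch of reading it (my own illustration, not absurd-sql's actual code):

```javascript
// The SQLite database header is the first 100 bytes of the file; the
// "file change counter" is a 4-byte big-endian integer at offset 24.
function readChangeCounter(headerBytes) {
  const view = new DataView(headerBytes.buffer, headerBytes.byteOffset);
  return view.getUint32(24, false); // false = big-endian
}

// Example: a zeroed 100-byte header with the counter set to 7.
const header = new Uint8Array(100);
new DataView(header.buffer).setUint32(24, 7, false);
console.log(readChangeCounter(header)); // 7
```

The issue below is that SQLite does not bump this counter for every physical write, so comparing it cannot detect *all* changes to the underlying storage.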

Unfortunately, I discovered that's only mostly true while working on https://github.com/backtrace-labs/verneuil-wip. Verneuil runs sqlite's Tcl regression tests with assertions enabled in its replicating VFS, and assertion failures in the bitwise replication check uncovered at least two scenarios where sqlite can change the data on disk without updating the header's change counter:

  1. Maybe benign for absurd-sql: rollback doesn't always restore all garbage (unused) bytes on a page. sqlite stages writes in uninitialised malloc'ed arrays, so non-deterministic bytes from previous allocations can leak through to persistent storage (we work around that by building sqlite with a macro that redirects malloc to calloc).
  2. The page-cache flushing tests (https://github.com/sqlite/sqlite/blob/master/test/cacheflush.test / https://www.sqlite.org/c3ref/db_cacheflush.html) don't roll back (AFAICT), but manage to cause physical changes without updating the change counter. I believe the writes happen to pages that aren't semantically visible to other connections until the SQL transaction commits, so that's safe for sqlite, which only uses the change counter to decide when to drop its in-memory page cache. However, this does mean that the change counter cannot be used to detect all writes to the underlying storage.

It probably makes sense for absurd-sql to manage its own dedicated change-tracking key-value entry, or maybe simplify the locking mechanism to jump over the "reserved/pending" states, and only have shared & exclusive.
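A minimal sketch of the first suggestion, using a `Map` as a stand-in for an IndexedDB object store (the key name and helper functions are hypothetical): absurd-sql would bump its own counter on every write transaction, independently of SQLite's header, so even writes that leave the header untouched are detectable.

```javascript
// Stand-in for the IndexedDB object store; in absurd-sql this would be
// a real IDB entry updated inside the same readwrite transaction.
const store = new Map();

function bumpWriteGeneration() {
  const next = (store.get('write-generation') ?? 0) + 1;
  store.set('write-generation', next);
  return next;
}

function writeGeneration() {
  return store.get('write-generation') ?? 0;
}

// A reader snapshots the generation, and before upgrading to a write
// lock re-checks it; any intervening write (even one that left SQLite's
// header counter untouched) is detected.
const snapshot = writeGeneration();
bumpWriteGeneration(); // some other connection writes
console.log(writeGeneration() !== snapshot); // true: a write happened
```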

jlongster commented 3 years ago

Oh wow, this is super interesting, thanks @pkhuong!

> or maybe simplify the locking mechanism to jump over the "reserved/pending" states, and only have shared & exclusive.

That's actually how it works right now! See https://github.com/jlongster/absurd-sql/blob/master/src/indexeddb/worker.js#L423 and the comment above it. Specifically https://github.com/jlongster/absurd-sql/blob/master/src/indexeddb/worker.js#L388

When a RESERVED lock is requested we go ahead and upgrade the lock to a full write lock (EXCLUSIVE). However, the issue is still that a different IDB readwrite may run between requesting a readwrite transaction and actually starting it. I don't see how only having shared and exclusive solves the issue?
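The lock upgrade being described can be sketched as a collapse of SQLite's five lock levels (the numeric values match the `SQLITE_LOCK_*` constants; the helper name is hypothetical, not absurd-sql's actual code):

```javascript
// SQLite's lock ladder: NONE < SHARED < RESERVED < PENDING < EXCLUSIVE.
const LOCK = { NONE: 0, SHARED: 1, RESERVED: 2, PENDING: 3, EXCLUSIVE: 4 };

// In the simplified scheme, a RESERVED request (the first step of a
// write transaction) is immediately promoted to a full EXCLUSIVE lock,
// held until the SQLite transaction commits.
function effectiveLock(requested) {
  return requested >= LOCK.RESERVED ? LOCK.EXCLUSIVE : requested;
}

console.log(effectiveLock(LOCK.RESERVED) === LOCK.EXCLUSIVE); // true
console.log(effectiveLock(LOCK.SHARED) === LOCK.SHARED);      // true
```

The remaining race is outside this ladder: it lives in the gap between *requesting* the IDB readwrite transaction and that transaction actually being granted.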

I don't think the garbage data is a problem -- as long as the filesize is correct and everything else in the file is intact, I don't see how that would affect anything.

> The page-cache flushing (https://github.com/sqlite/sqlite/blob/master/test/cacheflush.test / https://www.sqlite.org/c3ref/db_cacheflush.html) tests don't roll back (AFAICT), but manage to cause physical changes without updating the version counter. I believe the writes happen to pages that aren't semantically visible to other connections until the SQL transaction commits, so that's safe for sqlite, which only uses the page counter to drop its in-memory page cache. However, this does mean that the change counter cannot be used to detect all writes to the underlying storage.

Flushing to disk in the middle of a transaction is something I want to understand more. Does it mainly happen in a huge write transaction where it can't hold all the data in memory?

I think this should be fine because we grab the readwrite transaction on a RESERVED lock. We'll lock the whole db for the entire time a SQLite transaction is open, so it's OK if it writes in the middle of a transaction.

And now I understand your point about locks. Because we are already doing that, we sidestep this issue. A connection can never write in the middle of a transaction without already having a lock, which we have because we start it on RESERVED.

Thanks for reporting this, it feels great to make sure we've covered the edge cases!

pkhuong commented 3 years ago

Simplifying the locking protocol does seem to help make correctness more robust. However, I don't know if the list of conditions above is exhaustive. A dedicated out-of-band change counter might help you find additional issues.

jlongster commented 3 years ago

I think locking the file for the whole time between a BEGIN TRANSACTION and its COMMIT should be robust -- if things can change underneath that, it seriously undermines the transactional semantics. The semantics are strong enough here that I think it's OK to rely on them.

I will keep this in mind though. It's definitely important that we get this right, and I'm thinking about how to test it. I'm open to having our own change counter as well, but I want to have clear reasons to do so. I'll leave this open and keep thinking about it.