Open matehat opened 5 months ago
Sorry for the slow response. To diagnose this further, it would be good to confirm whether this appears to corrupt the database in any way (e.g. if you close the other tabs, is it functional again or did we mess up locks causing permanent problems).
I will also try to reproduce this independently. My best guess at the moment is that locks are being held too long (though that should produce a locked error rather than a generic disk I/O error).
No problem at all.
We'll investigate on our end to get a clearer picture of what makes the DB work again and what doesn't, since up to now people have been doing all sorts of things to mitigate, like clearing all cache, or restarting the browser.
Also getting these sorts of errors from time to time:
SqliteException(11): while selecting from statement, database disk image is malformed, database disk image is malformed (code 11)
Causing statement: SELECT * FROM "inbox_data" WHERE "discussion_id" = ?;
On a client whose logs included
Using WasmStorageImplementation.sharedIndexedDb
Even more intriguingly, this kind of query means that the database had already opened successfully and the app was just doing its normal operations. Also worth noting: to detect these sorts of errors beforehand, right after opening the database and before signaling it as ready to be used, I perform a SELECT * FROM X LIMIT 1 on each table to make sure the database is really readable, and delete+recreate if that fails. This means that the first read on the table worked and, after some time, reads stopped working with the error saying the database is malformed.
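For readers following along, the startup check described above could be sketched roughly as follows. This is only an illustration: the `Db` interface and the `open` and `recreate` callbacks are hypothetical stand-ins, not the Drift API. As the report shows, a check like this only proves the tables were readable at open time; it cannot catch a failure that appears later in the same session.

```typescript
// Hypothetical minimal database interface (an assumption, not the Drift API).
interface Db {
  // Runs a query and throws on I/O or corruption errors.
  select(sql: string): unknown[];
}

// Probes each table with a one-row read and returns the tables that failed.
function probeTables(db: Db, tables: string[]): string[] {
  const failed: string[] = [];
  for (const table of tables) {
    try {
      db.select(`SELECT * FROM "${table}" LIMIT 1`);
    } catch {
      failed.push(table);
    }
  }
  return failed;
}

// Opens the database and falls back to delete+recreate if any probe fails.
function openChecked(open: () => Db, recreate: () => Db, tables: string[]): Db {
  const db = open();
  return probeTables(db, tables).length === 0 ? db : recreate();
}
```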
I'm beginning to wonder how safe sharedIndexedDB is to multi-tab usage (we've had to move away from opfsLock because of issues the security headers were causing). Do I have to do a coordination layer to prevent multiple tabs from using the same database at the same time?
I'm quite a bit perplexed, because a non-trivial number of users are reporting that the app becomes unresponsive (with corresponding error reports related to SQLite) and I'm having a hard time understanding how I can make our app resilient to IO failures at that level.
I'm beginning to wonder how safe sharedIndexedDB is to multi-tab usage (we've had to move away from opfsLock because of issues the security headers were causing)
That shouldn't be necessary: sharedIndexedDb will host the database in a single shared worker that all tabs connect to. So the database is not actually opened multiple times.
One caveat with IndexedDB is that it's asynchronous, whereas sqlite3 expects a synchronous file system. To work around this issue, we're loading the entire database into memory when opening the database and we then start issuing writes asynchronously after updating a chunk in-memory. This generally works, but is really unsafe if any other tab is opening the database outside of the shared worker. If they're all using sharedIndexedDb, though, that shouldn't be an issue unless you're sometimes opening the database in a different way.
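To make the caveat concrete, here is a deliberately simplified sketch of the load-into-memory, write-behind idea. It is not the actual implementation: the `WriteBehindFile` class is hypothetical, a plain `Map` stands in for IndexedDB, chunks are strings, and `flush` is synchronous here only to keep the example short (the real writes would be asynchronous).

```typescript
// Stand-in for IndexedDB: chunk index -> chunk contents (an assumption).
type Backing = Map<number, string>;

class WriteBehindFile {
  private backing: Backing;
  private chunks: Map<number, string>;
  private pending: number[] = [];

  // Opening copies the whole file into memory.
  constructor(backing: Backing) {
    this.backing = backing;
    this.chunks = new Map(backing);
  }

  // Synchronous read, served from the in-memory copy only.
  read(index: number): string | undefined {
    return this.chunks.get(index);
  }

  // Synchronous write: update memory first, then queue the chunk for persistence.
  write(index: number, data: string): void {
    this.chunks.set(index, data);
    this.pending.push(index);
  }

  // Persists queued chunks to the backing store.
  flush(): void {
    for (const index of this.pending) {
      this.backing.set(index, this.chunks.get(index)!);
    }
    this.pending = [];
  }
}
```

If two such instances are opened over the same backing store, each keeps its own in-memory copy: a write flushed by one is never seen by the other until it re-opens, and the second instance can later overwrite chunks with stale data. That is the hazard of opening the database outside the single shared worker.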
I'm also not understanding the corruption issue (especially since you're checking the table on startup) - since the entire copy is in-memory, it seems unlikely that a corruption is possible there. Even if that took place due to writes from different tabs, they'd only see the corruption after re-opening the database, not in the middle of a connection.
I wonder how it works when 2 tabs are on the same web page on Android Chrome, where shared workers are not supported. I guess we could expect that only one tab is active at a time, but is there a way to 'reload' in memory for other workers if the content is modified in one tab and loaded in another?
For Chrome on Android it's a problem indeed, but sharedIndexedDb wouldn't get chosen there; it would be unsafeIndexedDb.
There's no reload functionality at the moment, but if you use an explicit synchronization wrapper, e.g. with the web locks API, you could explicitly close and re-open the database to make it fetch the data again. That's probably not any less efficient than a full reload would be either way.
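A rough sketch of such a wrapper, assuming the Web Locks API shape (`navigator.locks.request(name, callback)` in browsers). The `Connection` interface, the `open` callback, the helper name `withFreshDatabase`, and the lock name "app-db" are all hypothetical; the lock manager is passed in as a parameter so the helper does not depend on a browser global.

```typescript
// Minimal shape of the part of the Web Locks API used here.
interface LockManagerLike {
  request<T>(name: string, callback: () => Promise<T>): Promise<T>;
}

// Hypothetical connection handle (an assumption, not the Drift API).
interface Connection {
  close(): Promise<void>;
}

// Runs `action` while holding an exclusive lock, on a freshly opened database.
async function withFreshDatabase<T>(
  locks: LockManagerLike,
  open: () => Promise<Connection>,
  action: (db: Connection) => Promise<T>,
): Promise<T> {
  return locks.request("app-db", async () => {
    // Re-opening makes the connection load the currently persisted data.
    const db = await open();
    try {
      return await action(db);
    } finally {
      // Close before releasing the lock so the next tab starts from a clean state.
      await db.close();
    }
  });
}
```

Opening and closing on every operation is costly, so in practice one would probably hold the lock for as long as a tab is active rather than per query.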
Just to add more info, I'm also seeing:
SqliteException(26): while executing, file is not a database, file is not a database (code 26)
Causing statement: CREATE INDEX IF NOT EXISTS processed_operations_per_db_idx ON processed_operations(operation_id DESC, db_name);, parameters:
When our users use multiple tabs at the same time, we get lots of error reports like this:
coinciding with hangs in our web app, in the form of DB queries not completing (and most if not all of our users are using the opfsLock implementation).
Do you know how we can investigate the root cause of this?
(we're using the latest Drift version)