Closed by bfollington 1 year ago
Well, this is terrifying. I'll take a look soon, but off the cuff this looks like an error originating in the bowels of our database dependency, Sled.
We should investigate whether we can trigger this from the app by rapidly spamming write, follow, unfollow, save, sync etc. operations... since that's the best guess I have for what triggered the instance of this I saw.
I accidentally reproduced this in a simulator:
2023-06-19 09:21:13.754066-0700 Subconscious[3444:47377278] [app] [action] failIndexOurSphere("Foreign Error: Read corrupted data at file offset None backtrace ()")
2023-06-19 09:21:13.754149-0700 Subconscious[3444:47377278] [app] msg="Failed to index our sphere to database" error="Foreign Error: Read corrupted data at file offset None backtrace ()"
Definitely seems like an urgent total-data-loss kind of bug...
Been digging a little deeper. I was able to reproduce this deliberately by rapidly adding follows and clicking "sync" multiple times. Logs: https://gist.githubusercontent.com/cdata/a3e1331636ddcaba5abd9275dcc8ff5e/raw/bfeb8e918705cbcaf64986bf3a2265ac9cd26d59/subconscious.log
A new lead: peering at the logs, I noticed that Subconscious appears to be re-initializing Noosphere in the midst of a lot of synchronization tasks.
This seems like a plausible surface area for database corruption.
Putting aside the fact that Noosphere should not be re-initialized this way, I will offer: we should make Noosphere more resilient to this case (that is, disposing of Noosphere should ensure that cleanup is performed).
That having been said: I think we need to do a pass in Subconscious to make sure that Noosphere constructs aren't being pathologically de-initialized and re-initialized during the lifetime of the user session.
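To illustrate the "disposing should ensure cleanup" idea, here is a minimal Rust sketch. Everything in it is hypothetical: `FakeStore` and `NoosphereHandle` are stand-ins invented for this example, not Sled's or Noosphere's actual types. It only shows the general pattern of flushing buffered writes in a `Drop` implementation so that tearing down a handle mid-session does not abandon pending data.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a storage backend that buffers writes.
/// This is NOT Sled's or Noosphere's real API.
#[derive(Default)]
struct FakeStore {
    pending: Vec<String>, // writes still buffered in memory
    durable: Vec<String>, // writes that have been "persisted"
}

impl FakeStore {
    fn write(&mut self, entry: &str) {
        self.pending.push(entry.to_string());
    }

    /// Move every buffered write into the durable set.
    fn flush(&mut self) {
        self.durable.append(&mut self.pending);
    }
}

/// Hypothetical handle playing the role of a Noosphere instance.
struct NoosphereHandle {
    store: Rc<RefCell<FakeStore>>,
}

impl NoosphereHandle {
    fn new(store: Rc<RefCell<FakeStore>>) -> Self {
        Self { store }
    }

    fn save(&self, entry: &str) {
        self.store.borrow_mut().write(entry);
    }
}

impl Drop for NoosphereHandle {
    fn drop(&mut self) {
        // Assumption: flushing on disposal is the cleanup we want.
        // Whatever was buffered is persisted before the handle goes away.
        self.store.borrow_mut().flush();
    }
}

/// Returns (durable count while the handle is alive, durable count after drop).
fn demo() -> (usize, usize) {
    let store = Rc::new(RefCell::new(FakeStore::default()));
    let before = {
        let handle = NoosphereHandle::new(Rc::clone(&store));
        handle.save("follow: alice");
        let n = store.borrow().durable.len();
        n
    }; // `handle` is dropped here, which triggers the flush
    let after = store.borrow().durable.len();
    (before, after)
}
```

A real fix would also have to deal with in-flight async tasks that still hold references to the database, which this sketch does not attempt to model.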
cdata: @gordon FYI I think Noosphere is being unnecessarily de-initialized and re-initialized during the normal course of app state change https://github.com/subconsciousnetwork/subconscious/issues/693#issuecomment-1607965007
cdata: It's usually preceded by a log suggesting the gateway URL has changed (even though it never changes), so that might be an incorrectly handled state change
gordon: Aha! Yes, we have to re-initialize Noosphere whenever the gateway URL changes. So we could be sending that message when we should not be.
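One way to avoid the spurious re-initialization described in this exchange is to treat a "gateway URL changed" message as a no-op when the value is identical to the current one. The sketch below is written in Rust purely for illustration; `NoosphereService`, `set_gateway_url`, and `init_count` are hypothetical names and do not reflect the actual Subconscious code.

```rust
/// Hypothetical app-side owner of a Noosphere instance.
struct NoosphereService {
    gateway_url: String,
    /// Counts how many times the (imaginary) Noosphere instance was built.
    init_count: u32,
}

impl NoosphereService {
    fn new(gateway_url: &str) -> Self {
        Self {
            gateway_url: gateway_url.to_string(),
            init_count: 1, // the initial construction
        }
    }

    /// Handle a "gateway URL changed" message.
    /// Returns true only if a re-initialization actually happened.
    fn set_gateway_url(&mut self, new_url: &str) -> bool {
        if self.gateway_url == new_url {
            // Same URL as before: skip the teardown and rebuild entirely.
            return false;
        }
        self.gateway_url = new_url.to_string();
        // Stand-in for tearing down and rebuilding Noosphere.
        self.init_count += 1;
        true
    }
}
```

With a guard like this, repeated messages carrying the same URL leave the existing instance alone, and only a genuine change pays the cost of a rebuild.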
Closing this for now. I think we've fixed it, but the future-proof solution is https://github.com/subconsciousnetwork/noosphere/issues/455
A mysterious error that spontaneously occurred during development. It appears that the on-disk data has somehow been corrupted. After profiling disk use in Instruments and examining the files on disk, it seems the app can read data as expected, but the data itself is malformed.
Full dump of the application data is attached, along with the (mostly unhelpful) logs:
dump.zip