How should commit() failures be surfaced when freezing / unloading

spanicker commented 6 years ago

It has been suggested that commit() is a good fit for IDB transactions in "end-of-life" eg. when the page is freezing (per Web Lifecycle) or unloading (user navigates away). In these cases oncomplete and onabort will not fire. If the transaction fails it would be useful to surface this to the app, on next IDB access. How should such failures be surfaced to the app (in onresume or new load)?

@inexorabletash @philipwalton

inexorabletash commented 6 years ago

cc: @dmurph

inexorabletash commented 6 years ago

Just enumerating some of the considerations here...

We have an existing, non-standard means of reporting data loss on upgrade (e.g. when the database was corrupted and needed to be reset). That's not relevant here, but it's worth looking at to see what we did (an enum/message pair, just like exceptions) and what is problematic about it (delivered on the next open by any code in the origin)
Does this information need to be associated with an origin, database, or even more finely scoped? For example, an origin could have two services A and B, which use separate databases ('db-a', 'db-b'). If A does a write-only/commit tx on freeze/unload which fails, does B get a notification when it opens 'db-b' ? That seems less than ideal. This almost hints at wanting to store a per-database transaction history of some sort.
Transactions can fail to commit for a fixed number of reasons:
- Abort due to bad requests, e.g. trying to add() the same key twice, or put() with the same index key with a uniqueness constraint.
- Explicit abort() call from script
- Uncaught exception in request's success/error handler
- I/O error (actual failure to write to disk, e.g. disk detached, or other OS/hardware failure)
- Quota exceeded
- User agent crash
In all cases, if script is still running, an AbortError with details would be fired at the transaction with details. (Obviously in the UA crash case, script is not still running!)

...

To flesh out the scenario developers are concerned about:
- The page allows input of data by the user (e.g. issue comments, mail draft, document edits, etc)
- The data is usually stored in browser memory and bidirectionally syncs with a server, but is also persisted to the local database on a regular basis for offline use
- A freeze event comes in - the document attempts to commit memory state to the database
- The commit fails for some reason
- The page restores - it still has state in memory, and believes it has persisted it to the database, but it is wrong. Ideally, it has some way to know it should persist the data.

Other requirements/scenarios/constraints?

spanicker commented 6 years ago

The other scenario developers would care about is: page is getting unloaded, an IDB commit is made, but the commit fails for some reason. The next time the page is loaded -- should this information be conveyed to the app somehow? \cc @philipwalton

philipwalton commented 6 years ago

Yes, these are both cases that developers should care about.

page is getting unloaded, an IDB commit is made, but the commit fails for some reason. The next time the page is loaded -- should this information be conveyed to the app somehow?

If the information is not conveyed then it'll require apps to perform sync logic on every page load. Even if commit failures are rare, the fact that they can happen means developers will have to account for them, and without any signals from the browser they'll have to manually check for inconsistencies.

The page restores - it still has state in memory, and believes it has persisted it to the database, but it is wrong. Ideally, it has some way to know it should persist the data.

This is a really tricky case. The more I think about it, the more I think we should try to run success/complete/error handlers after a resume (if possible). If we don't, I can see lots of issue popping up.

For example, consider some code that wraps an IDB transaction in a promise. If the promise code is persisted across a freeze but the IDB callbacks are dropped, that promise will never resolve or reject.

@spanicker do you know what currently happens to other async callbacks (timers, promises, etc) in the resume case?

spanicker commented 6 years ago

I think we can fire the callbacks after resume. @fmeawad - could you comment on current behavior?

fmeawad commented 6 years ago

The current behavior AFAIK is that the callbacks will be called after resume since they will be in the queue. To avoid confusing the dev when the callbacks are called after resume, I suggest that we remove them (which is against to what this issue is trying to achieve) I am more worried in case we discard the page after freeze, because then the callbacks will be lost and hence no way to know that it failed.

asutherland commented 6 years ago

If the information is not conveyed then it'll require apps to perform sync logic on every page load. Even if commit failures are rare, the fact that they can happen means developers will have to account for them, and without any signals from the browser they'll have to manually check for inconsistencies.

Would the background sync API be sufficient here? A page can check if there's an outstanding sync tag via the SyncManager API if it needs to, plus it can also just be event-driven on this front by just processing the "sync" event(s). As long as the IndexedDB commit's success is tied to resolving the sync event's waitUntil(), the page only needs to check if the commit succeeded if the sync event is still pending. (That is, the commit() could succeed but the "sync" event's transaction might not have committed through in the event of a crash.)

This is a really tricky case. The more I think about it, the more I think we should try to run success/complete/error handlers after a resume (if possible). If we don't, I can see lots of issue popping up.

Speaking for Firefox's bfcache strategy, in situations like this we would usually either forbid the window (and the entire window hierarchy in which it lives) from going into bfcache because of ongoing activity or remove the window from the bfcache when it becomes semantically doomed. For example, we will forbid putting a window in bfcache if it has webrtc stuff going on or used WebVR, see (nsIDocument::CanSavePresentation). And we remove windows from bfcache if they receive a message via BroadcastChannel or MessageChannel. Researching this led me to discover that we're buggy in the IDB case in exactly the way you fear here due to a longstanding regression, this is now Gecko Bug 1458342.

inexorabletash commented 5 years ago

TPAC 2019 Web Apps Indexed DB triage notes:

We think the ideas in https://github.com/wanderview/storage-corruption-reporting could apply here, if made more general. If a transaction fails for any reason (corruption, I/O error or logic error) then having that available for inspection/reporting by a site would be valuable. This would be done regardless of whether the transaction did manage to fire abort events to script directly. There are issues with how much and for how long the data is retained that need to be sorted out.

w3c / IndexedDB

How should commit() failures be surfaced when freezing / unloading #237