mavoweb / mavo

Create web applications entirely by writing HTML and CSS!
https://mavo.io
MIT License
2.84k stars 176 forks source link

Data loss in firebase plugin realtime option #725

Open karger opened 3 years ago

karger commented 3 years ago

I've created a small example demonstrating how data loss can occur with the realtime option in the firebase plugin. The problem is that a non-focused tab seems not to receive push update notifications from firebase, so it misses out on updates that occur while it's out of focus. Which means it still has the old data. If it's then brought into focus and data is edited, a write happens with the old data, overwriting new data.

I'm not quite sure how to address this. If it's possible to detect a tab coming into focus, then mavo could query firebase to make sure it loads the most current data before the user has the opportunity to perform an edit that triggers a save. If that's not possible, I can imagine some hacky ways to try to detect it. Use a tight setInterval loop to repeatedly measure the value of the clock. Notice if there is a large gap with the previously measured value; that would be a sign that the process went to sleep and the data needs to be reloaded.

https://codepen.io/karger/pen/qBrYYeR?editors=1100

joyously commented 3 years ago

trying to reinvent React?

DmitrySharabin commented 3 years ago

@karger Actually, it's not an issue with Mavo, but with the Firebase Firestore plugin. Precisely, with its offline persistence feature which is enabled by default. Once, we discussed this feature and came to the conclusion that if this feature is provided by the library, we can benefit from it. That's why we decided to enable it. As it turns out, sometimes it can cause some problems.

For now, there is no way to disable it by passing some options to the plugin. But I can implement it shortly. However, we need to decide, whether we want offline persistence to be enabled by default and provide some way to disabled it. Or we should disable it by default and provide a way to enable it. Or we can leave it enabled by default and if an author wants real-time updates we could warn them that offline persistence might cause some issues in that case and we recommend disable it via some parameter. What do you think?

How would we name such parameter: offline-persistence/no-offline-persistence, or simply offline/no-offline? Somehow else?

With offline persistence disabled everything works as expected (see the screencast below): https://user-images.githubusercontent.com/9166277/120976941-7a330180-c77b-11eb-9de8-aeb26a13584f.mov

LeaVerou commented 3 years ago

The problem is that once the user has made edits, the new data needs to be merged with the old data. Maybe eventually we need to bite the bullet and implement client-side JSON diffing. I wonder how large the libraries for that are. If very large, we can maybe load it conditionally when needed (saving data or reloading data)

Use a tight setInterval loop to repeatedly measure the value of the clock. Notice if there is a large gap with the previously measured value; that would be a sign that the process went to sleep and the data needs to be reloaded.

Note that setInterval() runs in idle tabs, it's requestAnimationFrame() that doesn't.

karger commented 3 years ago

An alternative to "new data merged with old data" is "reject the new edits until data is reload". I don't think there will be many cases where someone edits and then suspends the tab before save. Rather, the common case where someone restores a suspended tab and proceeds to edit. If a tab rapidly detects that it has been asleep, then it can quickly reload the (current) data and at worst lose a tiny amount of new edits.

An alternative to reloading the data is to "disconnect". Some platforms (overlead is one) will go into "disconnected" state if the go too long without hearing from the server, and will refuse to accept edits until they reconnect. We could do something similar. Again, rapidly disconnecting would prevent any large amount of new edits being lost.

I don't see much cost to reloading the data though, so I would advocate that over disconnecting.

karger commented 3 years ago

Dmitry, you assert it is a problem with the firebase plugin, but note that any plugin providing a "realtime" option is likely susceptible to the same problem---which may be a good reason to solve it in mavo instead of in each plugin. Note that realtime can be simulated on nearly any platform via polling, if we want.

LeaVerou commented 3 years ago

I don't think there will be many cases where someone edits and then suspends the tab before save.

I think that's super common in any Mavo app that doesn't autosave!

karger commented 3 years ago

I agree that offline persistence is sometime desirable and sometimes not. It seems to me that realtime mode should always disable offline persistence.  Because realtime mode is saying you always want to be in sync with the server, and if you go offline you can't be.  Can you think of a case?   On the other hand, if realtime is not enabled, then I am basically saying I might save at any time without bothering to load first, which would overwrite any data changes.   Which is saying I don't care much about the state of the server, which would suggest you might as well enable offline persistence for convenience.   A counter argument is that someone might just plan to reload frequently while online to avoid overwriting data, and that they can't do that while offline, so that is a case where they might not want offline persistence. But this seems week, because they should not edit while offline in this case.

In summary, one approach would be to disable offline persistence in realtime mode only.   What do you think?

On 6/7/2021 3:33 AM, Dmitry Sharabin wrote:

@karger https://github.com/karger Actually, it's not an issue with Mavo, but with the Firebase Firestore plugin. Precisely, with its /offline persistence/ feature which is enabled by default. Once, we discussed this feature and came to the conclusion that if this feature is provided by the library, we can benefit from it. That's why we decided to enable it. As it turns out, sometimes it can cause some problems.

For now, there is no way to disable it by passing some options to the plugin. But I can implement it shortly. However, we need to decide, whether we want offline persistence to be enabled by default and provide some way to disabled it. Or we should disable it by default and provide a way to enable it. Or we can leave it enabled by default and if an author wants real-time updates we could warn them that offline persistence might cause some issues in that case and we recommend disable it via some parameter. What do you think?

How would we name such parameter: |offline-persistence|/|no-offline-persistence|, or simply |offline|/|no-offline|? Somehow else?

With offline persistence disabled everything works as expected (see the screencast below): https://user-images.githubusercontent.com/9166277/120976941-7a330180-c77b-11eb-9de8-aeb26a13584f.mov https://user-images.githubusercontent.com/9166277/120976941-7a330180-c77b-11eb-9de8-aeb26a13584f.mov

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mavoweb/mavo/issues/725#issuecomment-855674184, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIWSXQIW2IDTL5VWDJE6CLTRRY4DANCNFSM46FO36RA.

LeaVerou commented 3 years ago

I can certainly think of cases where you want realtime when connected and offline when not connected, e.g. Google Docs.

karger commented 3 years ago

I'm not sure what google docs does if you've been offline editing for a while and come back and the server doc has been changed. Does it overwrite? Create multiple versions? Merge somehow?

karger commented 3 years ago

This problem also suggests to me that either the firebase library or the browser is violating its contract. For realtime we are using the firebase Snapshot class, which is supposed to receive events whenever there's a change on the server. Presumably it uses web sockets or some such. The tab obviously can't receive those events while suspended, but shouldn't they be queued in the browser for delivery when the tab unsuspends? If they aren't, that seems like a browser problem. If they are but firebase is ignoring, that seems like a a firebase problem. Either way, wouldn't it be obvious for the firebase library to reconnect to the server and check for changes when the tab unsuspends? Why do we have to implement that?

DmitrySharabin commented 3 years ago

I looked through the Cloud Firestore documentation one more time and watched their tutorials, and here are some excerpts and my thoughts:

So, the more I think about it, the more I lean towards letting an author decide what features they want to enable and what not; we provide authors all the features the Firebase library supports but expect them to be precise in enabling features their app needs, so they are not being bitten by our defaults.

I suggest making offline persistence optional and provide a new keyword offline-persistence to enable it via mv-storage-options (or via deprecated mv-firebase), like so: <el mv-storage-options="offline-persistence"> or <el mv-storage-options="auth storage offline-persistence">

karger commented 3 years ago

I agree there's value in letting the author decide. As an example, with the two apps I've made that interoperate, one of them should really enable offline persistence, because each storage has only a single writer. But the other one should disable offline edits, because there are multiple writers to the storage.

I think the default needs to be to disable offline persistence though. That way, the author will notice that they don't have it and can go lookup how to provide it if the want. If we default the other way, the author may not know there's a problem until they lose data. Also, many web applications are without offline persistence, so people won't see it as a critical feature (unless they do, and decide to turn it on).

karger commented 3 years ago

I don't recall the discussion for moving to mv-storage-options. the new syntax seems less general in that it doesn't suggest a consistent way to provide options to plugins that aren't storage plugins?

DmitrySharabin commented 3 years ago

I don't recall the discussion for moving to mv-storage-options. the new syntax seems less general in that it doesn't suggest a consistent way to provide options to plugins that aren't storage plugins?

Oh, it's just an adoption of the fairly new Mavo feature—storage attributes, not the new syntax specific to the Firebase plugin. These attributes are already adopted by the Github, GSheets, and Firebase plugins.

Feel free to use mv-firebase if you like. It's still supported but not documented anymore.

Here is a list of corresponding issues: #657 #664 #665 #684

LeaVerou commented 3 years ago

Feel free to use mv-firebase if you like. It's still supported but not documented anymore.

I think this should be explicitly deprecated and print out a console warning.