ueberdosis / hocuspocus

The CRDT Yjs WebSocket backend for conflict-free real-time collaboration in your app.
https://tiptap.dev/docs/hocuspocus/introduction
MIT License
1.34k stars 129 forks source link

DirectConnection causing document state corruption #832

Closed raimohanska closed 5 months ago

raimohanska commented 5 months ago

Description

If you use a DirectConnection when there are no WebSocket connections for a document, the WebSocket connections established during the next 2 seconds (or debounce time) will end up using an orphaned document and will not sync changes properly with clients that establish connections after that 2 second period. The problem lies in that storeDocumentHooks is called directly after closing the DirectConnection, but also in a debounced manner a bit later. Details follow.

Steps to reproduce the bug

  1. Start from a situation where there are no connections for document doc1.
  2. Establish a direct connection on the server, transact something and disconnect the connection.
  3. At this stage storeDocumentHooks has now beed called twice (once for the (transact and once for the disconnect). As the result of the latter, the unloadDocument method has been called and the doc1 has beed removed from this.documents
  4. All fine, except that there's still a debounced call for storeDocumentHooks to be performed in 2 seconds or so.
  5. Let's say a WebSocket client connects at this stage. Because the doc1 is not loaded yet, it will now be loaded
  6. After 2 seconds, the debounced storeDocumentHooks call kicks in. This call is bound to the earlier Y.Doc instance that no longer is referenced in this.documents (there's no a new copy of doc1 loaded). This call results into another call to unloadDocument which, on this line removed the document from this.documents. Unfortunately, it ends up removing the new document instance, which is now referenced by the WebSocket client which connected in step 5.
  7. Now the WebSocket client is connected to an orphaned document not in this.documents. Any client connecting from now on, will be assigned to a new document instance and the clients will not see changes from each other.

Expected behavior

When the DirectConnection is closed and there are no more clients for doc1, we should not unload the document when there is still a debounced storeDocumentHooks call left to be performed. Or alternatively, at least make sure that the debounced call does not end up removing the newer, active document from this.documents.

In my project, I made a quick patch for this line to additionally check that the document currently in this.docs is actually same document that's now being unloaded.

The patch solution is suboptimal though at least in the sense that two copies of the same document will exist in server memory for a time. I'd prolly rather have it so that the document accessed by DirectConnection would be subject to the same lifecycle as with WebSocket connections - that is, to be unloaded only after the normal debounce period. Yet, I'm not intimately familiar with the server design, so my hunch may be misguided.

janthurau commented 5 months ago

Thanks a lot for this @raimohanska!

I think I've found a way (I'm just using the debouncer in storeDocumentHooks), but will need to do some testing. I haven't reproduced your issue yet, but it makes sense. I'm spending some time now testing / reproducing it.

janthurau commented 5 months ago

aha! This only happens when you start a transaction and pass an origin, otherwise it works (L404 in Hocuspocus.ts is the guard).

I've pushed a fix in #834 and added a reproduction test case. Can you verify if that works for you? 🙏

raimohanska commented 5 months ago

Great, thanks! I'll have a look tomorrow and see if I can verify the fix.

Btw the redis-specific check on L404 looks quite hacky 😂