ueberdosis / hocuspocus

The CRDT Yjs WebSocket backend for conflict-free real-time collaboration in your app.
https://tiptap.dev/docs/hocuspocus/introduction
MIT License

Updates made using openDirectConnection are not synchronizing changes across different Redis instances. #800

Closed: lin52025iq closed this issue 4 months ago

lin52025iq commented 5 months ago

Description: Launch two instances, A and B, using pm2, and use Redis for horizontal scaling. Connect from the browser to instance A. On instance B, use openDirectConnection to open a connection to the same document, doc, and modify doc through that connection. The browser side receives no updates.

Steps to reproduce the bug

  1. Create two instances using pm2

    {
        "apps": [
            {
                ...
                "instances": 2,
                "exec_mode": "cluster"
            }
        ]
    }
  2. Use Redis for horizontal scaling

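    import { Server } from '@hocuspocus/server'
    import { Redis } from '@hocuspocus/extension-redis'

    const server = Server.configure({
        ...
        extensions: [
            // Every hocuspocus instance points at the same Redis server
            new Redis({
                host: '127.0.0.1',
                port: 6379,
            }),
            ...
        ],
    })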
  3. Modify the document using openDirectConnection

    import { XmlElement, XmlText } from 'yjs'
    ...
    instance.openDirectConnection(documentName, context).then(connect => {
        // Apply all changes in a single transaction on the server-side document
        connect.transact(doc => {
            const texts = data.split('\n')
            const xmlFragment = doc.getXmlFragment('default')
            const list: XmlElement[] = []

            // Build one <paragraph> element per line of input
            texts.forEach((text) => {
                const YXmlElement = new XmlElement('paragraph')
                list.push(YXmlElement)

                const YXmlText = new XmlText()
                YXmlElement.insert(0, [YXmlText])
                YXmlText.insert(0, text)
            })
            xmlFragment.push(list)
        }).finally(() => {
            // Tell connected clients that new content is available
            if (connect.document) {
                connect.document.broadcastStateless(
                    JSON.stringify({
                        status: 'success',
                        type: 'need-load'
                    })
                )
            }
            connect.disconnect()
        })
    })
    ...
  4. Connect to the document doc from the browser; the browser's connection lands on instance A, which uses Redis.

  5. On instance B, modify the document doc via openDirectConnection. The modification succeeds.

  6. After the modification succeeds, the browser page does not show the content modified on instance B. The complete content only appears after refreshing the browser to reconnect.

  7. If openDirectConnection is executed on instance A to modify the document, the page synchronizes normally.

Expected behavior: Even if the document is modified on instance B, the browser should still receive the modified content.

Other: The messages sent by connect.document.broadcastStateless on instance B are received in the browser.

janthurau commented 5 months ago

hey @lin52025iq, I've now spent some time trying to reproduce this, but I'm unable to. I opened Firefox and Chrome and connected each to a different hocuspocus server, both of which were connected to the same Redis.

When I call broadcastStateless on server1, frontend2 receives it properly, and the same goes for document updates made through a direct connection on server1: they are correctly routed through Redis to the second server and then on to the second frontend.

Not sure if I'm missing something here, but if you still face this issue, please provide a reproduction example that I can just run.

janthurau commented 5 months ago

I actually just re-read the title: are you trying to synchronize "across different redis instances"? We don't support that. We only support multiple hocuspocus servers connected to a single Redis (or any cluster setup that makes sure Redis messages get routed to all hocuspocus servers).

lin52025iq commented 5 months ago

hey @janthurau, I created a project that reproduces this issue; it still happens and it confuses me. https://github.com/lin52025iq/reproduction-800

Could there be a problem with the way I'm using it?

janthurau commented 5 months ago

ahh, thanks for providing this! This took me a while ...

The issue (and the difference from my testing) is that you are applying the direct update on a hocuspocus instance that doesn't have the document open (because no clients are connected to it). I think there is a timing issue, which is why it works sometimes: if the full sync cycle finishes before the instance with the direct connection unloads the document, it works; if the sync takes longer than unloading the document, it doesn't.

I'm not sure yet what the best fix for this is, since we only subscribe to updates after a client connects. We'd need some way to wait for Redis to sync (or even sync all documents, regardless of connection status).
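To make the timing issue concrete, here is a simplified, hypothetical TypeScript model of the race. It is not hocuspocus's actual Redis protocol; it only assumes that the full sync needs instance B to answer a request coming back from instance A, and that B unloads the document after some delay because no clients are connected to it. The names `simulateDirectUpdate`, `redisLatencyMs` and `unloadDelayMs` are illustrative.

```typescript
// Simplified, hypothetical model of the race; not hocuspocus's real protocol.
interface RaceParams {
  redisLatencyMs: number // assumed one-way latency of one Redis pub/sub hop
  unloadDelayMs: number // assumed time until instance B unloads the idle document
}

interface TimedEvent {
  atMs: number
  what: string
}

function simulateDirectUpdate(p: RaceParams): { delivered: boolean; events: TimedEvent[] } {
  // B must still have the document loaded when A's sync request comes back.
  const requestReachesB = 2 * p.redisLatencyMs
  const delivered = requestReachesB < p.unloadDelayMs

  const events: TimedEvent[] = [
    { atMs: 0, what: 'B applies the direct update and publishes a sync message' },
    { atMs: p.redisLatencyMs, what: 'A receives it and sends a sync request back to B' },
    { atMs: p.unloadDelayMs, what: 'B unloads the document (no clients connected)' },
    {
      atMs: requestReachesB,
      what: delivered
        ? 'B answers with the update A is missing'
        : 'the request reaches B after unload, so nobody answers',
    },
  ]
  if (delivered) {
    events.push({ atMs: 3 * p.redisLatencyMs, what: 'A applies the update and forwards it to the browser' })
  }
  // Present the events in chronological order.
  events.sort((x, y) => x.atMs - y.atMs)
  return { delivered, events }
}

const fast = simulateDirectUpdate({ redisLatencyMs: 5, unloadDelayMs: 50 })
const slow = simulateDirectUpdate({ redisLatencyMs: 40, unloadDelayMs: 50 })
for (const e of slow.events) console.log(`t=${e.atMs}ms: ${e.what}`)
console.log(fast.delivered, slow.delivered) // true false
```

With identical code, only the relative timing decides the outcome, which matches the "works sometimes" behaviour described in the comment.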

lin52025iq commented 5 months ago

Yes. So now, in the onChange hook of extension-redis, I check three things: whether there is only one client on the current hocuspocus instance (the client opened with openDirectConnection, with no browser-side clients connected); whether the modification was made on the current hocuspocus instance (transactionOrigin is falsy; I'm not sure why it isn't __hocuspocus__redis__origin__ here); and whether the socketId of this single client is empty. If all of these conditions are met, I actively publish the modifications made through openDirectConnection to the other hocuspocus instances via Redis.
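The check described above could look roughly like the predicate below. This is a hypothetical sketch: `ChangeInfo` and `shouldPublishDirectUpdate` are made-up names whose fields only mirror the three conditions listed in the comment. It is not the code from PR #801, and the real onChange payload of extension-redis may be shaped differently.

```typescript
// Hypothetical shape of the information available when a change is observed.
interface ChangeInfo {
  connectionCount: number // connections to this document on this instance
  transactionOrigin: unknown // origin of the Yjs transaction behind the change
  socketId: string | undefined // socket id of the connection that made the change
}

// Decide whether a change should be pushed to the other instances via Redis.
function shouldPublishDirectUpdate(change: ChangeInfo): boolean {
  // 1. The direct connection is the only connection on this instance.
  const onlyDirectConnection = change.connectionCount === 1
  // 2. The change was made on this instance (observed as a falsy origin).
  const madeLocally = !change.transactionOrigin
  // 3. The single connection has an empty socket id, i.e. it is not a browser client.
  const noSocket = !change.socketId
  return onlyDirectConnection && madeLocally && noSocket
}

console.log(shouldPublishDirectUpdate({ connectionCount: 1, transactionOrigin: null, socketId: '' })) // true
console.log(shouldPublishDirectUpdate({ connectionCount: 2, transactionOrigin: null, socketId: '' })) // false
```

Once a browser client is also connected to the same instance, the predicate returns false and the regular full-sync path applies.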


This is exactly what https://github.com/ueberdosis/hocuspocus/pull/801 does, and it does solve the problem of changes not synchronizing in a timely manner for now. However, I feel this isn't a good solution and wonder whether it might have hidden issues.

janthurau commented 4 months ago

alright, makes sense! The difference is that we always trigger a full sync (which requires the other server to answer), whereas your change only sends the update message (which the other server will just apply). This can work, but the general issue is that documents stored on different instances won't fully sync unless you actually have users connected to both instances (Redis also does not guarantee message delivery). I'm also not sure the other server will be able to apply the update under all circumstances; I would expect edge cases where previous updates are required but missing.
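The difference between sending only the update and running a full sync can be illustrated with a toy replica. This is a deliberately simplified model, not Yjs's real update encoding: each update is tagged with the client that made it and a per-client clock, an update is only integrated once all earlier updates from that client are present, and a full sync works by exchanging a state vector (the next expected clock per client). The `Replica` class, `ToyUpdate` and their methods are hypothetical.

```typescript
// Toy model, not Yjs's real encoding: an update is identified by its client
// and a per-client clock, and clock n needs clocks 0..n-1 from that client.
interface ToyUpdate {
  client: string
  clock: number
  content: string
}

class Replica {
  private integrated: ToyUpdate[] = []
  private pending: ToyUpdate[] = []
  // State vector: next expected clock per client.
  private stateVector = new Map<string, number>()

  // "Update only" path: integrate if possible, otherwise park it as pending.
  receive(update: ToyUpdate): void {
    if (update.clock < this.nextClock(update.client)) return // already integrated
    if (this.pending.some((p) => p.client === update.client && p.clock === update.clock)) return
    this.pending.push(update)
    // Keep integrating while some pending update fills the next gap.
    for (;;) {
      const i = this.pending.findIndex((u) => u.clock === this.nextClock(u.client))
      if (i === -1) return
      const [ready] = this.pending.splice(i, 1)
      this.integrated.push(ready)
      this.stateVector.set(ready.client, ready.clock + 1)
    }
  }

  // Full-sync path: answer a peer's state vector with everything it lacks.
  diffFor(peerStateVector: Map<string, number>): ToyUpdate[] {
    return this.integrated.filter((u) => u.clock >= (peerStateVector.get(u.client) ?? 0))
  }

  getStateVector(): Map<string, number> {
    return new Map(this.stateVector)
  }

  text(): string {
    return this.integrated.map((u) => u.content).join('')
  }

  pendingCount(): number {
    return this.pending.length
  }

  private nextClock(client: string): number {
    return this.stateVector.get(client) ?? 0
  }
}

// Instance B holds two updates; instance A only received the second one
// (for example because the first Redis message was never delivered).
const u0: ToyUpdate = { client: 'B', clock: 0, content: 'Hello ' }
const u1: ToyUpdate = { client: 'B', clock: 1, content: 'world' }
const b = new Replica()
b.receive(u0)
b.receive(u1)

const a = new Replica()
a.receive(u1) // parked: clock 0 from client B is missing
console.log(JSON.stringify(a.text()), a.pendingCount()) // "" 1

// Full sync: A reports its state vector, B sends everything A lacks.
for (const u of b.diffFor(a.getStateVector())) a.receive(u)
console.log(JSON.stringify(a.text()), a.pendingCount()) // "Hello world" 0
```

The update that arrives on its own stays parked because its predecessor is missing. The full sync fills the gap because B answers based on what A reports it already has, not on which messages happened to arrive.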

Why are you actually using a direct connection on a separate hocuspocus instance? If you apply direct connection updates on instance A and then send the update to instance B, the messages get processed on both instances (i.e. you don't reduce the load on instance B).

lin52025iq commented 4 months ago

To keep the service load stable, I use PM2's cluster mode to run 8 hocuspocus instances. As a result, when a browser connects to hocuspocus, it's uncertain which instance it will connect to. Furthermore, we need to append content to documents, so the server provides an API that appends content to the end of a document. Here too, it's uncertain which hocuspocus instance the API request will reach. To meet both requirements, we inevitably need to be able to modify a document's content on any single hocuspocus instance and have that change synchronized to the other instances.

janthurau commented 4 months ago

@lin52025iq the issue should be fixed with https://github.com/ueberdosis/hocuspocus/pull/819; I'm currently releasing this.

Let me know if you still face the issue afterwards :)

lin52025iq commented 3 months ago

Thanks, this is useful to me.