y-crdt / ypy-websocket

WebSocket Connector for Ypy
https://davidbrochart.github.io/ypy-websocket
MIT License
42 stars 21 forks source link

Understanding Django Channels setup #88

Open mgug01 opened 11 months ago

mgug01 commented 11 months ago

Hey there-

I really appreciate the hard work that has gone into this repo thus far.

I'm trying to refactor my ypy-websocket setup with the new Django Channels consumer within my ReactJS + Django application but am having trouble getting things running.

Aside from the initial "connect" I'm unsure if my implementation is correct, how to save an load documents, and it for some reason I am seeing quite a few "disconnect" errors:

WebSocket CONNECT / [127.0.0.1:60208]
WebSocket DISCONNECT / [127.0.0.1:60055]
WebSocket HANDSHAKING / [127.0.0.1:60244]
WebSocket CONNECT / [127.0.0.1:60244]
WebSocket DISCONNECT / [127.0.0.1:60084]
WebSocket HANDSHAKING / [127.0.0.1:60287]
WebSocket CONNECT / [127.0.0.1:60287]

Here's what I have so far from the Django side:

class NoteConsumer(YjsConsumer):

    def make_room_name(self) -> str:
        return self.note_id

    async def make_ydoc(self) -> Y.YDoc:
        doc = Y.YDoc()
        self.document_obj = await get_note(self.note_id)
        if self.document_obj:
            # HOW DO I LOAD THE DOCUMENT IF IT IS SAVED AS JSON IN document_obj.notes_data?
            return doc
        else:
            await self.close()

    async def connect(self) -> None:
        query_string = self.scope['query_string'].decode()
        query_string = parse_qs(query_string)
        params = {key: value[0] if len(
            value) == 1 else value for key, value in query_string.items()}
        user_token = params.get('user_token', None)

        auth_passed = await self.auth_user(user_token)
        if not auth_passed:
            await self.close()
            return

        note_id = params.get('note_id', None)
        self.note_id = note_id.replace('/', '')
        self.room_name = self.make_room_name()
        self.ydoc = await self.make_ydoc()
        await super().connect()

    async def receive(self, text_data=None, bytes_data=None):
        if text_data:
            text = json.loads(text_data)
            request_type = text['type']
            if "document_save" in request_type:
                # IS THIS THE CORRECT WAY TO SAVE THE DOCUMENT TO THE DATABASE?
                doc_text = text.get('document', None)
                doc_data = text.get('bytesDoc', None)
                await update_note(self.document_obj, doc_text, doc_data)
        if bytes_data:
            await self.group_send_message(bytes_data)

    async def auth_user(self, auth_token):
        test_auth = get_user(auth_token)
        if test_auth:
            return True
        else:
            return False

For further context on the data we want to retrieve/save from/to the database, we use Y.encodeStateAsUpdate(doc) on the JS client side, and save as JSON to our Postgres database.

Thanks for your time!!

zswaff commented 11 months ago

Hey @mgug01, thanks for the code. I have a few quick thoughts

  1. If you fully omit the make_ydoc func, does it work better?
  2. Overriding receive (especially without calling await super().receive(text_data, bytes_data) or whatever) may be the cause of a lot of your problems. If you remove this, do you establish connections properly?
  3. The approach I would recommend would be to get the connections and live updates working first, then worry about save and load. To do this you might remove a lot of the code you added, then open two windows next to each other, and just see if a plain version works to sync data between the two tabs. Then you can worry about hooking up the load and save.
  4. Rather than overriding receive, the approach I documented for saving updates involves hooking into the YDoc's updates with doc.observe_after_transaction(self.on_update_event). You might have more success with this
  5. For loading and initializing the doc--this is a bit tougher. What is the format on the frontend? Like what is creating the YDoc? If you have JSON and you need to convert that to YJS format, you need to flesh out your implementation of make_ydoc by using the YJS python lib to make the YDoc in the format that the frontend expects. This is honestly a challenge, and it's something I'm working on on my end right now too. Some other folks just added some new features to Ypy to make this more possible. What is the format that you're going for?
mgug01 commented 11 months ago

Hi there @zswaff!

Thanks so much- really appreciate the reply on this!

  • If you fully omit the make_ydoc func, does it work better?

Tried this and unfortunately the document will still occasionally "reconnect" without reason after every 30 sec-1 min

  • Overriding receive (especially without calling await super().receive(text_data, bytes_data) or whatever) may be the cause of a lot of your problems. If you remove this, do you establish connections properly?
  • The approach I would recommend would be to get the connections and live updates working first, then worry about save and load. To do this you might remove a lot of the code you added, then open two windows next to each other, and just see if a plain version works to sync data between the two tabs. Then you can worry about hooking up the load and save.
  • Rather than overriding receive, the approach I documented for saving updates involves hooking into the YDoc's updates with doc.observe_after_transaction(self.on_update_event). You might have more success with this

This is an interesting idea- I didn't quite understand what self.on_update_event was referring to, but this makes a lot more sense. I'd like to use something like observe_after_transaction though it doesn't seem like that method would allow for receiving specific message request types from the client side.

Though I suppose we wouldn't need to receive these message types if we could load database data within make_ydoc and save the document within some sort of callback.

5. For loading and initializing the doc--this is a bit tougher. What is the format on the frontend? Like what is creating the YDoc? If you have JSON and you need to convert that to YJS format, you need to flesh out your implementation of make_ydoc by using the YJS python lib to make the YDoc in the format that the frontend expects. This is honestly a challenge, and it's something I'm working on on my end right now too. Some other folks just added some new features to Ypy to make this more possible. What is the format that you're going for?

The client side is using ReactJS, Quill, y-websocket and y-quill. To save the document, I am calling Y.encodeStateAsUpdate(doc) converted to text/JSON using fromUint8Array()- then saving that JSON to my Postgres database.

To load the document, I do pretty much the same but reversed. My client-side sends a request_type of load, the JSON of the document is received, I convert using toUint8Array() and run Y.applyUpdate(clientsideYDoc, convertedJSONData).

Does this make sense? Thanks again for the time.

zswaff commented 11 months ago

Cool, happy to help!


I want to come back to this

The approach I would recommend would be to get the connections and live updates working first, then worry about save and load. To do this you might remove a lot of the code you added, then open two windows next to each other, and just see if a plain version works to sync data between the two tabs. Then you can worry about hooking up the load and save.

Did you try it? Like if you literally take out all of your overriding code, does it work to sync data between tabs?


the document will still occasionally "reconnect" without reason after every 30 sec-1 min

This sounds like it might be a general websocket timeout. I think websockets need a keepalive every min or so. Usually frontends are responsible I believe. Next steps here really depend on the first test above. If the two tabs are working, then it seems like perhaps the changes to receive are the issue. If not, perhaps the frontend you are using is not correctly sending the keepalive. You can also investigate some of this through the devtools, and see what messages are sent back/forth through a given WS connection.


I didn't quite understand what self.on_update_event was referring to

Makes sense, maybe one or the other of us can improve the documentation for this when we get the chance.


it doesn't seem like that method would allow for receiving specific message request types from the client side. Though I suppose we wouldn't need to receive these message types if we could load database data within make_ydoc and save the document within some sort of callback.

I think the latter is the intended approach if possible!


To save the document, I am calling Y.encodeStateAsUpdate(doc) converted to text/JSON using fromUint8Array()

OK this is simpler/easier to handle than I expected I think. A lot of this will change when you are using this lib but the basic principles would be the same. I guess you would

  1. Get rid of almost the entire process you described because it is handling all of the logic on the frontend and I think your goal is to move it to the backend?
  2. To save, use Ypy to encode_state_as_update on the backend during observe_after_transaction and save this to the DB
  3. To load, get the update from the DB and use Ypy to apply_update on the backend during make_doc
cacosandon commented 6 months ago

Hey @zswaff!

I know it’s an old issue, but may be you can help me!

I am using this recommendation:

To save, use Ypy to encode_state_as_update on the backend during observe_after_transaction and save this to the DB

But because each client is associated with a specific consumer, every YDoc is different (only has updates from the connected client), so the document is not really the same as the Yjs Room document.

Do I missing something? Is my setup wrong?

Also, do you use this in production? how do you debounce the saves to the database? I can see this Django Channels implementation doesn’t use YStore or YRoom from this library.

Thank you very much!

zswaff commented 6 months ago

Hey @cacosandon, happy to help. This is actually timely because I'm about to kick off a project on our team to improve some of this stuff internally, which may involve some changes here if we find my old work lacking haha. We're not using it in production yet but we plan to be once we make these improvements.

You're absolutely right that each client has its own YDoc with this setup--you're not missing anything. We will probably need to configure everything more like the way you described before it's ready.

Let me know if there's other stuff I can help with or if you want to collaborate on any of these improvements!

cacosandon commented 6 months ago

Amazing @zswaff! I already edited some of the code so all consumers have the same YDoc.

I have replaced the receive method from YjsConsumer with this:

+    async def process_sync_update_message(self, message_wrapper: dict) -> None:
+        logger.info("Receiving SYNC UPDATE from %s", self.ydoc.client_id)
+
+        await process_sync_message(
+            message_wrapper["update"],
+            self.ydoc,
+            self._websocket_shim,
+            logger,
+        )

+        # We save the document with debounce strategy
+        if time.time() - self.last_change_timestamp > self.seconds_between_saves:
+            logger.info("Saving document from %s", self.ydoc.client_id)

+            await YDocUpdate.objects.asave_snapshot(
+                self.room_name,
+                y_py.encode_state_as_update(self.ydoc),
+            )

+        self.last_change_timestamp = time.time()

    async def receive(self, text_data=None, bytes_data=None):  # noqa: ARG002
        if bytes_data is None:
            return

        await self.group_send_message(bytes_data)

        if bytes_data[0] != YMessageType.SYNC:
            return

+        if bytes_data[1] == YSyncMessageType.SYNC_UPDATE:
+            logger.info("Sending SYNC UPDATE from %s", self.ydoc.client_id)

+            await self.channel_layer.group_send(
+                self.room_name,
+                {"type": "process_sync_update_message", "update": bytes_data[1:]},
+            )

+            return

        await process_sync_message(
            bytes_data[1:],
            self.ydoc,
            self._websocket_shim,
            logger,
        )

This way, every "update" message (not sync protocol, not awareness) is sent to every consumer connected, and then process that message via process_sync_update_message

So every client has their own debounced save to the database with the most recent YDoc, and also all have the same doc.

This works well, at least locally. Don't know if we need all the complexity of YRoom and YStore, because most logic from them is included in Django Channels.

Haven't tested this in production yet. Do you see any flaws? Maybe when receiving offline updates or updates coming with different clock that could make the doc to break.

Would love to talk! We are using Slack in our company, just in case you want to create a shared channel, or maybe a Discord channel :)

zswaff commented 6 months ago

@cacosandon I'll take a look at that code as soon as I can. I agree with you in principle that this should be pretty straightforward.

If you want to join my company's discord we could start there, or we can chat on linkedin and set up a shared slack channel!