jupyterlab / jupyter-collaboration

A Jupyter Server Extension Providing Support for Y Documents
https://jupyterlab-realtime-collaboration.readthedocs.io/en/latest/
Other
157 stars 30 forks source link

Simplify `YDocWebSocketHandler` #247

Open dlqqq opened 6 months ago

dlqqq commented 6 months ago

Brief description

The YDocWebSocketHandler can be dramatically simplified and reduced in complexity. We may also be able to remove the dependency on pycrdt-websocket.

Context: The Yjs protocol

Here I present a brief overview of the Yjs protocol for context.

Within the Yjs protocol, there are two protocol layers:

Since jupyter_collaboration isn't a document peer itself, it doesn't need to use the awareness protocol layer, we just need to broadcast awareness messages to all peers when we receive one.

The sync protocol, in turn, has 3 message types:

Suppose we have the simplest Yjs network with a single node named node A. Suppose node B connects to this network. Here is how the connection should be established according to the sync protocol layer:

After the Yjs connection is established through this process, Node A and Node B then stream SU messages to one another to synchronize document state.

Motivation and impact

The minimum requirements of YDocWebSocketHandler are fairly concise:

However, the current handler implementation has some issues:

Unifying the source code in a single repo makes releasing this software much easier. This, along with reducing the complexity of the source, will allow for others to contribute to this project more easily. :hugging_face:

Proposal

I will open a draft PR that proposes a simplified implementation. I will focus on the following areas:

davidbrochart commented 6 months ago

Since jupyter_collaboration isn't a document peer itself, it doesn't need to use the awareness protocol layer, we just need to broadcast awareness messages to all peers when we receive one.

I disagree, the backend is just another CRDT peer. But it's also an authority, and it should not blindly broadcast awareness messages. Imagine a client that changes their name in the awareness, for instance a student using the teacher's name, this should not be allowed. The backend has the authority to not accept these awareness messages if the awareness name doesn't match the authenticated user. There is some logic that I plan to activate in the future.

On receiving SS2 or SU messages:

  • Apply it to the Ydoc, then broadcast to all peers except the sender

Wrong, updates must be echoed to the sender as well.

On receiving awareness messages:

  • Inspect the awareness message to know which clients have joined / left the network (current implementation).
  • Broadcast to all peers except the sender.

Same comment.

Unifying the source code in a single repo makes releasing this software much easier.

pycrdt-websocket is and should remain Jupyter-agnostic. It is the Python equivalent to y-websocket. I've not encountered any issue related to releasing.

I will open a draft PR that proposes a simplified implementation. I will focus on the following areas:

  • Remove the dependency on pycrdt-websocket, merging existing logic into the YDocWebSocketHandler directly

I suggest not spending time on this, as I will likely not accept this change.