jupyter-server / jupyter_ydoc

Jupyter document structures for collaborative editing using Yjs/pycrdt
https://jupyter-ydoc.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add model version #138

Closed fcollonval closed 1 year ago

fcollonval commented 1 year ago

Problem

If we ever want to change or extend the document models, they will become incompatible when collaborating with others (independent of which nbformat they are using). We should attach some sort of version to document models and disallow editing of documents that use other models.

From https://github.com/jupyterlab/jupyterlab/issues/2475

Proposed Solution

Add version class attribute to the document.

davidbrochart commented 1 year ago

I guess we could use SemVer? I can imagine changes that would be compatible (e.g. more data in a YMap) and others that would be incompatible (e.g. a structural YDoc change).
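As a rough illustration of the SemVer idea, here is a minimal sketch of a compatibility check. It assumes (this is not decided in the thread) that a structural YDoc change bumps the major number and that additive changes, such as more data in a YMap, only bump the minor or patch number. `ModelVersion`, `parse_version` and `is_compatible` are hypothetical names, not part of jupyter_ydoc.

```python
from typing import NamedTuple


class ModelVersion(NamedTuple):
    """Hypothetical MAJOR.MINOR.PATCH version of a document model."""

    major: int
    minor: int
    patch: int


def parse_version(text: str) -> ModelVersion:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release tags)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return ModelVersion(major, minor, patch)


def is_compatible(local: str, remote: str) -> bool:
    """Assumption: two peers can collaborate when their major versions match."""
    return parse_version(local).major == parse_version(remote).major
```

With this rule, a peer on 1.0.0 could edit alongside one on 1.2.0 but not alongside one on 2.0.1.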

davidbrochart commented 1 year ago

> Add version class attribute to the document.

Actually I'm thinking that it should not be a class attribute but a property of the YDoc. We could include it in the ystate for instance. The reason is that different peers will need to check the version, e.g. a YNotebook in the back-end and one in the front-end. A version mismatch should never happen now that all models live in the same jupyter_ydoc repo, but I can imagine different peers installed in different environments/machines.

fcollonval commented 1 year ago

The trouble I see with using the state, for example, is that the property is not immutable. Maybe the check should be done when the client opens the websocket. It would go like: "please give me a notebook document v2.0.1"; "sorry, I cannot; the notebook document model I know is v1.0.0".

davidbrochart commented 1 year ago

Right, the state is not immutable but peers can react to its changes. The client will have an initial version of the document (the one it supports), and when syncing with the back-end it will detect a potential version mismatch and maybe refuse to open the document. It's true that the client will probably change the version while syncing, but then the back-end can react to that change and always change it back to the initial value. I think it raises the question of deciding who has authority over the document version. It's tempting to say that it's the one who first opened the document, but maybe the version could be updated during the life cycle of the document to adapt to a consensus over all connected peers?
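To make the "react and change it back" idea concrete, here is a minimal sketch. It uses a plain in-memory class as a stand-in for the shared state, not the real ystate or any pycrdt API; `ObservableState` and `pin_version` are hypothetical names, and the peer calling `pin_version` is assumed to be the one with authority over the version.

```python
from typing import Callable, Dict, List

Observer = Callable[[str, str], None]


class ObservableState:
    """Hypothetical stand-in for a shared state map that notifies observers."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._observers: List[Observer] = []

    def observe(self, callback: Observer) -> None:
        self._observers.append(callback)

    def get(self, key: str) -> str:
        return self._data[key]

    def set(self, key: str, value: str) -> None:
        if self._data.get(key) == value:
            return  # unchanged value: nothing to notify
        self._data[key] = value
        for callback in list(self._observers):
            callback(key, value)


def pin_version(state: ObservableState, version: str) -> None:
    """Write the version, then restore it whenever another peer changes it."""
    state.set("version", version)

    def restore(key: str, value: str) -> None:
        if key == "version" and value != version:
            state.set("version", version)

    state.observe(restore)


state = ObservableState()
pin_version(state, "1.0.0")      # e.g. the back-end, acting as the authority
state.set("version", "2.0.1")    # a client overwrites the version while syncing
print(state.get("version"))      # prints 1.0.0: the change was reverted
```

The early return in `set` is what keeps the restore callback from looping forever. In a real CRDT the revert would itself be a synced change, so every peer would briefly observe the foreign version.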

fcollonval commented 1 year ago

> I think it raises the question of deciding who has authority over the document version.

I would say that the server is the authority - but that hypothesis will break if it supports a range of versions.

davidbrochart commented 1 year ago

Even talking about a server is limiting the scope of jupyter_ydoc, which is transport-agnostic. Our current architecture in Jupyter is a centralized one, but what will happen if/when we switch to a distributed architecture? There will be no authority in that case, just a bunch of peers synchronizing their state.

fcollonval commented 1 year ago

> Even talking about a server is limiting the scope of jupyter_ydoc, which is transport-agnostic. Our current architecture in Jupyter is a centralized one, but what will happen if/when we switch to a distributed architecture? There will be no authority in that case, just a bunch of peers synchronizing their state.

Sure, but the authority is out of scope for this issue. Here the point is to provide a reliable model version that consumers could use to validate compatibility. How to validate, when to validate and what happens if it fails is not the responsibility of this package, as you said. But in any case, there is gonna be a communication channel opening action that should check the validity of the action (similar to what you did for the websocket protocol). So we must have an immutable model version.
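A minimal sketch of what an immutable model version plus a check at channel opening could look like. All names here (`NotebookModel`, `VersionMismatch`, `open_channel`) are hypothetical and do not reflect the actual jupyter_ydoc classes; the exact-match rule is also just an assumption, since how to validate is left to the consumer.

```python
class VersionMismatch(Exception):
    """Raised when a peer asks for a model version this side does not know."""


class NotebookModel:
    """Hypothetical document model, not the real YNotebook."""

    _VERSION = "1.0.0"

    @property
    def version(self) -> str:
        # Read-only: the property has no setter, so instances cannot change it.
        return self._VERSION


def open_channel(model: NotebookModel, requested_version: str) -> NotebookModel:
    """Validate the requested version once, when the channel is opened."""
    if requested_version != model.version:
        raise VersionMismatch(
            f"cannot serve notebook document v{requested_version}; "
            f"the model I know is v{model.version}"
        )
    return model
```

Assigning to `model.version` raises `AttributeError`, which is what makes the value reliable for consumers that decide how and when to validate.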

davidbrochart commented 1 year ago

OK, it's true that the document has an origin when e.g. it is loaded from disk. The case of simultaneous document creations can be dealt with later, if it ever happens (or we can simply forbid it). Interesting that you're mentioning the WebSocket subprotocols. Should we use them for the YDocWebSocketHandler? That would make it possible for the server and the client to agree on a model version that they can both handle.

fcollonval commented 1 year ago

> Should we use them for the YDocWebSocketHandler? That would make it possible for the server and the client to agree on a model version that they can both handle.

You are bringing me into unknown territory :wink: I don't know how flexible they are. One of the challenges for the package managing the communication is the ability to dynamically load new types of models (and versions?). I guess this would imply a dynamic list of subprotocols. Does that make sense? Is it possible?

davidbrochart commented 1 year ago

If I recall correctly, with subprotocols the client opens the WebSocket with a list of protocols it supports. Then the server replies with the chosen protocol. In our case, the client would provide the list of supported versions, and the server would choose the highest version in this list that it also supports. It's a bit like package resolution: there is a better chance of being compatible if a range of versions is supplied instead of just one version. But yes, that means that multiple document versions must be handled. Maybe we could have a convention on the document name, like appending the version, e.g. YNotebook_v0.1.0?
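The negotiation described above could be sketched as follows. This is only an illustration in plain Python, independent of any WebSocket library; `choose_version` and `subprotocol_name` are hypothetical helpers, and versions are assumed to be plain MAJOR.MINOR.PATCH strings.

```python
from typing import Iterable, Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Compare versions numerically, so that 0.10.0 sorts above 0.2.0."""
    return tuple(int(part) for part in version.split("."))


def choose_version(
    client_versions: Iterable[str], server_versions: Iterable[str]
) -> Optional[str]:
    """Return the highest version both sides support, or None if there is none."""
    common = set(client_versions) & set(server_versions)
    if not common:
        return None
    return max(common, key=version_key)


def subprotocol_name(model: str, version: str) -> str:
    """Naming convention suggested above, e.g. 'YNotebook_v0.1.0'."""
    return f"{model}_v{version}"


chosen = choose_version(["0.1.0", "0.2.0", "0.10.0"], ["0.2.0", "0.10.0", "1.0.0"])
if chosen is not None:
    print(subprotocol_name("YNotebook", chosen))  # prints YNotebook_v0.10.0
```

Returning `None` leaves the decision about a failed negotiation (for example refusing the connection) to whoever manages the communication.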