jupyter-server / jupyter_ydoc

Jupyter document structures for collaborative editing using Yjs/pycrdt
https://jupyter-ydoc.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add type/version information to Y documents #123

Closed davidbrochart closed 1 year ago

davidbrochart commented 1 year ago

Problem

Currently, a YFile represents a plain text document. This seems to assume that a document with no particular structure should be of text type, while it could also be binary. Even a YNotebook seems to assume that a notebook will always have the JSON type that we know, but it might not be the case anymore in the future (Jupyter notebooks could evolve towards a Markdown-based format). These two Y documents were derived from jupyter-server, where we have a file model and a notebook model. The file model can be of type text or binary, but in the latter case it is base64-encoded. The notebook model has a JSON type.
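
To illustrate the jupyter-server distinction described above, here is a minimal sketch in plain Python of a file model that carries text directly but wraps binary content in base64. The `file_model` helper and its reduced set of keys are hypothetical simplifications for illustration, not the actual jupyter-server implementation.

```python
import base64


def file_model(content: bytes, fmt: str) -> dict:
    """Build a simplified, hypothetical file model from raw bytes."""
    if fmt == "text":
        # Text content must be valid UTF-8 to be stored as a string.
        body = content.decode("utf-8")
    elif fmt == "base64":
        # Binary content is wrapped in base64 so it can travel as a string.
        body = base64.b64encode(content).decode("ascii")
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    return {"type": "file", "format": fmt, "content": body}


text_model = file_model("print('hello')".encode("utf-8"), "text")
binary_model = file_model(b"\x89PNG\r\n\x1a\n", "base64")

# Decoding the base64 payload recovers the original bytes.
restored = base64.b64decode(binary_model["content"])
```

In both cases the model's content is a string, which is why a text-only shared type can appear sufficient even though the binary case carries no meaningful text structure.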

Proposed Solution

I think we should not stick to what jupyter-server is doing, and have:

Or include the type/version in the Y document instantiation. For instance, keep YFile and YNotebook but instantiate with:

Thoughts?

dmonad commented 1 year ago

It makes sense to create different models for different use cases. However, I'd say that it's not about the kind of content you store, but about the semantics that you want to enforce.

YFile is great if you want to work on textual data or want to allow clients to concurrently manipulate bytes or text.

A YBinaryFile would make sense if you want to enforce that writes always overwrite each other (e.g. when working with images). It doesn't make sense to diff&merge concurrent overwrites on an image. This would be implemented using a Y.Map: ymap.set('content', binaryData).
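
The "writes always overwrite each other" semantics can be sketched with a toy last-writer-wins register in plain Python. This is not Yjs or pycrdt code: the `LwwRegister` class and its (clock, client id) tie-break are assumptions made for illustration, and the actual conflict resolution rule of Y.Map is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class LwwRegister:
    """Toy last-writer-wins register (hypothetical, not a Yjs/pycrdt type)."""

    value: bytes = b""
    stamp: tuple = (0, "")  # (logical clock, client id); assumed tie-break rule

    def set(self, value: bytes, clock: int, client: str) -> None:
        self.merge(LwwRegister(value, (clock, client)))

    def merge(self, other: "LwwRegister") -> None:
        # Keep whichever write has the larger stamp; never combine the bytes.
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp


alice, bob = LwwRegister(), LwwRegister()
alice.set(b"image-A", clock=1, client="alice")
bob.set(b"image-B", clock=1, client="bob")

# Exchange states in both directions, as two replicas syncing would.
alice_state = LwwRegister(alice.value, alice.stamp)
alice.merge(bob)
bob.merge(alice_state)
```

After the exchange both replicas hold the same complete value, one of the two writes, rather than a byte-level mix of the two images.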

YMdNotebook might be superfluous, as the underlying model of YFile can already handle Markdown (Markdown is just text). You can also bind a YFile (the underlying Y.Text) to a rich-text editor like Quill for rich-text features. Text-only clients (e.g. CodeMirror users) would only see the text, without rich-text annotations such as bold or italic.

davidbrochart commented 1 year ago

Thanks for the feedback @dmonad.

> YFile is great if you want to work on textual data or want to allow clients to concurrently manipulate bytes or text.

But actually YFile only works for UTF-8-encoded text, not for arbitrary bytes. So in practice, models are also about data type. Wouldn't it be better to rename it YUTF8?
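
A small sketch of why a text-based model cannot hold arbitrary bytes: not every byte sequence is valid UTF-8. The `is_utf8` helper is a hypothetical name used only for this illustration.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8 text."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


markdown_ok = is_utf8("# Title\n\nSome *naïve* text".encode("utf-8"))
jpeg_ok = is_utf8(b"\xff\xd8\xff\xe0")  # typical first bytes of a JPEG file
```

Here the Markdown source decodes cleanly, while the JPEG header does not, since the byte 0xFF never appears in valid UTF-8.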

> A YBinaryFile would make sense if you want to enforce that writes always overwrite each other (e.g. when working with images). It doesn't make sense to diff&merge concurrent overwrites on an image. This would be implemented using a Y.Map: ymap.set('content', binaryData).

The name YBinaryFile seems to indicate that it stores binary data, but in reality it can store anything, right? Here I think that we should enforce semantics, and maybe call it YBlob?

davidbrochart commented 1 year ago

#122 added YBlob and renamed YFile to YUnicode.