stefanw / channels-yroom

Django Channels WebSocket consumer and worker for synchronizing Yjs clients
https://channels-yroom.readthedocs.io/en/latest/
MIT License
15 stars, 5 forks

Improve persistence with incremental updates #1

Open stefanw opened 1 year ago

stefanw commented 1 year ago

Currently, YDocs are in-memory and persisted only when the room is empty or the worker shuts down.

Here are some other Yjs persistence implementations:

A common theme is writing incremental updates and consolidating later.

Creating an implementation for yrs-persistence in yroom might be the way to go.

linspw commented 1 year ago

Hi @stefanw Thank you very much for your work!

It's amazing. I spent hours creating something that doesn't come close to what you did (I tried using a custom state management implementation instead of Yjs, and I tried using ypy-websocket with Django, but it didn't work very well).

I tested it here and it worked perfectly.

I just have a question about persisting data in a Django model: when would be the best time to consolidate these values?

Many thanks in advance again!

stefanw commented 1 year ago

Hey @linspw, glad it's working for you!

Right now, persistence always stores a full update (no diffs, no need for consolidation). This seems OK, as it is only done when a room is evicted (no clients for 30 seconds) or the yroom worker is shut down (via SIGINT, SIGTERM or SIGHUP), so data is persisted under normal conditions.

My current use case has quite soft requirements, but more frequent or immediate saving is necessary if you are worried about losing ydocs (e.g. due to OOM kills, app crashes or power loss). Saving partial updates probably makes sense if you need to save immediately or very frequently, or if your document is very large.

So I'm looking into adding the storing of diffed updates and restoring from an update stack as an option. For that, an additional persistence backend like Redis might make sense. I'm currently wondering if I should build this on top of yrs-persistence and if this can be integrated into the Python/Django API.
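As a rough illustration of the "update stack" idea, here is a minimal sketch in Python. Everything in it is hypothetical: `UpdateLog`, its methods and the `merge` callable are not part of yroom or channels-yroom, and the in-memory dicts stand in for a backend such as Redis. In a real setup, `merge` would be whatever the Yjs binding offers for combining updates; the demo function below uses plain byte concatenation only to keep the example runnable.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical type: takes a list of encoded updates, returns one combined update.
MergeFn = Callable[[List[bytes]], bytes]


class UpdateLog:
    """Append-only log of incremental updates per room, with later consolidation."""

    def __init__(self, merge: MergeFn) -> None:
        self._merge = merge
        self._snapshots: Dict[str, bytes] = {}
        self._pending: Dict[str, List[bytes]] = {}

    def append(self, room: str, update: bytes) -> None:
        # Cheap write path: just push the diff onto the room's stack.
        self._pending.setdefault(room, []).append(update)

    def pending_count(self, room: str) -> int:
        return len(self._pending.get(room, []))

    def consolidate(self, room: str) -> Optional[bytes]:
        # Fold the last snapshot and all pending diffs into one new snapshot.
        parts = self._parts(room)
        if not parts:
            return None
        snapshot = self._merge(parts)
        self._snapshots[room] = snapshot
        self._pending[room] = []
        return snapshot

    def restore(self, room: str) -> Optional[bytes]:
        # Rebuild the current state without modifying what is stored.
        parts = self._parts(room)
        return self._merge(parts) if parts else None

    def _parts(self, room: str) -> List[bytes]:
        parts: List[bytes] = []
        if room in self._snapshots:
            parts.append(self._snapshots[room])
        parts.extend(self._pending.get(room, []))
        return parts


def concat_merge(updates: List[bytes]) -> bytes:
    # Stand-in only: real Yjs updates cannot be merged by concatenation.
    return b"".join(updates)
```

With this shape, every incoming update is a cheap append, and the more expensive `consolidate` step can run on a timer, on room eviction, or once the stack grows past some length.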

BTW: There's a way to export ydoc content through a – not yet documented – API: https://github.com/stefanw/channels-yroom/blob/main/example/textcollab/views.py#L23-L24

Please let me know about your use case!

linspw commented 1 year ago

Thank you for your attention @stefanw

I'm understanding better how YRoom works.

But I realized that my test had only worked because y-websocket has its own way of syncing data between tabs of the same browser :(

Now when trying to use the server's websocket, it didn't work :(

Describing my use case:

I need a document editor that works similarly to Google Docs.

stefanw commented 1 year ago

Sorry to hear that. Is the Docker example from the README working for you?

Make sure:

stefanw commented 1 year ago

Are you using tiptap and their hocuspocus collaboration provider by any chance? Their wire format has a prefix which breaks communication. I'm currently implementing support for that.

stefanw commented 1 year ago

The underlying library, yroom, now supports tiptap/hocuspocus-style syncing. As to the original issue, the Yjs creator recommends not storing every single update, as that results in a lot of tiny DB writes:

https://discuss.yjs.dev/t/persisting-to-db-could-it-be-this-easy/358/4

This might work better for a backend like Redis than Postgres.
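One way to avoid a database write per update is to mark a room as dirty and save a full snapshot at most once per interval. The following is a minimal sketch under stated assumptions: `ThrottledSaver`, the `save` callback and the injectable `clock` are hypothetical names for illustration, not anything provided by yroom or channels-yroom.

```python
import time
from typing import Callable, Dict, List


class ThrottledSaver:
    """Coalesce many updates into at most one save per room per interval."""

    def __init__(
        self,
        save: Callable[[str], None],
        interval: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._save = save
        self._interval = interval
        self._clock = clock
        self._dirty_since: Dict[str, float] = {}

    def mark_dirty(self, room: str) -> None:
        # Remember only the time of the first unsaved change.
        self._dirty_since.setdefault(room, self._clock())

    def flush_due(self) -> List[str]:
        # Save every room whose oldest unsaved change has waited long enough.
        now = self._clock()
        due = [r for r, t in self._dirty_since.items() if now - t >= self._interval]
        for room in due:
            self._save(room)
            del self._dirty_since[room]
        return due

    def flush_all(self) -> List[str]:
        # Intended for shutdown or room eviction: save everything still dirty.
        rooms = list(self._dirty_since)
        for room in rooms:
            self._save(room)
        self._dirty_since.clear()
        return rooms
```

Calling `mark_dirty` on every update and `flush_due` from a periodic task bounds the number of writes per room, at the cost of possibly losing up to roughly one interval of edits on a crash.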

linspw commented 1 year ago

I made some tweaks in the editor and it worked perfectly.

I am now using TipTap.

Thank you for your attention.

I will open some MRs with features and suggestions for you to review and decide whether they are improvements.

linspw commented 1 year ago

As for saving the document, I will put the approach you showed into a utils file.

And about persistence, this is an interesting subject. Since many databases run on Postgres these days, how can we deal with this write frequency?

Perhaps a queue in Redis that consolidates the updates and only then writes to the database.

linspw commented 1 year ago

Do you know a simple way to initialize the room with a text value already?

linspw commented 1 year ago

I think I understand:

I can override the storage property of the YRoomChannelConsumer (via Django settings) and thus control the "snapshot", which in my case will come from a Django model. Saving the snapshot may need something implemented with Redis.
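A minimal sketch of that idea, with heavy assumptions: the class below does not subclass anything from channels-yroom, the method names `get_snapshot` and `save_snapshot` are chosen for illustration, and an in-memory SQLite table stands in for the Django model. The actual storage interface may differ (for example, it may be async), so check the channels-yroom documentation before adapting it.

```python
import sqlite3
from typing import Optional


class SqliteSnapshotStorage:
    """Hypothetical snapshot store: one row per room holding the latest full snapshot."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS ydoc_snapshot ("
            "room TEXT PRIMARY KEY, data BLOB NOT NULL)"
        )

    def get_snapshot(self, room: str) -> Optional[bytes]:
        # Returning None means no snapshot has been stored for this room yet.
        row = self._conn.execute(
            "SELECT data FROM ydoc_snapshot WHERE room = ?", (room,)
        ).fetchone()
        return bytes(row[0]) if row else None

    def save_snapshot(self, room: str, data: bytes) -> None:
        # Overwrite the previous snapshot for this room.
        self._conn.execute(
            "INSERT OR REPLACE INTO ydoc_snapshot (room, data) VALUES (?, ?)",
            (room, data),
        )
        self._conn.commit()
```

Pre-seeding a room would then amount to saving an encoded Yjs document under the room name before the first client connects; plain text would presumably need to be turned into a Yjs update first.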