fcollonval opened this issue 2 years ago
These are very good points @fcollonval, thanks for pointing that out. I'll open a PR for that. It also goes in the same direction as #38. cc @ellisonbg
I would go even further than what @fcollonval is describing. I don't think that we even need to gather at all between updates to the peers and saving to the backend. We could do:
```python
async for message in websocket:
    await self._process_queue.put(message)
    await self._broadcast_queue.put(message)
```
and then have two completely independent asyncio tasks: one consuming `_process_queue`:

```python
message = await self._process_queue.get()
await process_message(message, room.ydoc, websocket)
```

and another one consuming `_broadcast_queue` and forwarding the updates to the peers.
In doing so, we would still prevent one of the two tasks from being blocked if the other one is (momentarily) choking on updates and accumulating a longer queue.
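For illustration, a minimal sketch of the two consumers, assuming `process_message` from the snippet above plus a `peers` collection with an async `send` per peer (these names are placeholders, not the actual ypy-websocket API):

```python
import asyncio

async def process_task(process_queue: asyncio.Queue, ydoc, websocket):
    # Apply incoming Y updates to the shared document, in arrival order.
    while True:
        message = await process_queue.get()
        await process_message(message, ydoc, websocket)

async def broadcast_task(broadcast_queue: asyncio.Queue, peers):
    # Forward each incoming Y update to all connected peers.
    while True:
        message = await broadcast_queue.get()
        for peer in peers:
            await peer.send(message)

# Started as two independent tasks, so a backlog in one queue never blocks the other:
# asyncio.create_task(process_task(self._process_queue, room.ydoc, websocket))
# asyncio.create_task(broadcast_task(self._broadcast_queue, peers))
```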
Another thing we could experiment with is to synchronously use `put_nowait` instead of `put`. If `queue.Full` is raised for one of the two queues, this means that it is not being emptied quickly enough and the server is choking on Y updates. This may be a good way to detect a performance issue in either of those tasks.
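A sketch of that experiment, assuming bounded queues (with `asyncio.Queue` the exception is `asyncio.QueueFull`; the `maxsize` value, the `enqueue` helper and the logging are illustrative, not part of ypy-websocket):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

# put_nowait only raises if the queue is bounded, so give it a maxsize.
process_queue: asyncio.Queue = asyncio.Queue(maxsize=1024)
broadcast_queue: asyncio.Queue = asyncio.Queue(maxsize=1024)

def enqueue(message: bytes) -> None:
    for name, queue in (("process", process_queue), ("broadcast", broadcast_queue)):
        try:
            queue.put_nowait(message)
        except asyncio.QueueFull:
            # The corresponding task is not draining its queue fast enough:
            # the server is choking on Y updates.
            logger.warning("%s queue is full, server cannot keep up with Y updates", name)
```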
@afshin
Great discussion! I like the idea of @SylvainCorlay of having two async queues for processing and broadcasting the messages. Another concern that I have is the presence of blocking calls in the server or server extensions. Those would quickly slow everything down. We should probably spend some time profiling the server to understand where blocking calls are happening.
> Another concern that I have is the presence of blocking calls in the server or server extensions.
It could be interesting to compare with Jupyverse. In particular, the file ID service will be fully async, while `jupyter-server-fileid` is not async.
Yet another alternative to @SylvainCorlay's proposal is to just create background tasks for processing messages, to update our internal state and to broadcast to clients. They would be like "fire and forget" tasks. Indeed, the idea behind putting the updates in a queue before processing them seems to be that they have to be processed in order, but one of the characteristics of CRDTs is that they can be processed out of order.
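A rough sketch of that fire-and-forget variant (the `process_message` and `peer.send` names are placeholders; keeping a reference to each task is needed so it is not garbage collected before it completes):

```python
import asyncio

background_tasks: set = set()

def fire_and_forget(coro) -> None:
    # Schedule the coroutine and keep a strong reference until it completes.
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)

async def on_message(message: bytes, ydoc, websocket, peers) -> None:
    # CRDT updates can be applied out of order, so no queue is needed:
    fire_and_forget(process_message(message, ydoc, websocket))
    for peer in peers:
        fire_and_forget(peer.send(message))
```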
I made these changes part of #38.
In @SylvainCorlay's proposal, a task consumes the `_broadcast_queue` and forwards the updates to the peers, I suppose sequentially? If so, then we have the same potential issue of one slow client slowing down the whole broadcast. To solve that, we would need to forward the updates to the peers in individual tasks, which is what I did in #38.
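For example, the broadcast-queue consumer could spawn one send task per peer instead of awaiting each send sequentially (a sketch only, not necessarily how #38 implements it; `peers` and `peer.send` are placeholders):

```python
import asyncio

async def broadcast_task(broadcast_queue: asyncio.Queue, peers):
    while True:
        message = await broadcast_queue.get()
        for peer in peers:
            # One task per peer: a slow or stuck peer only delays its own send.
            asyncio.create_task(peer.send(message))
```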
Let me know if I'm missing something.
Would `Websocket.send` block the processing of the broadcast queue for a slow client?
`ypy-websocket` is agnostic to the WebSocket implementation.

In jupyverse we use websockets, where `send` is async, which suggests that it's not that fast, maybe checking for connection and eventually timing out. So I don't think queuing calls to `send` is a good idea.

In jupyter-server we use Tornado's WebSocket implementation, where `write_message` is not async. I don't know if they use threads under the hood.

In both implementations, I don't think it's safe to assume that sending data on a WebSocket is fast.
> In jupyverse we use websockets, where `send` is async, which suggests that it's not that fast, maybe checking for connection and eventually timing out.
OK, I did not know that `send` could potentially be slow. Thanks for the explanation.
The publication of updates is done one at a time:
https://github.com/y-crdt/ypy-websocket/blob/357d2a6bf2e221100f09fe01c3c6fc1e9534d472/ypy_websocket/websocket_server.py#L113-L123
We should do those updates concurrently with `asyncio.gather` to get them out as quickly as possible and not be stuck behind a slow client. Moreover, we should update the document with higher priority, because it is the reference that will be loaded by any new client before it receives deltas.
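A minimal sketch of that change, assuming a list of peer `websockets` each with an async `send` (names are illustrative); the `return_exceptions` handling is an added assumption so one failing peer does not abort the sends to the others:

```python
import asyncio

async def publish_update(websockets, message: bytes) -> None:
    # Send the update to all peers concurrently rather than one after the other.
    results = await asyncio.gather(
        *(websocket.send(message) for websocket in websockets),
        return_exceptions=True,
    )
    for websocket, result in zip(websockets, results):
        if isinstance(result, Exception):
            # A misbehaving peer should not prevent the update from reaching the others.
            print(f"failed to send update to {websocket}: {result}")
```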
I think that part of the code is partly responsible for a data loss case seen when 20 people were collaborating simultaneously. What happened is that some clients (we cannot know if all) were still receiving the deltas, but the file on disk (which is a regular dump of the in-memory ydoc) stopped being updated. And if a new client connected, the document at the latest dumped version was the one pushed to it.
My best guess (unfortunately we did not see any error) is that one client was blocking and so the code after the client loop was never executed.
cc: @davidbrochart @hbcarlos