matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0

Faster remote room joins: Support partial join re-syncing on workers other than the master #14544

Open · reivilibre opened this issue 1 year ago

reivilibre commented 1 year ago

An enhancement of: #12994 (worker-mode support for Faster Remote Room Joins).

Instead of relying on the master to perform the re-syncing of the rooms, we should allow other workers to be involved. Part of the difficulty is in choosing a worker to perform the re-sync for a room, ensuring that even after a crash/restart, exactly one worker will pick up the job of re-syncing that room again. We should also be mindful that, in any given deployment, workers can be taken out of service: a room shouldn't be locked to one worker forever, since if that worker disappears the re-sync would never progress.

Aside: in future we should consider moving the /send_join request out of the master process. The obvious candidate is the "client reader" that receives the client-side /join request (and hence currently makes the request to ReplicationRemoteJoinRestServlet). The main thing to worry about then is locking (to ensure that we don't have multiple workers all trying to do the remote-join dance at once). For prior art in that department, we should look at the code that handles incoming events received over federation (https://github.com/matrix-org/synapse/blob/v1.69.0rc2/synapse/federation/federation_server.py#L1108-L1116), which uses a database row to hold a lock: we can simply call try_acquire_lock before starting a resync operation.
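A minimal sketch of that acquire-or-skip guard, borrowing the pattern from the inbound-federation code. The lock name and the `_sync_partial_state_room()` call are illustrative stand-ins (the real resync entry point takes more arguments); `try_acquire_lock` is the existing LockStore method:

```python
# Illustrative lock name; not a constant that exists in Synapse today.
PARTIAL_STATE_RESYNC_LOCK_NAME = "partial_state_room_resync"


async def maybe_start_partial_state_resync(store, handler, room_id: str) -> None:
    """Start re-syncing `room_id` unless another worker already is.

    `store` is the homeserver datastore (which provides the LockStore's
    try_acquire_lock); `handler._sync_partial_state_room` stands in for the
    real resync entry point.
    """
    # try_acquire_lock returns None if another worker currently holds the
    # lock for this room, in which case we leave the resync to it.
    lock = await store.try_acquire_lock(PARTIAL_STATE_RESYNC_LOCK_NAME, room_id)
    if lock is None:
        return

    # Hold the lock for the duration of the resync. If this worker dies,
    # the lock eventually times out and another worker can take over.
    async with lock:
        await handler._sync_partial_state_room(room_id)
```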

That still leaves us with the problem of making sure we resume the partial-state resync if the client reader that is currently processing it gets restarted (or, worse, turned off, never to return). Again following the example of incoming events: in that case, we kick off a processing job as soon as a worker discovers itself to be a "federation inbound" worker by receiving a /send request. Probably we could do the same here on a /_matrix/client/v3/rooms/.*/(send|join|invite|leave|ban|unban|kick) request? — https://github.com/matrix-org/synapse/issues/12994#issuecomment-1273604901
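A hypothetical sketch of that trigger. `resume_partial_state_room_syncs` is a made-up name for a handler method that would look up rooms still marked as having partial state and try to take over their resyncs; `run_as_background_process` and `hs.get_federation_handler()` are existing Synapse utilities:

```python
from synapse.metrics.background_process_metrics import run_as_background_process


class PartialStateResyncKicker:
    """Kick off resumption of partial-state resyncs at most once per worker.

    Intended to be called from the servlets behind
    /_matrix/client/v3/rooms/{roomId}/(send|join|invite|leave|ban|unban|kick),
    mirroring how a worker discovers it is a "federation inbound" worker when
    it receives a /send request.
    """

    def __init__(self, hs) -> None:
        self._hs = hs
        self._started = False

    def on_membership_request(self) -> None:
        if self._started:
            return
        self._started = True

        # Hypothetical handler method: scan for rooms with partial state and
        # resume their resyncs (each guarded by the per-room cross-worker lock).
        run_as_background_process(
            "resume_partial_state_room_syncs",
            self._hs.get_federation_handler().resume_partial_state_room_syncs,
        )
```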

DMRobertson commented 1 year ago

> Part of the difficulty is in choosing a worker to perform the re-sync for a room, ensuring that even after a crash/restart, exactly one worker will pick up the job of re-syncing that room again.

Can we piggyback off the sharding logic used for event persisters? (Is that sharded by room id?)

reivilibre commented 1 year ago

To some extent, but that means having a definitive list of workers which are nominated for the job. That's very possibly fine! (But just noting a consideration.)
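For illustration, the kind of deterministic assignment being discussed might look like the sketch below. It shows the general hash-the-room-ID-over-a-configured-list approach used for event-persister sharding, not Synapse's actual sharding helper:

```python
from hashlib import sha256
from typing import Sequence


def pick_resync_worker(room_id: str, nominated_workers: Sequence[str]) -> str:
    """Deterministically map a room to one of the nominated workers.

    Every worker computes the same answer from shared config, so after a
    crash/restart the same worker picks the room back up; removing a worker
    from the list reassigns its rooms rather than stranding them.
    """
    digest = sha256(room_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nominated_workers)
    return nominated_workers[index]


# e.g. pick_resync_worker("!abc:example.com", ["resync1", "resync2"])
```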

erikjohnston commented 1 year ago

We can use the cross-worker locking stuff that we implemented for handling inbound federation:

https://github.com/matrix-org/synapse/blob/dfe8febe47bce48bb78bc5ea39d3c7f524d68177/synapse/storage/databases/main/lock.py#L103
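As a sketch of how that lock also covers the crash/restart case: each candidate worker could periodically walk the rooms that still have partial state and try to grab the per-room lock, which succeeds once a dead worker's lock has expired. `get_partial_state_rooms` and `resync_partial_state_room` are illustrative names; `try_acquire_lock` is the method linked above:

```python
# Same illustrative lock name as the earlier sketch.
PARTIAL_STATE_RESYNC_LOCK_NAME = "partial_state_room_resync"


async def retry_abandoned_resyncs(store, handler) -> None:
    """Periodic job: pick up resyncs whose worker has gone away.

    `store.get_partial_state_rooms()` and
    `handler.resync_partial_state_room()` are hypothetical stand-ins for
    "list rooms still needing a resync" and "run the resync".
    """
    for room_id in await store.get_partial_state_rooms():
        lock = await store.try_acquire_lock(PARTIAL_STATE_RESYNC_LOCK_NAME, room_id)
        if lock is None:
            # Another worker holds a live lock for this room; leave it alone.
            continue
        async with lock:
            await handler.resync_partial_state_room(room_id)
```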

erikjohnston commented 1 year ago

I think sharding the partial join stuff isn't something we need to worry about now TBH. We have a bunch of much busier streams that aren't sharded?