matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0
11.83k stars 2.12k forks source link

Remove old-style (direct) TCP replication support #11728

Open richvdh opened 2 years ago

richvdh commented 2 years ago

the old direct-to-master TCP replication protocol has been deprecated since Synapse 1.18.0 (nearly a year and a half, as of this writing).

We still have a bunch of code to support it (DirectTcpReplicationClientFactory etc), and it's still mentioned in various bits of documentation (in particular https://matrix-org.github.io/synapse/develop/tcp_replication.html needs an update). Having multiple ways to do it is confusing and hard to maintain, so we should remove it properly.

richvdh commented 2 years ago

( https://matrix-org.github.io/synapse/develop/tcp_replication.html claims "Currently all streams are written by a single process", which is no longer true. )

reivilibre commented 2 years ago

Notes from Patrick

  1. You have to deal w/ removing things from the config. Are these now errors to config, or things to just ignore? (Note that our upgrade guide says "Hey just leave the old config settings they'll be ignored if you used Redis!")
  2. There are worker docs to update.
  3. It is slightly intertwined with Redis stuff / just touches a lot of stuff.

Sounds to me like step 1 might be altering documentation to forget about TCP replication. We can then remove the then-undocumented TCP replication :innocent:.

clokep commented 2 years ago

A big stopper to this is that many of the tests still assume TCP replication. I took a look at this and it isn't easy to untangle. 😢

There's a BaseStreamTestCase and a BaseMultiWorkerStreamTestCase, they have a lot of duplicate code between them but I think only the second one actually uses redis. I got a decent ways there by copying and pasting some code around, but still had a handful of failures that I would probably need @erikjohnston to help me figure out.

clokep commented 2 years ago

More concrete steps of what to do for this:

  1. [ ] Update unit tests to not use the TCP replication classes (instead using the Redis ones). See BaseStreamTestCase. -- #13543
  2. [x] Update sytest so that the Synapse runners use Redis instead of TCP replication. -- matrix-org/sytest#1282
  3. [x] Update Synapse CI to always use Redis for workers. -- #13647
  4. [x] Remove the ability to configure TCP replication listeners. -- #13647
  5. [ ] Remove all the TCP replication protocol code.
  6. [x] Rename some things which refer to TCP replication but are also used by Redis. -- #12192
  7. [x] Update documentation. -- #13647
clokep commented 2 years ago

Update unit tests to not use the TCP replication classes (instead using the Redis ones). See BaseStreamTestCase.

@erikjohnston and I started some work on this in the clokep/test-redis branch. There are some inline notes of what needs to be done, but the gist is that reconnecting to Redis doesn't seem to be working properly and something is going wrong with our fake transports and such. 😢

JacksonChen666 commented 2 years ago

Documentation is still somehow outdated using options that seem to be related to TCP replication workers

clokep commented 2 years ago

Documentation is still somehow outdated using options that seem to be related to TCP replication workers

@JacksonChen666 Can you please be more specific about which part of that page is outdated? It is quite long.

JacksonChen666 commented 2 years ago

@clokep i might be completely wrong (now that i double checked), but the options with the names like worker_replication_host might be the outdated ones, but i think in this case i'm wrong (probably because of confusion with worker_replication_port in the upgrade notes)

clokep commented 2 years ago

@clokep i might be completely wrong (now that i double checked), but the options with the names like worker_replication_host might be the outdated ones, but i think in this case i'm wrong (probably because of confusion with worker_replication_port in the upgrade notes)

worker_replication_host is still needed -- it is used to make HTTP requests back to the main process. worker_replication_port is no longer used (as the upgrade notes say). It is confusing -- this is the first step towards making this all a bit easier!

DMRobertson commented 2 years ago

All of the steps in https://github.com/matrix-org/synapse/issues/11728#issuecomment-1041454945 are done---I think we can close this now?

Edit: oh, maybe not. This one doesn't have an associated PR

Remove all the TCP replication protocol code.