matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0

Account data is not in sync in replication workers #9360

Closed · davidfant closed this issue 3 years ago

davidfant commented 3 years ago

Description

I'm having issues keeping account data in sync between workers on a Synapse cluster: the replicas appear to be out of sync with each other. For example, when the client calls /sync, the account data (more specifically m.direct) comes back with different values on different requests. The values stored in the database aren't always returned; sometimes an old value is returned instead. Database load is not a problem, so I'm guessing that the workers somehow hold different cached data. I have tried disabling caching, but to no avail.

Steps to reproduce

  1. I've set up replication using workers, following the shared configuration described here: https://github.com/matrix-org/synapse/blob/develop/docs/workers.md#shared-configuration
  2. Call /sync for a user

This is happening consistently on a production server with a few hundred users. I am not able to reproduce it locally when I am the only one testing.

Expected results: /sync should always return the same data for the same request (when the calls are made just after each other)

Actual results: /sync returns different data (sometimes fresh from the DB, sometimes not) depending on which worker it hits. Each worker consistently returns the same response: some workers return what is stored in the DB, while others return an old value. The request being made is http://localhost:8008/_matrix/client/r0/sync?filter=0&timeout=0
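
For reference, here is a rough sketch of the kind of check I'm running against that endpoint (the access token is a placeholder; adjust the homeserver URL for your deployment). Repeating the same request prints different m.direct values depending on which worker answers:

```python
# Sketch of the check described above; ACCESS_TOKEN is a placeholder.
import json
import requests

HOMESERVER = "http://localhost:8008"       # load-balanced client endpoint
ACCESS_TOKEN = "REPLACE_WITH_ACCESS_TOKEN"

def get_m_direct():
    """Call /sync once and return the m.direct account data, if any."""
    resp = requests.get(
        f"{HOMESERVER}/_matrix/client/r0/sync",
        params={"filter": "0", "timeout": "0"},
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    )
    resp.raise_for_status()
    events = resp.json().get("account_data", {}).get("events", [])
    return next((e["content"] for e in events if e["type"] == "m.direct"), None)

# These should all print the same value; on our cluster they alternate
# between the current DB value and an older one.
for i in range(10):
    print(i, json.dumps(get_m_direct(), sort_keys=True))
```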

Version information

Docker compose file: https://gist.github.com/davidfant/594b91cc3dd2d9fda9225849edd971c3

richvdh commented 3 years ago

This is odd. Silly questions, but have you checked that all the workers are able to connect to redis? Are they successfully receiving replication data (there should be some clues in the logs about it)?
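
For the first check, a minimal sketch along these lines (the Redis host and port are placeholders for whatever your compose file defines) can be run from inside each worker container:

```python
# Quick connectivity sketch (not a Synapse tool): run from inside each
# worker container. Host/port are placeholders for the Redis service in
# your compose file.
import redis  # pip install redis

r = redis.Redis(host="redis", port=6379)
print("PING ->", r.ping())  # True means the worker can reach Redis
```

Synapse pushes replication (including cache invalidations) over Redis pub/sub, so a worker that cannot reach Redis would keep serving stale cached account data, which would match what you're describing.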

erikjohnston commented 3 years ago

We haven't seen this on any of our deployments, so I'm going to close this. Feel free to reopen if you are still having issues and can provide some logs 🙂