Closed davidfant closed 3 years ago
This is odd. Silly questions, but have you checked that all the workers are able to connect to redis? Are they successfully receiving replication data (there should be some clues in the logs about it)?
We haven't seen this on any of our deployments, so I'm going to close this. Feel free to reopen if you are still having issues and can provide some logs :slightly_smiling_face:
Description
I'm having issues with replication and keeping account data in sync between workers on a Synapse cluster. It seems like the different replicas are out of sync. For example, when the client calls /sync, account data (more specifically m.direct) returns different values for different requests. The values stored in the database aren't always returned; sometimes it returns old values. Database load is not a problem, so I'm guessing that the workers somehow have different cached data. I have tried disabling caching, but to no avail.
Steps to reproduce
/sync for a user
This is happening consistently on a prod server with a few hundred people using it. I am not able to reproduce it locally when I am the only one testing.
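One way to check for this kind of divergence is to call /sync several times and count how many distinct m.direct values come back. The following is only a rough sketch: the helper names (`extract_m_direct`, `fingerprint`, `count_distinct`, `fetch_sync`, `check`) are made up for illustration, and it assumes the request is an initial sync (no `since` token), so that account data appears under `account_data.events` in the response, and that an access token is sent as a Bearer header.

```python
import json
import urllib.request
from collections import Counter


def extract_m_direct(sync_response):
    """Return the content of the m.direct account data event, or None if absent."""
    events = sync_response.get("account_data", {}).get("events", [])
    for event in events:
        if event.get("type") == "m.direct":
            return event.get("content")
    return None


def fingerprint(content):
    """Serialise content with sorted keys so equal values compare equal as strings."""
    return json.dumps(content, sort_keys=True)


def count_distinct(responses):
    """Count how often each distinct m.direct value appears across sync responses."""
    return Counter(fingerprint(extract_m_direct(r)) for r in responses)


def fetch_sync(base_url, access_token):
    """Perform one initial /sync request (hypothetical helper, stdlib only)."""
    request = urllib.request.Request(
        f"{base_url}/_matrix/client/r0/sync?filter=0&timeout=0",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def check(base_url, access_token, attempts=20):
    """Call /sync repeatedly and print each distinct m.direct value with its count."""
    counts = count_distinct(fetch_sync(base_url, access_token) for _ in range(attempts))
    for value, seen in counts.items():
        print(f"{seen}x {value}")
    return counts
```

For example, `check("http://localhost:8008", "<access token>")` would print more than one line if requests are being answered with different m.direct values. It does not say which worker served each request; that would have to be correlated with the reverse proxy or worker logs.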
Expected results:
/sync should always return the same data for the same request (when called just after each other)
Actual results:
/sync returns different data (sometimes fresh from the DB, sometimes not) depending on which worker it hits. Each worker consistently returns the same response. Some workers return what's stored in the DB, while others return an old value.
http://localhost:8008/_matrix/client/r0/sync?filter=0&timeout=0
Version information
matrix.event.fant.io
Version: {"server_version":"1.24.0rc2","python_version":"3.8.7"}
Install method:
Platform:
Running 1 master and 7 workers on a c2-standard-8 (8 vCPUs, 32 GB memory) instance on GCP using docker-compose
Docker compose file: https://gist.github.com/davidfant/594b91cc3dd2d9fda9225849edd971c3