Open turt2live opened 1 year ago
[synchrotron_1] 2023-08-17 21:50:00,271 - synapse.access.http.8050 - 465 - INFO - GET-1- 172.18.0.3 - 8050 - {@travis:t2l.io} Processed request: 646.819sec/-345.729sec (269.006sec, 42.309sec) (133.925sec/1309.064sec/34008) 0B 200! "GET /_matrix/client/r0/sync?timeout=0&filter=%7B%22room%22%3A%7B%22timeline%22%3A%7B%22limit%22%3A1%7D%7D%7D HTTP/1.0" "sync-v3-proxy-0.99.4" [1093305 dbevts]
The final tally for memory usage was 11 GB.
Sounds like a Synapse bug to me.
https://github.com/matrix-org/sliding-sync/blob/3bf3f2305373d69d6346d38b70be8add2a53f029/sync2/client.go#L97-L132 is the logic for generating a sync URL.
We use a filter for two purposes:
> Using a pre-uploaded filter might yield better results, as I believe Synapse does the filtering in-database when using a pre-uploaded filter.
I can't see any difference in the filtering logic for these two cases: https://github.com/matrix-org/synapse/blob/54317d34b76adb1e8f694acd91f631b3abe38947/synapse/rest/client/sync.py#L166-L187
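For reference, a minimal sketch of the two request shapes being compared. It only illustrates how an inline filter and a pre-uploaded filter ID differ on the wire (per the Matrix spec, a `filter` value starting with `{` is treated as inline JSON); it says nothing about how Synapse processes either one. The helper names and the filter ID `abc123` are hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

SYNC_PATH = "/_matrix/client/r0/sync"


def build_sync_url(filter_param: str, timeout_ms: int = 0) -> str:
    """Build a /sync URL; filter_param is either inline JSON or a filter ID."""
    return SYNC_PATH + "?" + urlencode({"timeout": timeout_ms, "filter": filter_param})


def classify_filter(url: str) -> str:
    """Return 'inline' if the filter parameter is a JSON object, else 'filter_id'."""
    value = parse_qs(urlsplit(url).query)["filter"][0]
    # A leading "{" marks an inline JSON filter; anything else is a filter ID.
    return "inline" if value.startswith("{") else "filter_id"


# The request line from the synchrotron log in this issue (inline filter).
logged = (
    "/_matrix/client/r0/sync?timeout=0"
    "&filter=%7B%22room%22%3A%7B%22timeline%22%3A%7B%22limit%22%3A1%7D%7D%7D"
)
logged_filter = json.loads(parse_qs(urlsplit(logged).query)["filter"][0])

# Hypothetical ID, standing in for one returned by the filter upload endpoint.
preuploaded = build_sync_url("abc123")

print(logged_filter, classify_filter(logged), classify_filter(preuploaded))
```

Decoding the logged URL this way shows the proxy's inline filter is just a timeline limit of 1, with no state or lazy-loading options set.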
From the sliding sync internal room, a realization: the filter sliding sync uses does not lazy load room members, whereas Element Desktop's does. This almost certainly explains the 11 GB of memory required to process the initial sync.
If it's not strictly required to have all the member events, I'd suggest the proxy aggressively lazy load members.
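As a rough illustration of what that suggestion would mean for the filter, the sketch below adds the spec's `lazy_load_members` option to the proxy's current filter (the one visible in the log line). The helper name is hypothetical, and the exact filter Element Desktop sends is not shown in this thread, so treat the result as an assumption about its shape rather than a copy of it.

```python
import copy
import json


def with_lazy_loaded_members(sync_filter: dict) -> dict:
    """Return a copy of sync_filter that asks for lazy-loaded room members."""
    result = copy.deepcopy(sync_filter)
    # lazy_load_members lives under room.state in the Matrix filter schema.
    state = result.setdefault("room", {}).setdefault("state", {})
    state["lazy_load_members"] = True
    return result


# The filter the proxy currently sends, decoded from the logged request.
proxy_filter = {"room": {"timeline": {"limit": 1}}}
lazy_filter = with_lazy_loaded_members(proxy_filter)

print(json.dumps(lazy_filter, separators=(",", ":")))
```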
The proxy needs the member events at every event in order to locally calculate history visibility. E.g. consider:
The proxy was not designed to handle partial room state, and adding that in would be a significant, risky and costly change.
The scenario above is mitigated somewhat because of the cache invalidation work, coupled with https://github.com/matrix-org/sliding-sync/pull/366 - the proxy tries really hard NOT to do history visibility checks so it will cut off serving events up to the user's join event.
There are still numerous pitfalls:
- `required_state: [["m.room.member","*"]]` needs the entire member list to serve up the response, and clients need this for E2EE.
- [...] `state` block has new events) but we won't touch the timeline, meaning that from the proxy's point of view Bob will not be able to see ANY events in the lazy loaded room, as there exist no timeline events with Bob as a joined user.

We ultimately need the entire member list. Synapse ideally should stream the list back if it's too large.
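The visibility argument can be illustrated with a toy model. This is not the proxy's actual code: the `Event` type, the `visible_events` helper and the user IDs are all hypothetical, and it assumes a simplified rule where an event is visible only while the user is joined at that point in the timeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    event_id: str
    type: str
    sender: str
    state_key: Optional[str] = None   # set for state events such as m.room.member
    membership: Optional[str] = None  # only meaningful for m.room.member


def visible_events(timeline: List[Event], user_id: str) -> List[str]:
    """Toy rule: an event is visible only while user_id is joined at that point."""
    joined = False
    visible = []
    for ev in timeline:
        if ev.type == "m.room.member" and ev.state_key == user_id:
            joined = ev.membership == "join"
        if joined:
            visible.append(ev.event_id)
    return visible


BOB = "@bob:example.org"
ALICE = "@alice:example.org"

full_timeline = [
    Event("$1", "m.room.message", ALICE),
    Event("$2", "m.room.member", BOB, state_key=BOB, membership="join"),
    Event("$3", "m.room.message", ALICE),
]
# Same room, but Bob's member event never reaches the timeline.
without_members = [ev for ev in full_timeline if ev.type != "m.room.member"]

print(visible_events(full_timeline, BOB))
print(visible_events(without_members, BOB))
```

With the member event in the timeline, Bob sees everything from his join onwards; without it, the walk never observes a join and yields nothing, which is the failure mode described above.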
To further emphasise why we cannot use Synapse's lazy loading: it's not even accurate. See https://github.com/element-hq/synapse/issues/17050 and related issues.
I set up a brand new sliding sync proxy to test Element Android X, and when it actually started the poller for the first time it slowly ate all 7.5 GB of memory I was able to give the Synapse synchrotron worker, eventually causing OOM issues.
For comparison, an initial sync for my account on Element Desktop only uses 2-3 GB of the synchrotron's memory.
I suspect this is related to the use of an inline filter on the initial sync, but haven't confirmed. Using a pre-uploaded filter might yield better results, as I believe Synapse does the filtering in-database when using a pre-uploaded filter.