clokep opened 1 year ago
> Can we avoid bouncing this to the main process? (Can we have a different worker handle these queries?)
It doesn't seem like this would help us much at the moment: during the moments that we're having trouble, there isn't actually much CPU usage by other requests at all — it seems to me that moving it to another worker would just move the problem rather than improve performance. (not that getting it off the master is a bad thing, but it won't solve requests taking forever for M).
We seem to have a gap in CPU measurement to be honest — at 100%+ CPU, the `ReplicationUserDeviceResyncRestServlet` is only listed at about ~32%.
Per-block metrics only add up to about ~2% (I think we must be under-reporting how much time is spent following replication streams, to be honest, given empirically we think it accounts for at least 30% of CPU usage since an otherwise-quiet worker sits at about 30%+ from memory).
> Can we avoid bouncing this to the main process? (Can we have a different worker handle these queries?)

> It doesn't seem like this would help us much at the moment: during the moments that we're having trouble, there isn't actually much CPU usage by other requests at all — it seems to me that moving it to another worker would just move the problem rather than improve performance. (not that getting it off the master is a bad thing, but it won't solve requests taking forever for M).
Oh good catch! If nothing else is happening on the worker then I agree this wouldn't help much (unless we also let it be shard-able?)
> unless we also let it be shard-able
Yep, in alignment there. But sharding it could be trickier than the time we have before Christmas allows, so good to bear in mind, but it would also be good to keep an eye out for alternatives.
> Can we parallelize the per-user queries (or add a new endpoint to make them more efficient)?
This sounds like it'd be good. Currently we make one `/user/devices/%s` federation request per user, and I'm not sure there exist alternative endpoints which batch this up. Batching up the replication requests into one is quite easy, though.
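To illustrate the parallelization idea, here's a minimal sketch using `asyncio.gather` to issue the per-user requests concurrently instead of sequentially. The function names and shapes here are hypothetical stand-ins, not Synapse's actual federation client API:

```python
import asyncio

# Hypothetical stand-in for the federation call that hits
# /user/devices/{user_id} once per user; purely illustrative.
async def fetch_user_devices(destination: str, user_id: str) -> dict:
    await asyncio.sleep(0.01)  # simulate a network round-trip
    return {"user_id": user_id, "devices": []}

async def resync_users(destination: str, user_ids: list[str]) -> list[dict]:
    # Instead of awaiting each request one after another, issue them all
    # concurrently; total latency is roughly one round-trip, not N of them.
    return await asyncio.gather(
        *(fetch_user_devices(destination, u) for u in user_ids)
    )

results = asyncio.run(
    resync_users("example.org", ["@a:example.org", "@b:example.org"])
)
print(len(results))  # 2
```

This only spreads the latency, of course — the remote server still sees one request per user, which is why a batching endpoint would be the nicer fix.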
> Can we avoid bouncing this to the main process? (Can we have a different worker handle these queries?)

> It doesn't seem like this would help us much at the moment: during the moments that we're having trouble, there isn't actually much CPU usage by other requests at all — it seems to me that moving it to another worker would just move the problem rather than improve performance. (not that getting it off the master is a bad thing, but it won't solve requests taking forever for M).
I wouldn't discount it so quickly. The master tends to be a bottleneck for everything, which means that (a) it pinning its CPU is particularly noticeable, and (b) doing lots of odd jobs rather than a few core ones tends to make it more prone to under-reporting of CPU usage.
> We seem to have a gap in CPU measurement to be honest
Indeed. Generally anything low-level isn't well tracked; for example reading and parsing an HTTP request (we can't attribute that to a particular servlet, because we don't yet know which it will be).
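A toy illustration of why that pre-routing work can't be attributed: the request line has to be read and parsed *before* we know which servlet it belongs to, so that parsing time lands in no servlet's bucket. Route table and function names here are entirely hypothetical, not Synapse's actual HTTP stack:

```python
from urllib.parse import urlparse

# Toy router mapping paths to servlet names (illustrative only).
ROUTES = {
    "/_matrix/client/v3/keys/query": "KeysQueryServlet",
    "/_matrix/federation/v1/user/devices": "UserDevicesServlet",
}

def handle(raw_request_line: str) -> str:
    # Untracked work: reading and parsing the request line. At this point
    # we don't yet know which servlet will handle it, so this time cannot
    # be attributed to any per-servlet metric.
    method, target, _version = raw_request_line.split(" ")
    path = urlparse(target).path

    # Only from here on can per-servlet measurement begin.
    return ROUTES.get(path, "UnknownServlet")

print(handle("POST /_matrix/client/v3/keys/query HTTP/1.1"))
# KeysQueryServlet
```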
> Per-block metrics only add up to about ~2%
Per-block metrics aren't meant to add up to 100%. Most code in synapse isn't covered by one of the measurement blocks (so won't be counted by per-block metrics), and some measurement blocks are nested in others (so will be double-counted if you try to add them up).
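The double-counting from nesting is easy to see with a toy `Measure`-style context manager (this is an illustration of the accounting effect, not Synapse's real metrics code):

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock time per named block (illustrative only).
totals: dict[str, float] = {}

@contextmanager
def measure(name: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[name] = totals.get(name, 0.0) + time.perf_counter() - start

with measure("outer"):
    with measure("inner"):   # nested block: its time is counted twice
        time.sleep(0.05)
    time.sleep(0.02)         # work attributed to "outer" only

# "inner" time is included in both blocks, so summing per-block totals
# overstates the real wall-clock time spent.
print(totals["outer"] >= totals["inner"])       # True
print(sum(totals.values()) > totals["outer"])   # True: double-counted
```

So the per-block numbers are useful for comparing blocks against each other, but never for reconstructing total CPU usage.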
> (I think we must be under-reporting how much time is spent following replication streams, to be honest, given empirically we think it accounts for at least 30% of CPU usage since an otherwise-quiet worker sits at about 30%+ from memory).
Yes, if I remember rightly a lot of the low-level Redis stuff isn't tracked in the CPU metrics.
This endpoint seems to do some funny things (and the code is hard to follow, but that doesn't seem to be a performance issue). Note that this endpoint is supported by workers (and goes to the `client_reader` on matrix.org/Complement). See `E2eKeysHandler.query_devices` for the entry point. `E2eKeysHandler._query_devices_for_destination` hits the `/user/devices/{user_id}` federation endpoint, and `/user/keys/query` once per destination.

Note that Synapse will call the same resync logic for a remote user's keys/devices due to a few different operations (receiving to-device messages from an unknown device, a `/keys/query` request in some situations, maybe other situations). (`/user/keys/query` is not sent to the main process.)
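The per-destination fan-out described above can be sketched roughly as follows — group the queried users by homeserver, then issue one request per destination. All names here are hypothetical, not Synapse's actual handlers:

```python
import asyncio
from collections import defaultdict

# Stand-in for the per-destination federation request (e.g. a
# /user/keys/query-style call); purely illustrative.
async def query_keys(destination: str, user_ids: list[str]) -> dict:
    await asyncio.sleep(0)  # simulate the federation HTTP round-trip
    return {u: {} for u in user_ids}

async def query_devices(user_ids: list[str]) -> dict:
    # Matrix user IDs are @localpart:server, so the destination is
    # everything after the first colon.
    by_destination: dict[str, list[str]] = defaultdict(list)
    for user_id in user_ids:
        by_destination[user_id.split(":", 1)[1]].append(user_id)

    # One request per destination, issued concurrently.
    results = await asyncio.gather(
        *(query_keys(dest, users) for dest, users in by_destination.items())
    )
    merged: dict = {}
    for r in results:
        merged.update(r)
    return merged

out = asyncio.run(query_devices(["@a:one.example", "@b:two.example"]))
print(sorted(out))  # ['@a:one.example', '@b:two.example']
```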