matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0

`/keys/claim` is surprisingly slow #16554

Open · richvdh opened this issue 1 year ago

richvdh commented 1 year ago

/keys/claim requests often take multiple seconds when requesting keys for hundreds of devices.

Out of interest I looked at the anatomy of a slow /keys/claim request (https://jaeger.proxy.matrix.org/trace/62603ae20c639720). The request took 6.2 seconds altogether.

In this case, we were just attempting to claim keys for devices for which we had previously failed to get one. (Due to https://github.com/matrix-org/matrix-rust-sdk/issues/281, we do this a bit too often.) Anyway, the point is that pretty much all of the devices in this request have run out of OTKs, but I think it is still instructive.

What I see is:

What I see here are some opportunities for easy performance improvements. In particular:

DMRobertson commented 1 year ago

> 321 calls to db.claim_e2e_one_time_keys. This is presumably one for each device for matrix.org users. These take us to about 1.8 seconds.

I spent some time looking at this. On matrix.org I have ~100 keys across 2 devices:

matrix=> select count(*), device_Id from e2e_one_time_keys_json where user_id = '@dmrobertson:matrix.org' group by device_id;
 count │ device_id  
═══════╪════════════
    51 │ CNVWKSCNBO
    50 │ SNGLZABLEG

Here is an attempt to select 4 keys from the first device and 2 keys from the second in a bulk-query form.

matrix=> WITH claims(user_id, device_id, algorithm, count) AS (
    VALUES ('@dmrobertson:matrix.org', 'CNVWKSCNBO', 'signed_curve25519', 4),
           ('@dmrobertson:matrix.org', 'SNGLZABLEG', 'signed_curve25519', 2)
), ranked_keys AS (
    SELECT user_id,
           device_id,
           algorithm,
           key_id,
           claims.count,
           ROW_NUMBER() OVER (PARTITION BY (user_id, device_id, algorithm)) AS r
    FROM e2e_one_time_keys_json
        JOIN claims USING (user_id, device_id, algorithm)
)
SELECT
    k.user_id,
    k.device_id,
    k.algorithm
    -- , k.key_id, k.key_json omitted
FROM e2e_one_time_keys_json k
JOIN ranked_keys USING (user_id, device_id, algorithm, key_id)
WHERE ranked_keys.r <= ranked_keys.count;
         user_id         │ device_id  │     algorithm     
═════════════════════════╪════════════╪═══════════════════
 @dmrobertson:matrix.org │ CNVWKSCNBO │ signed_curve25519
 @dmrobertson:matrix.org │ CNVWKSCNBO │ signed_curve25519
 @dmrobertson:matrix.org │ CNVWKSCNBO │ signed_curve25519
 @dmrobertson:matrix.org │ CNVWKSCNBO │ signed_curve25519
 @dmrobertson:matrix.org │ SNGLZABLEG │ signed_curve25519
 @dmrobertson:matrix.org │ SNGLZABLEG │ signed_curve25519
(6 rows)

If we can change that SELECT into a DELETE FROM... RETURNING then I think we should be able to batch this up and call execute_values.
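As a rough sketch of what that could look like (an illustration only, not Synapse's actual code: claim_otks is a made-up helper, and this assumes psycopg2 and the schema shown above):

from psycopg2.extras import execute_values

# Illustrative only: a bulk "claim" that deletes up to `count` one-time keys per
# (user_id, device_id, algorithm) and returns the claimed rows in one query.
_BULK_CLAIM_SQL = """
WITH claims(user_id, device_id, algorithm, count) AS (
    VALUES %s
), ranked_keys AS (
    SELECT user_id, device_id, algorithm, key_id, claims.count,
           ROW_NUMBER() OVER (PARTITION BY (user_id, device_id, algorithm)) AS r
    FROM e2e_one_time_keys_json
        JOIN claims USING (user_id, device_id, algorithm)
)
DELETE FROM e2e_one_time_keys_json AS k
USING ranked_keys
WHERE k.user_id = ranked_keys.user_id
  AND k.device_id = ranked_keys.device_id
  AND k.algorithm = ranked_keys.algorithm
  AND k.key_id = ranked_keys.key_id
  AND ranked_keys.r <= ranked_keys.count
RETURNING k.user_id, k.device_id, k.algorithm, k.key_id, k.key_json
"""

def claim_otks(cur, claims):
    # claims: list of (user_id, device_id, algorithm, count) tuples.
    # fetch=True makes execute_values return the RETURNING rows.
    return execute_values(cur, _BULK_CLAIM_SQL, claims, fetch=True)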

No idea if this works on SQLite, but I'm not interested in optimising the perf of the thing we say isn't suitable for production.

DMRobertson commented 1 year ago

> If we can change that SELECT into a DELETE FROM... RETURNING then I think we should be able to batch this up and call execute_values.

https://gist.github.com/DMRobertson/243121754aed82eff56fa8ec5181184a is my attempt to test this locally. It seems very promising.

DMRobertson commented 1 year ago

We've got some metrics for this. Future me: on m.org, look at the encryption workers and search for "OneTimeKeyServlet".

DMRobertson commented 1 year ago

Let's see how the current changes fare.

Next step here: do remote and local claims in parallel, which would roughly halve the time.
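For illustration, here is a self-contained asyncio toy of that idea (the function names, timings, and data shapes are invented, not Synapse code; the real handler is async over Twisted):

import asyncio

async def claim_local_otks(query):
    await asyncio.sleep(0.3)  # stand-in for the local database work
    return {"local": query}

async def claim_remote_otks(destination, query):
    await asyncio.sleep(0.4)  # stand-in for a federation /keys/claim request
    return {destination: query}

async def claim_keys(local_query, remote_queries):
    # Start the local DB work and every federation request together, so the
    # total latency is max(local, slowest remote) rather than their sum.
    results = await asyncio.gather(
        claim_local_otks(local_query),
        *(claim_remote_otks(dest, q) for dest, q in remote_queries.items()),
    )
    merged = {}
    for result in results:
        merged.update(result)
    return merged

if __name__ == "__main__":
    print(asyncio.run(claim_keys(
        {"@alice:example.org": {}},
        {"remote.example": {"@bob:remote.example": {}}},
    )))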

DMRobertson commented 1 year ago

https://jaeger.proxy.matrix.org/trace/0182832586d1ede2 is an example of a request to claim 17 keys that took ~700ms.

High-level timing: [image]

Drilling down, the bulk queries for OTKs and fallback keys took 40ms and 8ms, respectively. The rest of the (local) time was dominated by cache invalidation over replication. I would speculate that we can do better by:
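(One such improvement, mentioned in the next comment, is batching of cache invalidation: collect the per-key cache invalidations during the transaction and send a single replication update at the end, instead of one poke per claimed key. Below is a purely invented sketch of that shape, not Synapse's actual cache-invalidation API.)

# Invented sketch: collect cache invalidations during a /keys/claim transaction
# and flush them as one replication update, instead of streaming one per key.
class BatchedCacheInvalidator:
    def __init__(self, send_over_replication):
        # send_over_replication: callable taking a list of (cache_name, key) pairs.
        self._send = send_over_replication
        self._pending = []

    def invalidate(self, cache_name, key):
        # Record the invalidation rather than sending it immediately.
        self._pending.append((cache_name, key))

    def flush(self):
        # One replication message covering the whole batch.
        if self._pending:
            self._send(list(self._pending))
            self._pending.clear()

# One flush per claim, however many keys were touched.
invalidator = BatchedCacheInvalidator(print)
for device_id in ("CNVWKSCNBO", "SNGLZABLEG"):
    invalidator.invalidate("count_e2e_one_time_keys", ("@dmrobertson:matrix.org", device_id))
invalidator.flush()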

clokep commented 11 months ago

Definitely seems a bit better with batching of cache invalidation. It's not an apples-to-apples comparison, but it still seems promising.

Before: [image]

After: [image]

DMRobertson commented 11 months ago

I had a look at the grafana metrics (p50, p90, p95) and I couldn't see much of an improvement, sadly. But perhaps that's to be expected: I assume that requests to claim many e2e keys are rare, so we're deep into the tails of the distribution.

If anyone has a way to reproduce this, I'd be interested to see if there's any user-perceptible improvement. I think Rich said that logging into a new session on develop.element.io would suffice?