matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0

`/keys/claim` is surprisingly slow #16554

Open · richvdh opened this issue 1 year ago

richvdh commented 1 year ago

/keys/claim requests often take multiple seconds when requesting keys for hundreds of devices.

Out of interest I looked at the anatomy of a slow /keys/claim request (https://jaeger.proxy.matrix.org/trace/62603ae20c639720). The request took 6.2 seconds altogether.

In this case, we were just attempting to claim keys for devices for which we had previously failed to get one. (Due to https://github.com/matrix-org/matrix-rust-sdk/issues/281, we do this a bit too often.) Anyway, the point is that pretty much all of the devices in this request have run out of OTKs, but I think it is still instructive.

What I see is:

What I see here are some opportunities for easy performance improvements. In particular:

DMRobertson commented 1 year ago

> 321 calls to db.claim_e2e_one_time_keys. This is presumably one for each device for matrix.org users. These take us to about 1.8 seconds.

I spent some time looking at this. On matrix.org I have ~100 keys across 2 devices:

matrix=> select count(*), device_Id from e2e_one_time_keys_json where user_id = '@dmrobertson:matrix.org' group by device_id;
 count │ device_id  
═══════╪════════════
    51 │ CNVWKSCNBO
    50 │ SNGLZABLEG

Here is an attempt to select 4 keys from the first device and 2 keys from the second in a bulk-query form.

matrix=> WITH claims(user_id, device_id, algorithm, count) AS (
    VALUES ('@dmrobertson:matrix.org', 'CNVWKSCNBO', 'signed_curve25519', 4),
           ('@dmrobertson:matrix.org', 'SNGLZABLEG', 'signed_curve25519', 2)
), ranked_keys AS (
    SELECT user_id,
           device_id,
           algorithm,
           key_id,
           claims.count,
           ROW_NUMBER() OVER (PARTITION BY (user_id, device_id, algorithm)) AS r
    FROM e2e_one_time_keys_json
        JOIN claims USING (user_id, device_id, algorithm)
)
SELECT
    k.user_id,
    k.device_id,
    k.algorithm
    -- , k.key_id, k.key_json omitted
FROM e2e_one_time_keys_json k
JOIN ranked_keys USING (user_id, device_id, algorithm, key_id)
WHERE ranked_keys.r <= ranked_keys.count;
         user_id         │ device_id  │     algorithm     
═════════════════════════╪════════════╪═══════════════════
 @dmrobertson:matrix.org │ CNVWKSCNBO │ signed_curve25519
 @dmrobertson:matrix.org │ CNVWKSCNBO │ signed_curve25519
 @dmrobertson:matrix.org │ CNVWKSCNBO │ signed_curve25519
 @dmrobertson:matrix.org │ CNVWKSCNBO │ signed_curve25519
 @dmrobertson:matrix.org │ SNGLZABLEG │ signed_curve25519
 @dmrobertson:matrix.org │ SNGLZABLEG │ signed_curve25519
(6 rows)

If we can change that SELECT into a DELETE FROM... RETURNING then I think we should be able to batch this up and call execute_values.
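As a rough sketch of what that could look like (an illustration only, not Synapse's actual code: claim_otks is a made-up helper, and this assumes psycopg2 and the schema shown above):

from psycopg2.extras import execute_values

# Illustrative only: a bulk "claim" that deletes up to `count` one-time keys per
# (user_id, device_id, algorithm) and returns the claimed rows in one query.
_BULK_CLAIM_SQL = """
WITH claims(user_id, device_id, algorithm, count) AS (
    VALUES %s
), ranked_keys AS (
    SELECT user_id, device_id, algorithm, key_id, claims.count,
           ROW_NUMBER() OVER (PARTITION BY (user_id, device_id, algorithm)) AS r
    FROM e2e_one_time_keys_json
        JOIN claims USING (user_id, device_id, algorithm)
)
DELETE FROM e2e_one_time_keys_json AS k
USING ranked_keys
WHERE k.user_id = ranked_keys.user_id
  AND k.device_id = ranked_keys.device_id
  AND k.algorithm = ranked_keys.algorithm
  AND k.key_id = ranked_keys.key_id
  AND ranked_keys.r <= ranked_keys.count
RETURNING k.user_id, k.device_id, k.algorithm, k.key_id, k.key_json
"""

def claim_otks(cur, claims):
    # claims: list of (user_id, device_id, algorithm, count) tuples.
    # fetch=True makes execute_values return the RETURNING rows.
    return execute_values(cur, _BULK_CLAIM_SQL, claims, fetch=True)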

No idea if this works on SQLite, but I'm not interested in optimising the perf of the thing we say isn't suitable for production.

DMRobertson commented 1 year ago

> If we can change that SELECT into a DELETE FROM... RETURNING then I think we should be able to batch this up and call execute_values.

https://gist.github.com/DMRobertson/243121754aed82eff56fa8ec5181184a is my attempt to test this locally. It seems very promising.

DMRobertson commented 1 year ago

We've got some metrics for this. Future me: on m.org, look at the encryption workers and search for "OneTimeKeyServlet".

DMRobertson commented 1 year ago

Let's see how the current changes fare.

Next step here: do remote and local claims in parallel, which would roughly halve the time.
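For illustration, here is a self-contained asyncio toy of that idea (the function names, timings, and data shapes are invented, not Synapse code; the real handler is async over Twisted):

import asyncio

async def claim_local_otks(query):
    await asyncio.sleep(0.3)  # stand-in for the local database work
    return {"local": query}

async def claim_remote_otks(destination, query):
    await asyncio.sleep(0.4)  # stand-in for a federation /keys/claim request
    return {destination: query}

async def claim_keys(local_query, remote_queries):
    # Start the local DB work and every federation request together, so the
    # total latency is max(local, slowest remote) rather than their sum.
    results = await asyncio.gather(
        claim_local_otks(local_query),
        *(claim_remote_otks(dest, q) for dest, q in remote_queries.items()),
    )
    merged = {}
    for result in results:
        merged.update(result)
    return merged

if __name__ == "__main__":
    print(asyncio.run(claim_keys(
        {"@alice:example.org": {}},
        {"remote.example": {"@bob:remote.example": {}}},
    )))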

DMRobertson commented 1 year ago

https://jaeger.proxy.matrix.org/trace/0182832586d1ede2 is an example of a request to claim 17 keys that took ~700ms.

High-level timing: [image]

Drilling down, the bulk queries for OTKs and fallback keys took 40ms and 8ms, respectively. The rest of the (local) time was dominated by cache invalidation over replication. I would speculate that we can do better by:
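(One such improvement, mentioned in the next comment, is batching of cache invalidation: collect the per-key cache invalidations during the transaction and send a single replication update at the end, instead of one poke per claimed key. Below is a purely invented sketch of that shape, not Synapse's actual cache-invalidation API.)

# Invented sketch: collect cache invalidations during a /keys/claim transaction
# and flush them as one replication update, instead of streaming one per key.
class BatchedCacheInvalidator:
    def __init__(self, send_over_replication):
        # send_over_replication: callable taking a list of (cache_name, key) pairs.
        self._send = send_over_replication
        self._pending = []

    def invalidate(self, cache_name, key):
        # Record the invalidation rather than sending it immediately.
        self._pending.append((cache_name, key))

    def flush(self):
        # One replication message covering the whole batch.
        if self._pending:
            self._send(list(self._pending))
            self._pending.clear()

# One flush per claim, however many keys were touched.
invalidator = BatchedCacheInvalidator(print)
for device_id in ("CNVWKSCNBO", "SNGLZABLEG"):
    invalidator.invalidate("count_e2e_one_time_keys", ("@dmrobertson:matrix.org", device_id))
invalidator.flush()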

clokep commented 11 months ago

Definitely seems a bit better with batching of cache invalidation. It's not an apples-to-apples comparison, but it still seems promising.

Before: [image]

After: [image]

DMRobertson commented 11 months ago

I had a look at the grafana metrics (p50, p90, p95) and I couldn't see much of an improvement, sadly. But perhaps that's to be expected: I assume that requests to claim many e2e keys are rare, so we're deep into the tails of the distribution.

If anyone has a way to reproduce this, I'd be interested to see if there's any user-perceptible improvement. I think Rich said that logging into a new session on develop.element.io would suffice?