Open richvdh opened 1 year ago
I spent some time looking at this. On matrix.org I have ~100 keys across 2 devices:
matrix=> select count(*), device_Id from e2e_one_time_keys_json where user_id = '@dmrobertson:matrix.org' group by device_id;
count │ device_id
═══════╪════════════
51 │ CNVWKSCNBO
50 │ SNGLZABLEG
Here is an attempt to select 4 keys from the first device and 2 keys from the second, in bulk-query form.
matrix=> WITH claims(user_id, device_id, algorithm, count) AS (
VALUES ('@dmrobertson:matrix.org', 'CNVWKSCNBO', 'signed_curve25519', 4),
('@dmrobertson:matrix.org', 'SNGLZABLEG', 'signed_curve25519', 2)
), ranked_keys AS (
SELECT user_id,
device_id,
algorithm,
key_id,
claims.count,
ROW_NUMBER() OVER (PARTITION BY (user_id, device_id, algorithm)) AS r
FROM e2e_one_time_keys_json
JOIN claims USING (user_id, device_id, algorithm)
)
SELECT
k.user_id,
k.device_id,
k.algorithm
-- , k.key_id, k.key_json omitted
FROM e2e_one_time_keys_json k
JOIN ranked_keys USING (user_id, device_id, algorithm, key_id)
WHERE ranked_keys.r <= ranked_keys.count;
user_id │ device_id │ algorithm
═════════════════════════╪════════════╪═══════════════════
@dmrobertson:matrix.org │ CNVWKSCNBO │ signed_curve25519
@dmrobertson:matrix.org │ CNVWKSCNBO │ signed_curve25519
@dmrobertson:matrix.org │ CNVWKSCNBO │ signed_curve25519
@dmrobertson:matrix.org │ CNVWKSCNBO │ signed_curve25519
@dmrobertson:matrix.org │ SNGLZABLEG │ signed_curve25519
@dmrobertson:matrix.org │ SNGLZABLEG │ signed_curve25519
(6 rows)
If we can change that SELECT into a `DELETE FROM ... RETURNING` then I think we should be able to batch this up and call `execute_values`.
No idea if this works on SQLite, but I'm not interested in optimising the perf of the thing we say isn't suitable for production.
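For what it's worth, the window-function approach does seem to run on SQLite too (`ROW_NUMBER` needs SQLite 3.25+). Here's a self-contained sketch of the SELECT above against a simplified, made-up version of the table (the real schema has more columns, and the user/device/key values here are invented):

```python
# Self-contained check that the ranked-keys SELECT runs on SQLite.
# Schema and data are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE e2e_one_time_keys_json "
    "(user_id TEXT, device_id TEXT, algorithm TEXT, key_id TEXT, key_json TEXT)"
)
rows = [
    ("@alice:example.org", dev, "signed_curve25519", f"{dev}_key{i}", "{}")
    for dev in ("DEV1", "DEV2")
    for i in range(5)
]
conn.executemany("INSERT INTO e2e_one_time_keys_json VALUES (?, ?, ?, ?, ?)", rows)

# Claim 4 keys from DEV1 and 2 from DEV2, mirroring the query above.
claimed = conn.execute("""
    WITH claims(user_id, device_id, algorithm, claim_count) AS (
        VALUES ('@alice:example.org', 'DEV1', 'signed_curve25519', 4),
               ('@alice:example.org', 'DEV2', 'signed_curve25519', 2)
    ), ranked_keys AS (
        SELECT user_id, device_id, algorithm, key_id, claims.claim_count,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id, device_id, algorithm
               ) AS r
        FROM e2e_one_time_keys_json
        JOIN claims USING (user_id, device_id, algorithm)
    )
    SELECT k.user_id, k.device_id, k.algorithm, k.key_id
    FROM e2e_one_time_keys_json k
    JOIN ranked_keys USING (user_id, device_id, algorithm, key_id)
    WHERE ranked_keys.r <= ranked_keys.claim_count
""").fetchall()
print(len(claimed))  # 6 rows: four keys from DEV1, two from DEV2
```

On Postgres, the batching idea above would then turn this SELECT into a `DELETE ... RETURNING`, with the claims list built by `execute_values`.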
https://gist.github.com/DMRobertson/243121754aed82eff56fa8ec5181184a is my attempt to test this locally. It seems very promising.
We've got some metrics for this. Future me: on m.org, look at the encryption
workers and search for "OneTimeKeyServlet".
Let's see how the current changes fare.
Next step here is: doing remote and local claims in parallel would roughly halve the time.
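That parallelisation can be sketched with `asyncio.gather` (hypothetical helper names and sleep-based stand-ins, not Synapse's actual code; the Twisted-based code would have the same shape):

```python
# Sketch: run the local DB claim and the federation claims concurrently,
# so wall-clock cost is max() of the two rather than their sum.
import asyncio

async def claim_local_keys(claims):
    await asyncio.sleep(0.05)  # stand-in for the local DB work
    return {"local_claimed": len(claims)}

async def claim_remote_keys(claims):
    await asyncio.sleep(0.05)  # stand-in for federation /keys/claim requests
    return {"remote_claimed": len(claims)}

async def claim_keys(local_claims, remote_claims):
    local, remote = await asyncio.gather(
        claim_local_keys(local_claims),
        claim_remote_keys(remote_claims),
    )
    return {**local, **remote}

result = asyncio.run(claim_keys(["dev1", "dev2"], ["dev3"]))
print(result)  # {'local_claimed': 2, 'remote_claimed': 1}
```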
https://jaeger.proxy.matrix.org/trace/0182832586d1ede2 is an example of a request to claim 17 keys that took ~700ms.
High-level timing:
Drilling down, the bulk queries for OTKs and fallback keys took 40 ms and 8 ms, respectively. The rest of the (local) time was dominated by cache invalidation over replication. I would speculate that we can do better by batching up those cache invalidations.
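A sketch of what batching the invalidations might look like (entirely hypothetical API, not Synapse's replication code): record invalidations as they happen, then ship one message per cache at the end, instead of one replication message per claimed key.

```python
# Hypothetical illustration: accumulate cache invalidations and flush them
# as a single batched message per cache name.
from collections import defaultdict

class BatchedInvalidator:
    def __init__(self, send):
        self._send = send               # callable that ships one replication message
        self._pending = defaultdict(set)

    def invalidate(self, cache_name, key):
        self._pending[cache_name].add(key)   # record only; nothing sent yet

    def flush(self):
        for cache_name, keys in self._pending.items():
            self._send(cache_name, sorted(keys))  # one message per cache
        self._pending.clear()

sent = []
inv = BatchedInvalidator(lambda name, keys: sent.append((name, keys)))
for key in ["k1", "k2", "k3"]:
    inv.invalidate("e2e_one_time_keys", key)
inv.flush()
print(sent)  # [('e2e_one_time_keys', ['k1', 'k2', 'k3'])]
```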
Definitely seems a bit better with batching of cache invalidation. It's not an apples-to-apples comparison, but it still seems promising.
I had a look at the grafana metrics (p50, p90, p95) and I couldn't see much of an improvement, sadly. But perhaps that's to be expected: I assume that requests to claim many e2e keys are rare, so we're deep into the tails of the distribution.
If anyone has a way to reproduce this I'd be interested to see if there's any user-perceptible improvement. I think Rich said that logging into a new session on develop.element.io would suffice?
`/keys/claim` requests often take multiple seconds when requesting keys for hundreds of devices.

Out of interest I looked at the anatomy of a slow `/keys/claim` request (https://jaeger.proxy.matrix.org/trace/62603ae20c639720). The request took 6.2 seconds altogether. In this case, we were just attempting to claim keys for devices for which we had previously failed to get one. (Due to https://github.com/matrix-org/matrix-rust-sdk/issues/281, we do this a bit too often.) Anyway, the point is that pretty much all of the devices in this request have run out of OTKs, but I think it is still instructive.
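A quick back-of-envelope on the breakdown that follows: each individual DB call is cheap, and it's the hundreds of sequential round trips that add up, which is what makes batching attractive. (Figures are taken from the trace; the per-call cost is an implied average, not a measurement.)

```python
# Average per-call cost implied by the trace: 321 sequential
# db.claim_e2e_one_time_keys calls totalling ~1.8 seconds.
per_call_ms = 1.8 / 321 * 1000
print(round(per_call_ms, 1))  # 5.6 ms per call, on average
```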
What I see is:

- 321 calls to `db.claim_e2e_one_time_keys`. This is presumably one for each device for matrix.org users. These take us to about 1.8 seconds.
- Calls to `db._get_fallback_key`. Again, one for each matrix.org device. Another 2.1 seconds, bringing us to 4.0 seconds.
- Calls to `claim_client_keys`, one per federated destination. These all happen in parallel, so the critical path is the slowest homeserver to respond. The pathological case here is servers that respond within the timeout (so they don't get backed off from) but slowly, and then the device doesn't have any keys so we have to do it again. In this case the slowest server took 2.1 seconds.

What I see here is some easy performance improvements. In particular: