Open deepbluev7 opened 3 years ago
The count of OTKs retrieved by /sync
is in fact cached at the storage layer. However, the same exact cache is also used for the /keys/upload/{deviceId}
endpoint.
Are you making use of Synapse worker mode? Perhaps the cache is different across multiple workers.
I do indeed use synapse in worker mode. And sync is forwarded to a different worker than keys/upload
, which is intended as such if I understand the workers documentation correctly.
Indeed, that is a valid configuration. The storage method I'm talking about in particular is count_e2e_one_time_keys
, which does look to be correctly invalidated when keys are added/deleted, and those changes should be streamed out to other workers. So perhaps this is a red herring.
Incremental /sync
responses themselves are cached if the same since_token
is provided by the client. It sounds somewhat farfetched, but it is possible that your client was sending the same since_token
for /sync
multiple times? Do you have server and/or client logs of the network activity to check?
I don't have any logs for that time period anymore, but this happened across multiple restarts in between of which I received messages. /sync
returned the same value after calling upload
without any actual keys multiple times (to query the key count on startup to workaround older servers not sending a keycount, if it is 0). So keys weren't added between the different sync and upload calls (until I changed something in our code).
One possible cause I could think of, was that I sent a message in a large, encrypted room. I got around 20 key requests at least at the same time and started uploading keys after the first received ones. So there could potentially have been a race there, where the invalidation happened, a new count was calculated, the invalidation was cleared, but in the mean time the count changed again.
I haven't heard of many races involving cache invalidation before in Synapse, but it's certainly a possibility.
I think the first step towards solving this would be for someone to replicate the behaviour in either a Synapse unit test or via Sytest or Complement test. Given it's potentially a race (and possibly one between workers), Sytest or Complement would probably be best for having the behaviour emerge. Note that worker support in Complement is still in progress.
Description
I noticed that after sending an encrypted in a large E2EE room (#e2ee-test:grin.hu) my key count responses got out of sync (might have happened earlier). On sync I got a count of 55, while when doing an empty key upload request returned a count of 33. Does synapse cache the key count for /sync somehow and that got out of sync?
After uploading some keys, the counts synchronized themselves to 50, so I am assuming the ones in /sync were wrong.
Steps to reproduce
Version information
If not matrix.org:
Version: 1.42.0rc1
Install method: ebuild
Platform: Gentoo