matrix-org / sytest

Black-box integration testing for Matrix homeservers
Apache License 2.0
73 stars 55 forks source link

`Device list doesn't change if remote server is down` is flaky #1136

Open clokep opened 3 years ago

clokep commented 3 years ago

https://github.com/matrix-org/synapse/runs/3570688017

I think this was just re-enabled?

clokep commented 3 years ago

And https://github.com/matrix-org/synapse/runs/3587797065

clokep commented 3 years ago

And https://github.com/matrix-org/synapse/runs/3588399409

DMRobertson commented 3 years ago

Well, I mean to make a PR for this but screwed up my git branch management locally.

Let's see how this goes?

H-Shay commented 3 years ago

Just noting that I had a failure on this test as well today (with several re-runs)

DMRobertson commented 3 years ago

I resorted to sticking debug statements into synapse. My understanding: on the first keys query,

All normal operation as far as I can see. On the second keys query,

Guessing there's a race between the cache invalidation and the second device keys call?? Commenting out the @cachedList call makes the test pass (with user_map={'@__ANON__-0:localhost:34477': '3'}).

Notes:

erikjohnston commented 3 years ago

Oh bleurgh, that's annoying.

The other way we deal with this sort of thing is by using retry_until_success { ... } to wrap the test code, which will repeatedly run it until it passes (with some appropriate exponential backoff). The downside with this approach is that failures will result in the test timing out, rather than in a more helpful manner, so it's important to add helpful logging. An example usage:

https://github.com/matrix-org/sytest/blob/66cace301feceeb704c5502aea84776994d19123/tests/10apidoc/33room-members.pl#L107-L120

So I think we can wrap the second call to fetch keys in that and it should work?

DMRobertson commented 3 years ago

Thanks @erikjohnston, that sounds like a good avenue of attack.

For completeness, I wanted to demonstrate the race in the logs. Here's a version where I've made sytest sleep 1; between key requests.

Logs with extra sytest sleep ``` 2021-09-15 18:33:44 ✔ $ rg "/keys|DMR" /tmp/logs/server-* --color=always | sort --key 2,4 -t : -s | less /tmp/logs/server-0/synchrotron.log:2021-09-15 17:32:21,167 - root - 70 - WARNING - process-replication-data-100 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_173213-1:localhost:8840') /tmp/logs/server-0/event_persister2.log:2021-09-15 17:32:21,167 - root - 70 - WARNING - process-replication-data-100 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_173213-1:localhost:8840') /tmp/logs/server-0/event_persister1.log:2021-09-15 17:32:21,167 - root - 70 - WARNING - process-replication-data-132 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_173213-1:localhost:8840') /tmp/logs/server-0/appservice.log:2021-09-15 17:32:21,167 - root - 70 - WARNING - process-replication-data-148 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_173213-1:localhost:8840') /tmp/logs/server-0/frontend_proxy.log:2021-09-15 17:32:21,167 - root - 70 - WARNING - process-replication-data-164 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_173213-1:localhost:8840') /tmp/logs/server-0/background_worker.log:2021-09-15 17:32:21,167 - root - 70 - WARNING - process-replication-data-180 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_173213-1:localhost:8840') /tmp/logs/server-0/federation_sender.log:2021-09-15 17:32:21,167 - root - 70 - WARNING - process-replication-data-196 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_173213-1:localhost:8840') /tmp/logs/server-0/pusher.log:2021-09-15 17:32:21,167 - root - 70 - WARNING - process-replication-data-212 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_173213-1:localhost:8840') /tmp/logs/server-0/user_dir.log:2021-09-15 17:32:21,167 - root - 70 - WARNING - process-replication-data-228 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_173213-1:localhost:8840') /tmp/logs/server-0/media_repository.log:2021-09-15 17:32:21,167 - root - 70 - WARNING - process-replication-data-68 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_173213-1:localhost:8840') /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:21,168 - root - 70 - WARNING - process-replication-data-20 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_173213-1:localhost:8840') /tmp/logs/server-0/federation_reader.log:2021-09-15 17:32:21,170 - root - 70 - WARNING - process-replication-data-36 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_173213-1:localhost:8840') /tmp/logs/server-0/client_reader.log:2021-09-15 17:32:21,170 - root - 70 - WARNING - process-replication-data-52 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_173213-1:localhost:8840') /tmp/logs/server-0/event_creator.log:2021-09-15 17:32:21,171 - root - 70 - WARNING - process-replication-data-132 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_173213-1:localhost:8840') /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:21,778 - synapse.handlers.e2e_keys - 117 - WARNING - POST-0 - DMR: query_devices /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:21,778 - synapse.handlers.e2e_keys - 137 - WARNING - POST-0 - DMR: local_query={} remote_queries={'@__ANON__-0:localhost:35035': []} /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:21,782 - synapse.storage.databases.main.devices - 474 - WARNING - POST-0 - DMR: user_map={'@__ANON__-0:localhost:35035': None} /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:21,782 - synapse.storage.databases.main.devices - 481 - WARNING - POST-0 - DMR: users_needing_resync=set() /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:21,783 - synapse.handlers.e2e_keys - 171 - WARNING - POST-0 - DMR: user_ids_not_in_cache={'@__ANON__-0:localhost:35035'} /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:21,783 - synapse.handlers.e2e_keys - 227 - WARNING - POST-0 - DMR: fetch user_id='@__ANON__-0:localhost:35035', device_list=[] /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:21,783 - synapse.handlers.e2e_keys - 231 - WARNING - POST-0 - DMR: AAAAA /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:21,784 - synapse.handlers.e2e_keys - 236 - WARNING - POST-0 - DMR: BBBBB room_ids=frozenset({'!MYFsKlpvDpfGpmYNQU:localhost:8840'}) /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:21,784 - synapse.handlers.e2e_keys - 240 - WARNING - POST-0 - DMR: CCCCC /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:21,784 - synapse.handlers.e2e_keys - 251 - WARNING - POST-0 - DMR: slave /tmp/logs/server-0/homeserver.log:2021-09-15 17:32:21,791 - synapse.handlers.device - 844 - WARNING - POST-10 - DMR: user_device_resync for user_id='@__ANON__-0:localhost:35035' /tmp/logs/server-0/homeserver.log:2021-09-15 17:32:21,804 - synapse.handlers.device - 895 - WARNING - POST-10 - DMR: result={'user_id': '@__ANON__-0:localhost:35035', 'devices': [{'device_display_name': 'Curiosity Rover', 'keys': {'signatures': {'@__ANON__-0:localhost:35035': {'ed2 5519:JLAFKJWSCS': 'dSO80A01XiigH3uBiDVx/EjzaoycHcjq9lfQX0uWsqxl2giMIiSPR8a4d291W1ihKJL/a+myXS367WT6NAIcBA'}}, 'device_id': 'CURIOSITY_ROVER', 'keys': {'b': 'flat', 'c': 'sharp'}, 'algorithms': ['fast', 'and broken'], 'user_id': '@__ANON__-0:localhost:35035'}, 'device_id': 'CURIOSIT Y_ROVER'}], 'stream_id': 3} /tmp/logs/server-0/homeserver.log:2021-09-15 17:32:21,806 - synapse.handlers.device - 945 - WARNING - POST-10 - DMR: ignore_devices=False /tmp/logs/server-0/homeserver.log:2021-09-15 17:32:21,806 - synapse.storage.databases.main.devices - 1261 - WARNING - POST-10 - DMR: user_id='@__ANON__-0:localhost:35035' devices=[{'device_display_name': 'Curiosity Rover', 'keys': {'signatures': {'@__ANON__-0:localhost:35035': {'ed 25519:JLAFKJWSCS': 'dSO80A01XiigH3uBiDVx/EjzaoycHcjq9lfQX0uWsqxl2giMIiSPR8a4d291W1ihKJL/a+myXS367WT6NAIcBA'}}, 'device_id': 'CURIOSITY_ROVER', 'keys': {'b': 'flat', 'c': 'sharp'}, 'algorithms': ['fast', 'and broken'], 'user_id': '@__ANON__-0:localhost:35035'}, 'device_id': 'CURIOSI TY_ROVER'}] stream_id=3 /tmp/logs/server-0/homeserver.log:2021-09-15 17:32:21,807 - synapse.storage.databases.main.devices - 1306 - WARNING - POST-10 - DMR: extremities user_id='@__ANON__-0:localhost:35035' stream_id=3 /tmp/logs/server-0/homeserver.log:2021-09-15 17:32:21,809 - synapse.handlers.device - 953 - WARNING - POST-10 - DMR: finished marking cache as valid /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:21,815 - synapse.access.http.8836 - 389 - INFO - POST-0 - 127.0.0.1 - 8836 - {@anon-20210915_173213-1:localhost:8840} Processed request: 0.039sec/-0.000sec (0.008sec, 0.000sec) (0.001sec/0.003sec/6) 440B 200 "POST /_matrix/clien t/r0/keys/query?access_token= HTTP/1.1" "Perl + Net::Async::HTTP/0.48" [0 dbevts] /tmp/logs/server-0/event_persister1.log:2021-09-15 17:32:21,922 - root - 70 - WARNING - process-replication-data-152 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@__ANON__-0:localhost:35035') /tmp/logs/server-0/appservice.log:2021-09-15 17:32:21,922 - root - 70 - WARNING - process-replication-data-168 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@__ANON__-0:localhost:35035') /tmp/logs/server-0/frontend_proxy.log:2021-09-15 17:32:21,922 - root - 70 - WARNING - process-replication-data-184 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@__ANON__-0:localhost:35035') /tmp/logs/server-0/background_worker.log:2021-09-15 17:32:21,922 - root - 70 - WARNING - process-replication-data-200 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@__ANON__-0:localhost:35035') /tmp/logs/server-0/federation_sender.log:2021-09-15 17:32:21,922 - root - 70 - WARNING - process-replication-data-216 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@__ANON__-0:localhost:35035') /tmp/logs/server-0/pusher.log:2021-09-15 17:32:21,922 - root - 70 - WARNING - process-replication-data-232 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@__ANON__-0:localhost:35035') /tmp/logs/server-0/user_dir.log:2021-09-15 17:32:21,922 - root - 70 - WARNING - process-replication-data-248 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@__ANON__-0:localhost:35035') /tmp/logs/server-0/synchrotron.log:2021-09-15 17:32:21,923 - root - 70 - WARNING - process-replication-data-120 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@__ANON__-0:localhost:35035') /tmp/logs/server-0/event_persister2.log:2021-09-15 17:32:21,923 - root - 70 - WARNING - process-replication-data-120 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@__ANON__-0:localhost:35035') /tmp/logs/server-0/event_creator.log:2021-09-15 17:32:21,923 - root - 70 - WARNING - process-replication-data-152 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@__ANON__-0:localhost:35035') /tmp/logs/server-0/client_reader.log:2021-09-15 17:32:21,923 - root - 70 - WARNING - process-replication-data-72 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@__ANON__-0:localhost:35035') /tmp/logs/server-0/media_repository.log:2021-09-15 17:32:21,923 - root - 70 - WARNING - process-replication-data-88 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@__ANON__-0:localhost:35035') /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:21,924 - root - 70 - WARNING - process-replication-data-40 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@__ANON__-0:localhost:35035') /tmp/logs/server-0/federation_reader.log:2021-09-15 17:32:21,924 - root - 70 - WARNING - process-replication-data-56 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@__ANON__-0:localhost:35035') /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:22,825 - synapse.handlers.e2e_keys - 117 - WARNING - POST-1 - DMR: query_devices /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:22,826 - synapse.handlers.e2e_keys - 137 - WARNING - POST-1 - DMR: local_query={} remote_queries={'@__ANON__-0:localhost:35035': []} /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:22,833 - synapse.storage.databases.main.devices - 474 - WARNING - POST-1 - DMR: user_map={'@__ANON__-0:localhost:35035': '3'} /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:22,836 - synapse.storage.databases.main.devices - 481 - WARNING - POST-1 - DMR: users_needing_resync=set() /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:22,840 - synapse.handlers.e2e_keys - 171 - WARNING - POST-1 - DMR: user_ids_not_in_cache=set() /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:22,842 - synapse.access.http.8836 - 389 - INFO - POST-1 - 127.0.0.1 - 8836 - {@anon-20210915_173213-1:localhost:8840} Processed request: 0.017sec/-0.000sec (0.011sec, 0.000sec) (0.003sec/0.004sec/4) 493B 200 "POST /_matrix/clien t/r0/keys/query?access_token= HTTP/1.1" "Perl + Net::Async::HTTP/0.48" [0 dbevts] ```

In particular, note that the stream_writer is told over replication about the device lists having changed:

  /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:21,924 - root - 70 - WARNING - process-replication-data-40 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@__ANON__-0:localhost:35035')

So we get a correct cache hit:

  /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:22,826 - synapse.handlers.e2e_keys - 137 - WARNING - POST-1 - DMR: local_query={} remote_queries={'@__ANON__-0:localhost:35035': []}
  /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:22,833 - synapse.storage.databases.main.devices - 474 - WARNING - POST-1 - DMR: user_map={'@__ANON__-0:localhost:35035': '3'}
  /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:22,836 - synapse.storage.databases.main.devices - 481 - WARNING - POST-1 - DMR: users_needing_resync=set()
  /tmp/logs/server-0/stream_writer.log:2021-09-15 17:32:22,840 - synapse.handlers.e2e_keys - 171 - WARNING - POST-1 - DMR: user_ids_not_in_cache=set()

Here's the version without the sleep.

Without sleep ``` /tmp/logs/server-0/frontend_proxy.log:2021-09-15 17:45:20,277 - root - 70 - WARNING - process-replication-data-161 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_174513-1:localhost:8840') /tmp/logs/server-0/synchrotron.log:2021-09-15 17:45:20,277 - root - 70 - WARNING - process-replication-data-177 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_174513-1:localhost:8840') /tmp/logs/server-0/media_repository.log:2021-09-15 17:45:20,277 - root - 70 - WARNING - process-replication-data-193 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_174513-1:localhost:8840') /tmp/logs/server-0/event_creator.log:2021-09-15 17:45:20,277 - root - 70 - WARNING - process-replication-data-209 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_174513-1:localhost:8840') /tmp/logs/server-0/federation_sender.log:2021-09-15 17:45:20,277 - root - 70 - WARNING - process-replication-data-225 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_174513-1:localhost:8840') /tmp/logs/server-0/appservice.log:2021-09-15 17:45:20,278 - root - 70 - WARNING - process-replication-data-113 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_174513-1:localhost:8840') /tmp/logs/server-0/event_persister2.log:2021-09-15 17:45:20,278 - root - 70 - WARNING - process-replication-data-113 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_174513-1:localhost:8840') /tmp/logs/server-0/background_worker.log:2021-09-15 17:45:20,278 - root - 70 - WARNING - process-replication-data-129 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_174513-1:localhost:8840') /tmp/logs/server-0/pusher.log:2021-09-15 17:45:20,278 - root - 70 - WARNING - process-replication-data-145 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_174513-1:localhost:8840') /tmp/logs/server-0/event_persister1.log:2021-09-15 17:45:20,278 - root - 70 - WARNING - process-replication-data-17 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_174513-1:localhost:8840') /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:20,278 - root - 70 - WARNING - process-replication-data-65 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_174513-1:localhost:8840') /tmp/logs/server-0/federation_reader.log:2021-09-15 17:45:20,278 - root - 70 - WARNING - process-replication-data-81 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_174513-1:localhost:8840') /tmp/logs/server-0/user_dir.log:2021-09-15 17:45:20,282 - root - 70 - WARNING - process-replication-data-33 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_174513-1:localhost:8840') /tmp/logs/server-0/client_reader.log:2021-09-15 17:45:20,283 - root - 70 - WARNING - process-replication-data-49 - DMR: row=DeviceListsStream.DeviceListsStreamRow(entity='@anon-20210915_174513-1:localhost:8840') /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,071 - synapse.handlers.e2e_keys - 117 - WARNING - POST-0 - DMR: query_devices /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,071 - synapse.handlers.e2e_keys - 137 - WARNING - POST-0 - DMR: local_query={} remote_queries={'@__ANON__-0:localhost:33603': []} /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,074 - synapse.storage.databases.main.devices - 474 - WARNING - POST-0 - DMR: user_map={'@__ANON__-0:localhost:33603': None} /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,075 - synapse.handlers.e2e_keys - 171 - WARNING - POST-0 - DMR: user_ids_not_in_cache={'@__ANON__-0:localhost:33603'} /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,075 - synapse.handlers.e2e_keys - 227 - WARNING - POST-0 - DMR: fetch user_id='@__ANON__-0:localhost:33603', device_list=[] /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,075 - synapse.handlers.e2e_keys - 231 - WARNING - POST-0 - DMR: AAAAA /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,075 - synapse.storage.databases.main.devices - 481 - WARNING - POST-0 - DMR: users_needing_resync=set() /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,077 - synapse.handlers.e2e_keys - 236 - WARNING - POST-0 - DMR: BBBBB room_ids=frozenset({'!PSQPVgzGgcyCSphXvA:localhost:8840'}) /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,077 - synapse.handlers.e2e_keys - 240 - WARNING - POST-0 - DMR: CCCCC /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,077 - synapse.handlers.e2e_keys - 251 - WARNING - POST-0 - DMR: slave /tmp/logs/server-0/homeserver.log:2021-09-15 17:45:21,083 - synapse.handlers.device - 844 - WARNING - POST-10 - DMR: user_device_resync for user_id='@__ANON__-0:localhost:33603' /tmp/logs/server-0/homeserver.log:2021-09-15 17:45:21,096 - synapse.handlers.device - 895 - WARNING - POST-10 - DMR: result={'user_id': '@__ANON__-0:localhost:33603', 'stream_id': 3, 'devices': [{'device_display_name': 'Curiosity Rover', 'device_id': 'CURIOSITY_ROVER', 'keys': {'de vice_id': 'CURIOSITY_ROVER', 'keys': {'b': 'flat', 'c': 'sharp'}, 'signatures': {'@__ANON__-0:localhost:33603': {'ed25519:JLAFKJWSCS': 'dSO80A01XiigH3uBiDVx/EjzaoycHcjq9lfQX0uWsqxl2giMIiSPR8a4d291W1ihKJL/a+myXS367WT6NAIcBA'}}, 'user_id': '@__ANON__-0:localhost:33603', 'algorithms': ['fast', 'and broken']}}]} /tmp/logs/server-0/homeserver.log:2021-09-15 17:45:21,098 - synapse.handlers.device - 945 - WARNING - POST-10 - DMR: ignore_devices=False /tmp/logs/server-0/homeserver.log:2021-09-15 17:45:21,098 - synapse.storage.databases.main.devices - 1261 - WARNING - POST-10 - DMR: user_id='@__ANON__-0:localhost:33603' devices=[{'device_display_name': 'Curiosity Rover', 'device_id': 'CURIOSITY_ROVER', 'keys': {'device_id': 'CURI OSITY_ROVER', 'keys': {'b': 'flat', 'c': 'sharp'}, 'signatures': {'@__ANON__-0:localhost:33603': {'ed25519:JLAFKJWSCS': 'dSO80A01XiigH3uBiDVx/EjzaoycHcjq9lfQX0uWsqxl2giMIiSPR8a4d291W1ihKJL/a+myXS367WT6NAIcBA'}}, 'user_id': '@__ANON__-0:localhost:33603', 'algorithms': ['fast', 'and broken']}}] stream_id=3 /tmp/logs/server-0/homeserver.log:2021-09-15 17:45:21,100 - synapse.storage.databases.main.devices - 1306 - WARNING - POST-10 - DMR: extremities user_id='@__ANON__-0:localhost:33603' stream_id=3 /tmp/logs/server-0/homeserver.log:2021-09-15 17:45:21,101 - synapse.handlers.device - 953 - WARNING - POST-10 - DMR: finished marking cache as valid /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,108 - synapse.access.http.8836 - 389 - INFO - POST-0 - 127.0.0.1 - 8836 - {@anon-20210915_174513-1:localhost:8840} Processed request: 0.042sec/-0.000sec (0.008sec, 0.000sec) (0.004sec/0.004sec/6) 440B 200 "POST /_matrix/clien t/r0/keys/query?access_token= HTTP/1.1" "Perl + Net::Async::HTTP/0.48" [0 dbevts] /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,112 - synapse.handlers.e2e_keys - 117 - WARNING - POST-1 - DMR: query_devices /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,112 - synapse.handlers.e2e_keys - 137 - WARNING - POST-1 - DMR: local_query={} remote_queries={'@__ANON__-0:localhost:33603': []} /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,113 - synapse.storage.databases.main.devices - 474 - WARNING - POST-1 - DMR: user_map={'@__ANON__-0:localhost:33603': None} /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,114 - synapse.handlers.e2e_keys - 171 - WARNING - POST-1 - DMR: user_ids_not_in_cache={'@__ANON__-0:localhost:33603'} /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,114 - synapse.handlers.e2e_keys - 227 - WARNING - POST-1 - DMR: fetch user_id='@__ANON__-0:localhost:33603', device_list=[] /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,114 - synapse.handlers.e2e_keys - 231 - WARNING - POST-1 - DMR: AAAAA /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,114 - synapse.handlers.e2e_keys - 236 - WARNING - POST-1 - DMR: BBBBB room_ids=frozenset({'!PSQPVgzGgcyCSphXvA:localhost:8840'}) /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,114 - synapse.handlers.e2e_keys - 240 - WARNING - POST-1 - DMR: CCCCC /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,114 - synapse.storage.databases.main.devices - 481 - WARNING - POST-1 - DMR: users_needing_resync=set() /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,115 - synapse.handlers.e2e_keys - 251 - WARNING - POST-1 - DMR: slave /tmp/logs/server-0/homeserver.log:2021-09-15 17:45:21,116 - synapse.handlers.device - 844 - WARNING - POST-12 - DMR: user_device_resync for user_id='@__ANON__-0:localhost:33603' /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,132 - synapse.http.matrixfederationclient - 670 - WARNING - POST-1 - {POST-O-1} [localhost:33603] Request failed: POST matrix://localhost:33603/_matrix/federation/v1/user/keys/query: HttpResponseException('400: Bad Request') /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,133 - synapse.access.http.8836 - 389 - INFO - POST-1 - 127.0.0.1 - 8836 - {@anon-20210915_174513-1:localhost:8840} Processed request: 0.021sec/-0.000sec (0.004sec, 0.000sec) (0.001sec/0.001sec/3) 156B 200 "POST /_matrix/clien t/r0/keys/query?access_token= HTTP/1.1" "Perl + Net::Async::HTTP/0.48" [0 dbevts] ```

The stream_writer never gets the replication message (I'm assuming we kill the server before it has chance to arrive.) So we get an old (incorrect) cache hit:

 /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,113 - synapse.storage.databases.main.devices - 474 - WARNING - POST-1 - DMR: user_map={'@__ANON__-0:localhost:33603': None}
  /tmp/logs/server-0/stream_writer.log:2021-09-15 17:45:21,114 - synapse.handlers.e2e_keys - 171 - WARNING - POST-1 - DMR: user_ids_not_in_cache={'@__ANON__-0:localhost:33603'}
DMRobertson commented 3 years ago

Unfortunately I don't think retry_until_success works either. We end up with a different kind of race!

DMRobertson commented 3 years ago

Also https://github.com/matrix-org/synapse/runs/3630907641, but I think that's running against a release branch in sytest that doesn't have the above PR (merged to sytest develop)

DMRobertson commented 3 years ago

Also https://github.com/matrix-org/sytest/pull/1132/checks?check_run_id=3624153208 .

DMRobertson commented 2 years ago

I think https://github.com/matrix-org/synapse/runs/5752060971?check_suite_focus=true hit this.

DMRobertson commented 2 years ago

And https://github.com/matrix-org/synapse/runs/5971866118?check_suite_focus=true

DMRobertson commented 2 years ago

And https://github.com/matrix-org/synapse/runs/6094736905?check_suite_focus=true

DMRobertson commented 2 years ago

https://github.com/matrix-org/synapse/runs/6496065804?check_suite_focus=true

reivilibre commented 2 years ago

https://github.com/matrix-org/synapse/runs/6611461057?check_suite_focus=true

richvdh commented 2 years ago

https://github.com/matrix-org/synapse/runs/6809647658?check_suite_focus=true