matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0
11.82k stars 2.13k forks

federation: refresh devices_list for users more frequently #5433

Open cyphar opened 5 years ago

cyphar commented 5 years ago

Description:

Let's take two homeservers A.com and B.com. You've set up A.com and B.com and they are both federating with one another freely.

At the moment, if you (possibly accidentally) nuke your already-federated A.com server and rebuild it using the same server_name, things work mostly fine: B.com seems to accept the new signing key of your server without much issue, and federated events between A.com and B.com work properly.

However, the cached device list of A.com's users on B.com will persist (assuming that you've kept the same usernames on the new installation). Not only that, but because the database has been completely nuked, A.com won't know that it needs to send m.device_list_update EDUs to B.com. This results in E2EE over federation being basically broken permanently, because users on B.com won't negotiate new Megolm session keys (they don't see the new A.com devices and instead only see the ghost ones).

Would it be acceptable to make synapse automatically do a hard refresh of the device lists of a federated server's users if that homeserver's signing key changes? This could be done lazily (effectively just remove the cached information about devices, and fetch it over federation when a user next requests it). Is there any attack I'm missing that might result if we make this a default feature?
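To make the lazy variant concrete, here is a rough sketch of the idea. It is not Synapse code: `RemoteDeviceCache`, `on_signing_key_seen`, and the `fetch_devices` callback are hypothetical names, the cache is a plain dict rather than database tables, and it assumes a single signing key per server (real servers can publish several keys and rotate them legitimately).

```python
from typing import Callable, Dict, List


class RemoteDeviceCache:
    """Hypothetical in-memory stand-in for a remote device list cache."""

    def __init__(self, fetch_devices: Callable[[str], List[str]]) -> None:
        # fetch_devices stands in for a federation /user/keys/query request.
        self._fetch_devices = fetch_devices
        self._devices: Dict[str, List[str]] = {}    # user_id -> device IDs
        self._signing_keys: Dict[str, str] = {}     # server_name -> last seen key

    def on_signing_key_seen(self, server_name: str, key: str) -> bool:
        """Record a server's key; drop its users' cached devices if it changed.

        Returns True if the cache for that server was invalidated.
        """
        previous = self._signing_keys.get(server_name)
        self._signing_keys[server_name] = key
        if previous is None or previous == key:
            return False
        # User IDs look like "@localpart:server_name"; split on the first colon
        # so server names that include a port are still matched correctly.
        stale = [u for u in self._devices if u.split(":", 1)[1] == server_name]
        for user_id in stale:
            del self._devices[user_id]
        return True

    def get_devices(self, user_id: str) -> List[str]:
        """Return cached devices, fetching over federation only on a miss."""
        if user_id not in self._devices:
            self._devices[user_id] = self._fetch_devices(user_id)
        return self._devices[user_id]
```

Nothing is fetched at invalidation time; the next `get_devices` call for an affected user is what triggers the refetch, which is the "lazy" part of the proposal.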

Workaround

As a quick hack, it is possible to fix this using the manhole in synapse, by sending an m.device_list_update EDU to B.com with a prev_id value that B.com does not recognise. According to the spec this causes B.com to reset its cached device list:

If a server receives an EDU which refers to a prev_id it does not recognise, it must resynchronise its list by calling the /user/keys/query API and resume the process.
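The rule quoted above can be sketched from the receiving server's point of view roughly as follows. This is an illustration only, not how Synapse implements it: `needs_resync` and `handle_device_list_update` are hypothetical helpers, and the "seen" state is a plain dict of stream IDs per user.

```python
from typing import Dict, List, Set


def needs_resync(known_stream_ids: Set[int], prev_ids: List[int]) -> bool:
    """Return True if the EDU refers to any prev_id we have not seen."""
    return any(prev_id not in known_stream_ids for prev_id in prev_ids)


def handle_device_list_update(seen: Dict[str, Set[int]], edu: dict) -> str:
    """Decide whether to apply an update or resynchronise the device list.

    Returns "resync" when the cached list must be rebuilt (in a real server,
    by calling /user/keys/query), and "apply" otherwise.
    """
    known = seen.setdefault(edu["user_id"], set())
    if needs_resync(known, edu.get("prev_id", [])):
        # Unknown prev_id: discard what we knew and restart from this update.
        known.clear()
        known.add(edu["stream_id"])
        return "resync"
    known.add(edu["stream_id"])
    return "apply"
```

An EDU whose prev_id points at a stream ID the receiver has never seen therefore always lands in the "resync" branch, which is what the workaround relies on.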

Manhole Script -- USE AT YOUR OWN RISK

```python
broken_server = "B.com"
user_id = "@cyphar:A.com"

fs = hs.get_federation_sender()
k = fs._per_destination_queues[broken_server]._store.get_devices_with_keys_by_user(user_id)
# Use a prev_id that B.com cannot have seen, which forces it to resync.
content = {"user_id": user_id, "stream_id": k.result[0]+2, "prev_id": [k.result[0]+1], **k.result[1][0]}
fs.build_and_send_edu(broken_server, "m.device_list_update", content)
```

cyphar commented 5 years ago

From #5095, one of the proposed solutions (re-syncing the device list if we see an event from an unknown device) would solve this problem too, though at the risk of nuking the device list cache more frequently.