matrix-org / matrix-spec

The Matrix protocol specification
Apache License 2.0

Ability to migrate events between different formats. #932

Open ara4n opened 2 years ago

ara4n commented 2 years ago

One of the biggest problems in rapid Matrix development turns out to be confusion over the extent to which it's acceptable to ship experimental features in clients that generate events with prefixed event types or fields. Currently there is a concern that if a feature with prefixed event types ships too widely, the immutability of events means the experimental feature will become a de facto part of the Matrix standard, and clients will have to implement it forever in order to render older room history, much as org.matrix.custom.html has done.

However, perhaps we are being too constrained by the idea that Matrix events are immutable. What if we provided a mechanism to migrate events from an old format to a new one during a room version upgrade, so that room version N+1 re-imports all the events from prior versions of the room, re-expressed in the new format (e.g. JSON->CBOR, or removing a prefix)? For instance, the server triggering the upgrade could re-import all the old events. This obviously comes at the expense of manipulating history, but that comes with the territory, and folks would always be able to compare against the old room history to verify that the migration was not malicious. For E2EE, you could rely on the user triggering the upgrade to have all the keys, and have them similarly re-submit all the messages (again at the expense of transport consistency, and the ugliness of the client having to be online during the migration). The migration could also be done incrementally in the background via MSC2716.
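As a rough illustration of the "unprefixing" case, the Python sketch below builds the payload that a server triggering the upgrade might re-import into room version N+1. It assumes events are plain JSON dicts; the prefixed names in `TYPE_RENAMES`/`FIELD_RENAMES` and the back-reference field `org.example.migrated_from` are hypothetical placeholders, not anything defined by the spec.

```python
import copy
from typing import Any

Event = dict[str, Any]

# Hypothetical rename tables: these prefixed names are invented placeholders,
# not real MSC identifiers.
TYPE_RENAMES = {"org.example.msc0000.poll.start": "m.poll.start"}
FIELD_RENAMES = {"org.example.msc0000.question": "question"}

# Hypothetical back-reference so a migrated event can later be checked
# against its original in the old room.
LINK_FIELD = "org.example.migrated_from"


def migrate_event(old: Event) -> Event:
    """Re-express an event from the old room as a payload for room version N+1.

    The input is left untouched; only top-level content keys are renamed.
    """
    content = {
        FIELD_RENAMES.get(key, key): copy.deepcopy(value)
        for key, value in old.get("content", {}).items()
    }
    content[LINK_FIELD] = old["event_id"]
    return {
        "type": TYPE_RENAMES.get(old["type"], old["type"]),
        "sender": old["sender"],
        "origin_server_ts": old["origin_server_ts"],
        "content": content,
    }
```

Keeping the function pure (the old event is never mutated, and the output depends only on the input) is what would let anyone re-run it later and compare the result against the old room.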

This feels like a pretty useful thing to have in the protocol, especially as we gear up for more invasive changes, such as changing event shapes in order to support account portability, P2P, or more efficient event encodings.

Thoughts welcome :)

(This would also be useful for migrating historical data between encryption formats; see also https://github.com/matrix-org/matrix-doc/issues/3520)

EDIT: Thinking further, this is almost a hard requirement whenever a crypto vulnerability emerges (e.g. post-quantum) which requires everyone to shift to a more secure record of their conversation history...

ShadowJonathan commented 2 years ago

Assuming we have solved, or reached a compromise on, the question of who imports what in a historical-room scenario, this is a good idea.

This would also require a robust and exhaustive history-preservation mechanism, one which (imo) should query every server it knows about and scrape every event in the room's history, in order to build a complete history of that room. (I'll note here that it could be worth considering a "waiver" to retrieve hidden room history on servers that have a user with upgrade permission.)
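A minimal sketch of the merging step, assuming each server's events have already been fetched (how is left open here) into one list of event dicts per server. `merge_histories` and `ConflictingEventError` are hypothetical names; the function unions events by `event_id` and refuses to continue if two servers disagree about an event's body.

```python
from typing import Any, Iterable

Event = dict[str, Any]


class ConflictingEventError(ValueError):
    """Two servers returned different bodies for the same event_id."""


def _comparable(event: Event) -> Event:
    # "unsigned" holds server-local data (e.g. age) that legitimately differs
    # between servers, so it is dropped before comparing or storing.
    return {k: v for k, v in event.items() if k != "unsigned"}


def merge_histories(per_server: Iterable[Iterable[Event]]) -> dict[str, Event]:
    """Union the events that each server returned, keyed by event_id."""
    merged: dict[str, Event] = {}
    for events in per_server:
        for event in events:
            body = _comparable(event)
            seen = merged.get(body["event_id"])
            if seen is None:
                merged[body["event_id"]] = body
            elif seen != body:
                raise ConflictingEventError(body["event_id"])
    return merged
```

A real implementation would also have to cope with servers holding a redacted copy of an event, which keeps the same event ID but has its content stripped; this sketch would report that as a conflict.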

However, I think it would help everyone involved if this process were deterministic, so that every server arrives at the same conversion of events, given the same room history.
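One way to make "every server arrives at the same conversion" checkable is for each server to hash its migrated history and compare digests. The sketch below assumes a caller-supplied `migrate` function (hypothetical) and an agreed ordering, here simply sorting by original event ID; its `canonical_json` only approximates Matrix canonical JSON.

```python
import hashlib
import json
from typing import Any, Callable, Iterable

Event = dict[str, Any]


def canonical_json(value: Any) -> bytes:
    """Approximate Matrix canonical JSON: sorted keys, no insignificant
    whitespace, UTF-8 output. Unlike the spec, number ranges and floats are
    not checked here."""
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def migration_digest(
    old_events: Iterable[Event], migrate: Callable[[Event], Event]
) -> str:
    """Hash the migrated history so servers can compare a single value."""
    digest = hashlib.sha256()
    # Sorting by original event_id makes the result independent of the order
    # in which a given server happened to fetch the events.
    for event in sorted(old_events, key=lambda e: e["event_id"]):
        digest.update(canonical_json([event["event_id"], migrate(event)]))
        digest.update(b"\n")  # this encoding never emits a raw newline byte
    return digest.hexdigest()
```

Two servers that agree on `migrate` and on the old room's events get the same hex digest; any divergence points at either different inputs or a non-deterministic migration.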

This could get tricky, however, once custom events come into the mix, since a third party (with respect to Matrix and the user) may have the same needs. In that case, their server implementation may emit different events for their custom format, which is undesirable, as it complicates the server upgrade process with the question of "which server do we upgrade from?". It also makes Matrix's position of dictating which events get upgraded to what a privilege that third parties don't have, which is not ideal and, imo, not in line with Matrix's ideology.
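One hypothetical way to put third parties on the same footing would be a shared registry of upgrade rules keyed by event type, loaded identically by every upgrading server. `UpgradeRegistry` below is an illustrative sketch of that idea, not a proposed API; where the rules come from and how they are distributed is left open.

```python
from typing import Any, Callable

Event = dict[str, Any]
Rule = Callable[[Event], Event]


class UpgradeRegistry:
    """A shared table of per-event-type upgrade rules (hypothetical design).

    If every server applying an upgrade loaded the same registry, a third
    party's custom event types would be upgraded identically everywhere.
    """

    def __init__(self) -> None:
        self._rules: dict[str, Rule] = {}

    def register(self, event_type: str, rule: Rule) -> None:
        # One rule per type: two competing rules would make the upgrade
        # non-deterministic across servers.
        if event_type in self._rules:
            raise ValueError(f"a rule for {event_type!r} is already registered")
        self._rules[event_type] = rule

    def upgrade(self, event: Event) -> Event:
        # Types with no registered rule pass through unchanged.
        rule = self._rules.get(event["type"])
        return rule(event) if rule is not None else event
```

For example, a third party could register a rule for a hypothetical `com.example.widget` type, and Matrix's own rules for `m.*` types would sit in the same table with no special status.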

So, my concerns are the following:

The whole process should be deterministic and "auditable". The previous room state could stay up for a while, and any server or concerned individual should be able to check the events of the previous room, compare them with the current one (by re-applying that deterministic upgrade process, plus the aforementioned possible "event upgrade requests"), and determine whether any piece of history has been wrongly altered. A tool for this should be easily available to anyone who wants to audit that room's history; the outcome and consequences of a deliberately altered room history should then be handled socially.
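The audit described above could look roughly like this: re-run the deterministic migration over the previous room's events and diff the output against the new room. The sketch assumes each migrated event carries a hypothetical back-reference field (`org.example.migrated_from`) naming its original event ID; `audit_migration` and its report keys are invented for illustration.

```python
from typing import Any, Callable, Iterable

Event = dict[str, Any]
LINK_FIELD = "org.example.migrated_from"  # hypothetical back-reference field


def audit_migration(
    old_events: Iterable[Event],
    new_events: Iterable[Event],
    migrate: Callable[[Event], Event],
) -> dict[str, list[str]]:
    """Re-run `migrate` over the old room and diff it against the new room."""
    expected = {ev["event_id"]: migrate(ev) for ev in old_events}
    report: dict[str, list[str]] = {"altered": [], "missing": [], "unexpected": []}
    seen: set[str] = set()
    for ev in new_events:
        source = ev.get("content", {}).get(LINK_FIELD)
        if source not in expected or source in seen:
            # No known origin in the old room, or a duplicate of one.
            report["unexpected"].append(ev["event_id"])
            continue
        seen.add(source)
        # Only compare fields the migration produces; the new room adds its
        # own event_id, room_id, hashes, signatures, etc.
        if any(ev.get(key) != value for key, value in expected[source].items()):
            report["altered"].append(source)
    report["missing"] = sorted(set(expected) - seen)
    return report
```

Reporting "missing" and "unexpected" events separately from "altered" ones would let people tell dropped or injected history apart from rewritten history.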

Stvad commented 1 year ago

You may find this write-up on schema evolution interesting when thinking about how to actually do migrations: https://www.inkandswitch.com/cambria/
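To illustrate the kind of idea that write-up describes, here is a small Python sketch of a reversible "rename field" step paired with its inverse, applied to a hypothetical prefixed event field. It is not Cambria's API (Cambria defines its own set of lens operations); it only shows how bidirectional steps could let a migrated event be translated back for older clients.

```python
from dataclasses import dataclass
from typing import Any, Sequence

Doc = dict[str, Any]


@dataclass(frozen=True)
class RenameField:
    """A reversible schema step that renames one top-level field."""

    old: str
    new: str

    @staticmethod
    def _rename(doc: Doc, src: str, dst: str) -> Doc:
        # Refuse to guess when both names are present.
        if src in doc and dst in doc:
            raise ValueError(f"both {src!r} and {dst!r} present; rename is ambiguous")
        return {(dst if key == src else key): value for key, value in doc.items()}

    def forward(self, doc: Doc) -> Doc:
        return self._rename(doc, self.old, self.new)

    def backward(self, doc: Doc) -> Doc:
        return self._rename(doc, self.new, self.old)


def to_new(lenses: Sequence[RenameField], doc: Doc) -> Doc:
    for lens in lenses:
        doc = lens.forward(doc)
    return doc


def to_old(lenses: Sequence[RenameField], doc: Doc) -> Doc:
    # Undo the steps in reverse order.
    for lens in reversed(lenses):
        doc = lens.backward(doc)
    return doc
```

Because each step carries its own inverse, a chain of steps can be walked in either direction, which is the property that would let old and new event formats coexist during a migration.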