xmtp / libxmtp

LibXMTP is a shared library encapsulating the core functionality of the XMTP messaging protocol, such as cryptography, networking, and language bindings.
MIT License
47 stars 20 forks source link

Proposal: Conversation stitching for DM's #1142

Open richardhuaaa opened 1 month ago

richardhuaaa commented 1 month ago

Problem to solve

So it's unavoidable that we'll have multiple MLS groups under the hood for the same messaging pair (example: race condition where I message you and you message me at the same time). What we need to make sure of is that even if we have multiple MLS groups under the hood, app developers and users will only see one consistent DM. These are our requirements:

  1. When fetching messages from any of the MLS groups associated with a DM conversation, the SDK will respond with messages from all of the groups.
  2. When sending messages on a DM conversation, all installations in that DM will eventually converge to sending them on the same MLS group, even if they originally start off using different ones.

Suggested approach

My suggestion is to add a high-level concept of a dm_id that is separate from the existing low-level concept of a group_id. The dm_id is defined as "dm:{lower_inbox_id}:{higher_inbox_id}".to_lower_case(), and would be a nullable field added to the groups and messages tables here (will need to create a DB migration). Note:

When starting a 1:1 conversation, we should first check for existing MLS groups matching that dm_id (requirement (2)), and if none are found, we would create a new MLS group with the dm_id field set.

Note 1: There is already an existing field on the groups table called dm_inbox_id. I'm suggesting the dm_id field instead because it is globally consistent across all participants in the DM, and the code for computing it is a little simpler. I'm also suggesting writing it to both the groups and messages table. But using the existing field for this is fine also.

Note 2: The dm_id is not privacy-preserving - it's a local-only identifier stored in the user's database, and shouldn't be used on the server anywhere.

For requirement (1)

  1. For 1:1 conversations, apps will never see the group_id for a DM, they will only ever see the dm_id. In other words, in the platform SDK's conversation_id would be defined as dm_id.as_bytes() for DM's and group_id for groups.
  2. When fetching messages for a DM, libxmtp would query the messages table using the dm_id, which will give messages from all of the MLS groups associated with a DM.

For requirement (2)

When sending a message on a DM, the SDK should use the MLS group that is the most recently used. I suggest adding a last_message_ns field to the groups table that is updated on every message send. Then, when sending a DM, find the group_id with the latest last_message_ns for the given dm_id. This is the safest approach for new installations that do not have the message history, and ensures that all installations eventually end up on the same MLS group, letting the older groups fade into obscurity.

Feedback welcome!

neekolas commented 1 month ago

This feels right to me. If we can't avoid duplicates (and we can't) we might as well handle them gracefully and invisibly to the end user.