xmtp / proto

Shared Protocol Buffers and their associated generated code
MIT License

Cursor format that allows clients to change gateways #201

Closed (richardhuaaa closed 1 month ago)

richardhuaaa commented 1 month ago
  1. Define a 'vector clock' type with clearly defined rules.
  2. Allow a vector clock to be specified as the cursor, for clients that wish to change gateway nodes. Clients will be tracking this information regardless, and this allows us to avoid any heuristics such as a 'lookback period'. The client's first query to the node will specify a vector clock as the cursor, with subsequent queries free to fall back to using the gateway SID again.
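
To make the proposal concrete, here is a minimal sketch of what a vector clock type with explicit rules might look like. The class name, method names, and the specific rules (entries only move forward, merge takes the per-originator maximum) are assumptions for illustration, not the definition adopted in the protos. The "missing originator means cursor 0" rule follows the convention discussed later in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VectorClock:
    """Hypothetical vector clock: originator_node_id -> last seen originator_sequence_id."""

    entries: Dict[int, int] = field(default_factory=dict)

    def get(self, node_id: int) -> int:
        # Assumed rule: an originator that is not listed is at cursor 0.
        return self.entries.get(node_id, 0)

    def observe(self, node_id: int, sequence_id: int) -> None:
        # Assumed rule: an entry only ever moves forward.
        if sequence_id > self.get(node_id):
            self.entries[node_id] = sequence_id

    def merge(self, other: "VectorClock") -> "VectorClock":
        # Assumed rule: merging takes the maximum per originator.
        merged = dict(self.entries)
        for node_id, sequence_id in other.entries.items():
            merged[node_id] = max(merged.get(node_id, 0), sequence_id)
        return VectorClock(merged)
```

A client would call observe for each message it receives and send the resulting entries as its cursor when it switches gateways.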

One open question is whether we should drop gateway SIDs completely and use the vector clock for all queries. What holds me back is the additional message size for queries, and whether the SQL query will be slower (it'll have one OR (originator_node_id = ... AND originator_sequence_id > ...) clause for each entry in the vector clock). Getting rid of gateway SIDs would simplify a lot, though.

https://github.com/xmtp/xmtpd/issues/132

neekolas commented 1 month ago

The size of the query payloads doesn't seem that bad to me. The size would be the number of nodes × 64 bits. The only worry is if the list of nodes explodes, with nodes coming and going from the network but old nodes still needing to be queried.
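
As a rough back-of-the-envelope check of that estimate, the sketch below multiplies the node count by 64 bits per entry. The function name is hypothetical, and the 64-bit figure is taken from the comment above; the actual wire size would depend on the encoding and on whether each entry also carries the node ID.

```python
def clock_payload_bytes(num_nodes: int, bits_per_entry: int = 64) -> int:
    """Estimate the cursor size in bytes, assuming a fixed number of bits per entry."""
    return num_nodes * bits_per_entry // 8


# Under this assumption, 100 nodes cost 800 bytes and 10,000 nodes cost 80,000 bytes.
small = clock_payload_bytes(100)
large = clock_payload_bytes(10_000)
```

The payload grows linearly with the number of nodes, so it only becomes a concern if the set of past and present nodes grows very large.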

These compound queries are going to be slower, but I don't think we can rule this out without clear data showing the perf is unacceptable. Reads can be run against replicas, so reads are inherently scalable.

If we were to do this I'd be tempted to go all the way and remove originator sequence IDs altogether. That's where we get the maximum simplification and benefits.

richardhuaaa commented 1 month ago

> If we were to do this I'd be tempted to go all the way and remove originator sequence IDs altogether

Just double checking, you mean gateway sequence IDs, right?

neekolas commented 1 month ago

> you mean gateway sequence IDs, right?

Yes, sorry

Thinking about this a little more, I wonder if vector clocks are actually going to be bigger than I first thought. To completely check a group topic you can't just provide the sequence_ids for the originators who have originated messages in that group already. A new message could come from any originator. It could even come from originators that are currently unhealthy, but previously were healthy and received messages.

Does this mean that we really need to include every node (past and present) in the vector clock to do this kind of query? That might be an unsustainable number.

richardhuaaa commented 1 month ago

Messages can come from new originators, but you don't need to specify those originators in the clock to achieve this (unspecified originators are assumed to be at cursor 0). The query on the server can assume that too, as follows:

...
WHERE
   ...
   AND (originator_node_id NOT IN (originator_1, originator_2)
     OR (originator_node_id = originator_1 AND originator_sequence_id > originator_1_last_seen)
     OR (originator_node_id = originator_2 AND originator_sequence_id > originator_2_last_seen));
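
The predicate above could be generated from a vector clock along these lines. This is a sketch under stated assumptions: the table name envelopes, the function names, and the use of SQLite are hypothetical and chosen only so the example runs on its own; the column names are the ones used in the query above.

```python
import sqlite3


def cursor_predicate(clock):
    """Build a parameterized predicate for rows newer than `clock`.

    `clock` maps originator_node_id -> last seen originator_sequence_id.
    Originators missing from the clock are treated as cursor 0.
    """
    if not clock:
        # An empty clock means nothing has been seen, so every row matches.
        return "1 = 1", []
    node_ids = sorted(clock)
    placeholders = ", ".join("?" for _ in node_ids)
    clauses = [f"originator_node_id NOT IN ({placeholders})"]
    params = list(node_ids)
    for node_id in node_ids:
        # One extra OR clause per entry in the vector clock.
        clauses.append("(originator_node_id = ? AND originator_sequence_id > ?)")
        params.extend([node_id, clock[node_id]])
    return "(" + " OR ".join(clauses) + ")", params


def query_since(conn, clock):
    predicate, params = cursor_predicate(clock)
    sql = (
        "SELECT originator_node_id, originator_sequence_id FROM envelopes "
        f"WHERE {predicate} "
        "ORDER BY originator_node_id, originator_sequence_id"
    )
    return conn.execute(sql, params).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE envelopes ("
        "originator_node_id INTEGER, originator_sequence_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO envelopes VALUES (?, ?)",
        [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1)],
    )
    # The client has seen up to sequence 2 from nodes 1 and 2, nothing from node 3.
    return query_since(conn, {1: 2, 2: 2})
```

With that clock, demo() returns the unseen message from node 1 and the message from node 3, which the clock never mentioned.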

neekolas commented 1 month ago

That's a good point

github-actions[bot] commented 1 month ago

:tada: This PR is included in version 3.67.0 :tada:
