xmtp / xmtp-node-go

Software for the nodes that currently form the XMTP network
MIT License

Feature request: Total ordering of V3 messages #319

Closed by neekolas 8 months ago

neekolas commented 12 months ago

Is your feature request related to a problem?

The initial approach we were planning to take with MLS messages was to keep track of group epochs in a database table and reject messages sent from older epochs. This is similar to the hacky example delivery service in the OpenMLS repo.

Because the nodes have a limited ability to validate messages (everything but the group ID and epoch are encrypted), this poses an issue. An invalid update could increment the group epoch (even from someone outside the group), causing otherwise valid messages to fail validation.

@bren2010 proposed that we just enforce total ordering of messages on the nodes and keep the nodes unaware of epochs. Clients would be responsible for discarding messages that come from conflicting epochs, and re-sending updates if a previous update ends up being discarded.
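Sketched in Go, the client-side half of that proposal might look like the following. All names here (`Message`, `ApplyInOrder`) are illustrative, not from the codebase; it just shows the discard rule a client would apply given a totally ordered stream:

```go
package main

import "fmt"

// Message is a hypothetical decrypted MLS message as a client sees it,
// after the nodes have delivered everything in a single total order.
type Message struct {
	Epoch   uint64
	Commit  bool // true if this message advances the group epoch
	Payload string
}

// ApplyInOrder walks messages in the total order the nodes provide. A
// commit advances the current epoch; any message stamped with an epoch
// other than the current one is discarded, and its sender is expected
// to re-send against the new epoch.
func ApplyInOrder(msgs []Message) []string {
	var kept []string
	epoch := uint64(0)
	for _, m := range msgs {
		if m.Epoch != epoch {
			continue // conflicting epoch: discard, sender must re-send
		}
		if m.Commit {
			epoch++ // the group moves to the next epoch
			continue
		}
		kept = append(kept, m.Payload)
	}
	return kept
}

func main() {
	msgs := []Message{
		{Epoch: 0, Payload: "hello"},
		{Epoch: 0, Commit: true},       // epoch 0 -> 1
		{Epoch: 0, Payload: "stale"},   // lost the race: discarded
		{Epoch: 1, Payload: "welcome"}, // sent against the new epoch
	}
	fmt.Println(ApplyInOrder(msgs)) // [hello welcome]
}
```

Note that the nodes stay entirely unaware of epochs here; all they provide is the ordering.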

Describe the solution to the problem

Even in our current centralized system, total ordering doesn't come for free. We have multiple nodes with clocks that can drift out of sync, so simply sequencing messages based on server timestamps will fail in some small percentage of cases due to race conditions and clock conflicts.

The goal should be that the client can completely trust that when they receive the latest messages from the nodes, no earlier messages will ever appear in the topic.

My proposal to solve this is pretty simple. When publishing an MLS message, we rely on the database's NOW() function to set the sender_timestamp field in the DB (for MLS messages only). The sender_timestamp is what is already used for sorting in our Query and BatchQuery APIs.

It's not 100% fool-proof - you can still get conflicts at the microsecond level, and during a database failover the clock could jump backwards between nodes - but it should be a strong enough guarantee we can build systems that rely on the ordering.

As we decentralize the network we will need to develop more advanced conflict resolution strategies, since this kind of total ordering is not possible in a decentralized network. But we can defer that work until closer to the time of decentralization.

Describe the use cases for the feature

No response

Additional details

You have to do a little bit of gymnastics to get postgres to give you a nanosecond timestamp (which is what we already have in our schema). And the best it can give you is microsecond precision multiplied by 1000.

Something like this should work:

-- nanoseconds since the Unix epoch, at microsecond precision
-- (on Postgres 14+, EXTRACT(EPOCH ...) returns an exact numeric)
(EXTRACT(EPOCH FROM clock_timestamp()) * 1000000)::bigint * 1000;

There are fancier methods (logical clocks, serializable transactions that check for newer messages and roll back, using sequences as part of the sort), but I think this should do the job and has a minimal performance penalty.

richardhuaaa commented 12 months ago

we just enforce total ordering of messages on the nodes and keep the nodes unaware of epochs. Clients would be responsible for discarding messages that come from conflicting epochs, and re-sending updates if a previous update ends up being discarded.

LGTM!

When publishing a MLS message, we rely on the database's NOW() function to set the sender_timestamp field in the DB (for MLS messages only). The sender_timestamp is what is already used for sort in our Query and BatchQuery APIs. It's not 100% fool-proof - you can still get conflicts at the microsecond level, and during a database failover the clock could jump backwards between nodes - but it should be a strong enough guarantee we can build systems that rely on the ordering.

Could we use an auto-incrementing type in postgres like SERIAL instead of a timestamp, and sort by that? That removes any possibility of clock skew and guarantees that:

  1. There is a fixed ordering of the messages
  2. A later message cannot be ordered before a previous one

One thing to double-check, though, is whether there are any performance ramifications to this.
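For illustration, here is an in-process Go sketch of the property a sequence buys us. This is not the database mechanism itself, just a model of it: a counter that only moves forward hands out distinct, monotonically increasing positions regardless of any clock skew between writers.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// seq mimics what a Postgres sequence (SERIAL/BIGSERIAL) gives us: a
// counter that only moves forward, regardless of wall-clock skew
// between nodes. This is only an in-process illustration of the
// property, not the actual database mechanism.
var seq atomic.Int64

// assign stamps a message with the next sequence value.
func assign() int64 { return seq.Add(1) }

func main() {
	const writers = 8
	var wg sync.WaitGroup
	ids := make(chan int64, writers)
	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); ids <- assign() }()
	}
	wg.Wait()
	close(ids)

	seen := map[int64]bool{}
	for id := range ids {
		seen[id] = true
	}
	// Every writer got a distinct id, so sorting by id yields one fixed
	// total order in which no later id can precede an earlier one.
	fmt.Println(len(seen) == writers) // true
}
```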

neekolas commented 12 months ago

Could we use an auto-incrementing type in postgres like SERIAL instead of a timestamp, and sort by that

Definitely an option. I made a passing reference to it at the bottom. My goal was to not make any schema changes to our giant messages table and avoid any performance penalty. But it's a viable option and would give us true total ordering.

richardhuaaa commented 12 months ago

Do we think ordered MLS messages belong in the same table?

In the long term I suppose we will be replacing all of this anyway, so if using NOW() is significantly faster to implement it seems like a worthy tradeoff, seeing as it should be fine in the vast majority of cases.

neekolas commented 12 months ago

Do we think ordered MLS messages belong in the same table?

I'm hoping so. Putting them in the regular table lets us use all the existing APIs we have for querying messages, which are battle-tested and already implemented in our SDKs.

neekolas commented 12 months ago

After syncing with @richardhuaaa offline I think the plan of attack is:

  1. Implement the solution above for now in the new MLS endpoints. We're very unlikely to run into a microsecond time conflict in testing.
  2. Work on a 100% reliable solution using sequences for launch. This will require a larger database migration and some reshuffling of indexes.

snormore commented 9 months ago

Discussed this with @neekolas yesterday and I'm going to pick up the issue to separate MLS messages into the MLS DB rather than including them in the v1/waku messages DB.

Question: Do we also want queries coming in through the messagev3/mlsdelivery API instead of the messagev1 API, with new types defined to support that? That seems a lot cleaner than hacking up the messagev1 Query endpoint to use the MLS DB based on topic prefix, but are there downstream reasons to just continue using the messagev1 API?

snormore commented 9 months ago

@neekolas @richardhuaaa If we need total ordering on messages coming out of subscriptions too, then it seems like doing it via libp2p-pubsub isn't going to be sufficient, and we probably need to poll the DB instead. Does that make sense to you guys?

neekolas commented 9 months ago

I think that's true.

We have a workaround for the lack of total ordering on subscriptions right now, because some messages are safe to read out of order and some aren't. When we hit an unsafe message in the stream we use the Query API to synchronize state in order.

It would be nice to remove the need for that, but isn't a hard requirement.
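That workaround could be sketched roughly like this in Go (`StreamItem`, `Handle`, and the `query` callback are all hypothetical stand-ins for the real subscription and Query API):

```go
package main

import "fmt"

// StreamItem is a hypothetical message arriving on a subscription,
// which carries no ordering guarantee.
type StreamItem struct {
	Safe    bool   // safe to apply out of order
	Payload string
}

// Handle applies stream items directly when they are order-insensitive.
// When it hits an unsafe item (one whose effect depends on ordering),
// it falls back to the Query API, modeled here by the query callback,
// which returns the topic's messages in their authoritative total order.
func Handle(items []StreamItem, query func() []string) []string {
	var state []string
	for _, it := range items {
		if it.Safe {
			state = append(state, it.Payload)
			continue
		}
		// Unsafe message: discard local state and re-read in order.
		state = query()
	}
	return state
}

func main() {
	authoritative := []string{"a", "commit"}
	items := []StreamItem{
		{Safe: true, Payload: "a"},
		{Safe: false, Payload: "commit"}, // triggers a Query resync
		{Safe: true, Payload: "b"},
	}
	fmt.Println(Handle(items, func() []string { return authoritative }))
}
```

Total ordering on subscriptions would let clients drop the resync branch entirely.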

snormore commented 9 months ago

Ok I'll look at implementing some kind of polling inside the subscribe endpoints. Maybe also using the pubsub messages as a signal so it's not hitting the DB more than it needs to for less active topics.