bug: timestamp should be set to 0 if not provided in REST API

kaichaosun commented 3 weeks ago

Problem

I sent messages through /relay/v1/auto/messages, and didn't set timestamp, but a auto generated timestamp is stored in database messages table. It's probably better to set it to 0 based on spec.

Impact

The timestamp affect the hash of wakumessage, for scenarios like re-send old messages, a auto-generated timestamp may affect the application level solution which depends on the store v3 hash-based query.

Expected behavior

set timestamp to 0 if not provided.

chaitanyaprem commented 3 weeks ago

IIRC, the idea was to make timestamp a mandatory field in The Waku Network because of RLN. Which is why if a timestamp is not passed by the caller, the API would generate and include the timestamp in the message. This would also make Store queries more deterministic as all messages are stored based on originating timestamp.

The timestamp affect the hash of wakumessage, for scenarios like re-send old messages, a auto-generated timestamp may affect the application level solution which depends on the store v3 hash-based query.

Did not understand how the timestamp being filled automatically will affect application. Because the messageHash is generated at Waku REST API layer and returned to the application. If application has to resend a message, it will again pass timestamp as 0 and will get a new hash. So, it can always only store the latest hash of the message which it can then use for store query. Maybe you can elaborate more on the scenario you are talking about so that i can understand.

kaichaosun commented 3 weeks ago

If application has to resend a message, it will again pass timestamp as 0 and will get a new hash. So, it can always only store the latest hash of the message which it can then use for store query.

If a new hash is used, app can't use the old one to query the message. I think we need to explicitly spec how timestamp of WakuMessage is used, and change nwaku accordingly.

@jm-clius has some great points during our discussion,

Kaichao:

Hi Hanno, I remember you mentioned there is timestamp validator for WakuMessage, I can't find it in nwaku, not sure where I missed or is it still being planed?

I'm thinking about this, there maybe edge cases that messages are missing in store, and users want to re-send the old messages, from current spec the hash of wakumessage depends on timestamp, so the replayed message should not update timestamp. It seems we can remove timestamp in the current message spec and let application spec to handle it.

Hanno:

Hey Kaichao, indeed in nwaku this is not strictly validated yet on the relay layer, but this is specified for network usage so we plan on adding this timestamp validation. In nwaku we do however already validate the timestamp before inserting a message into the store: https://github.com/waku-org/nwaku/blob/7f8d8e806c16d5070edbc8dbe9f724b56c955c08/waku/waku_archive/archive.nim#L53-L72

Hanno:

The WakuMessage timestamp is meant to be the "routed at" timestamp. It is not meant primarily for application-level ordering (in other words, we expect applications to have their own sorting order/timestamps within the encrypted layer which may be slightly different from when they routed the message on Waku, compensate for app-level retransmissions, etc.) It's important that Waku remains a real-time routing network, although we do give up to 20s time for variance in the timestamp, during which the app could potentially retransmit or transmit duplicates or use a variety of strategies to ensure a message was indeed published. During this time the timestamp doesn't have to change, as long as it remains roughly within 20s of the current time. The Waku Store is meant to simply be a log of what was relayed in real time. If a message is missing from history it should mean that that message wasn't relayed during that time. Timestamp validation was included specifically to prevent relayers from modifying history or changing our assumptions about how the Store should be partitioned. Now, if there are edge cases where a message was relayed but is somehow missing from a Store node and an application wants to modify "relay" history (not app history, mind you), perhaps we can look at that as a separate problem related to Store sync. However, these "modify history" updates should not be done over Relay that is a real time protocol - Store Sync is one option, but we may come up with other solutions too.

Kaichao:

how the Store should be partitioned.

Store has a storedat field, isn't this better for the partition (do you mean paging here)?

I'm thinking about this scenario, if app find specific message is missing, the app somehow gets an identifier to the missing message, the identifier could come from a merkle tree or new messages. To retrieve the missing message with the given identifier, it either go to store or send a request to ask for the messages who still has it. If using store, the identifier mush be hash of WakuMessage, and if failed with this store query, then user has to request for resend, and now the message hash will be change if using a new timestamp for resend. Quick solution is make the hash not depend on this relay timestamp, in this way it seems the timestamp verification is like hide one's ears and steal the bell. If still depend timestamp in hash calculation, it looks much harder to handle the resend scenarios, adding another identifier besides message hash in store?

The relay protocol is helping messages reach from one point to another, can you elaborate on the value to maintain the relay history? "real-time" imo is in user experience, how such concept can help a network routing layer, in my understanding the routing is about prevent spam, in time publish, and prevent duplicate processing. Keen to know what's I missed in understanding the protocol in deep level. 😃

With store sync, this edge case should not be a problem at all, it may helps with history messages (already deleted by store) retrieval though, or maybe we want a total different solution for such use case.

Hanno:

Well, I mean both paging and partitioning. Indeed, Store uses storedAt for consistent ordering of messaging, but storedAt is always just the timestamp from the message (named storedAt because at some point in the past we allowed the Store to provide its own timestamps, but that really disallowed proper deduplication)

if app find specific message is missing

Ah, but this mixes app-level reliability with Waku routing layer reliability. If an app finds that a message ID is missing and this message is missing from the Waku Store layer, that would mean that the message was never relayed in the network. It shouldn't be up to the app to be able to change the history of what was relayed on this level - instead the app should craft a new Waku Message with a new RLN proof and routed-at timestamp, which it broadcasts to everyone in the Relay network. Of course, in a scenario where a Store node is missing a message for some reason that was relayed, that becomes a Store Sync problem and should be solved on that layer.

the value to maintain the relay history

Well, applications use Waku as a real-time (ish) ephemeral messaging layer. The point is that some nodes may be offline for a while and want to "resume" their real-time processing from some point in the past after an offline period. Store generally provides the ability to catch-up from a cache of these messages, with some advanced functionality like filtering and query-criteria included to increase the value add of this service.

chaitanyaprem commented 2 weeks ago

If a new hash is used, app can't use the old one to query the message.

If you see status-go (https://github.com/status-im/status-go/blob/11dc6976bdadd9c80ddcbc6897e647c5246b7ab7/wakuv2/waku.go#L1041) , every time a message is retransmitted, the timestamp of message is changed which results in a new hash.

Also note, status-go uses something called as envelopeID to uniquely identify application payload, different that WakuMessagehash.

Thanks for sharing the above interesting discussion. After going through, i have few comments or questions.

If WakuMessage.timestamp is not to be used by applications at all and is meant to indicate routed-at for a message, then wouldn't it be better if it is always filled/set by Waku relay rather than getting it from application? This would remove confusion for applications as it will always be filled/set by Waku relay.
I had noticed that for RLN proof generation, msgTimestamp is not used rather currentTime during proofGeneration. Similar approach can be taken for WakuMessage.timestamp where it is always filled in relay. This would ensure timestamp is generated by relay layer only and used for Store, hash and RLN (which are all Waku functionlities).

jm-clius commented 2 weeks ago

Deserves some consideration - it could be a valid model to have the API always set the timestamp. However, I do think applications sometimes prefer control over this timestamp (as it's the one that is queryable via Store). For example, if this external Waku timestamp matches the internal message timestamp, certain assumptions become easier for querying use cases. The restriction that it should closely match the publish (or routed-at) time is not generally an issue outside of app-level retransmissions or reordering. For interest's sake, does status-go set this Waku timestamp explicitly and is it used for some message ordering/querying on the application level?

chaitanyaprem commented 2 weeks ago

For interest's sake, does status-go set this Waku timestamp explicitly and is it used for some message ordering/querying on the application level?

Yes, status-go sets this timestamp explicitly. IIRC the timestamp is not used for any message ordering within status-go.

I do think applications sometimes prefer control over this timestamp (as it's the one that is queryable via Store).

You mean for range based store queries. That is something i have forgotten about but is a very valid case where apps would want to use the message timestamp.

waku-org / nwaku