Closed richard-ramos closed 1 year ago
- Generate a UUID. In the WakuMessage we could add an optional UUID attribute. This is generated by the author of the WakuMessage and should be inserted in the storenode database along with the other fields. An advantage of having the UUID field defined (for waku users in general, not only for status) could be that if no Timestamp is defined, we could require that an UUID value to be defined, and that way we ensure that there is no duplicated messages stored in the database. @s1fr0 can expand on this idea as he is the original author.
This looks fragile since it's user-set, so how do you handle duplicates (malicious) etc, it becomes a bit meaningless to have a UUID that anyone can set to anything, since you can't make any real decision on it based on uniqueness, and if you do, then timing attacks are possible etc.
3 sounds like the safest solution, 1 is also a solution that has definitely been used before, and if we don't expect the protocol to change much, then it's most likely safe
We ran into one of the issues described above: In status-go when I build the message, it has this format (I'm using json just so it's easy to read)
{
"payload":"some_payload",
"contentTopic":"90b65333bbb741b063375c346d27b4cc67ca7006",
"timestamp":1671558324044743455,
"version":2
}
However when I attempted to retrieve the message from the store node, the WakuMessage arrived with this format
{
"payload":"some_payload",
"contentTopic":"90b65333bbb741b063375c346d27b4cc67ca7006",
"timestamp":1671558324044743455,
"rate_limit_proof":{} <--
}
Notice how there's a rate_limit_proof
field included in the response. Since I'm calculating the ID based on the marshalled protobuffer, the ID for the retrieved message will not match the ID for the message I sent.
Thanks, @richard-ramos, for such an elaborate problem description and solutions 💯
Let me list here the actions items that I extract:
Once the message ID RFC work starts, I think that we should move the discussion to the WAKU-MESSAGE-ID RFC PR.
This looks fragile since it's user-set, so how do you handle duplicates (malicious) etc, it becomes a bit meaningless to have a UUID that anyone can set to anything, since you can't make any real decision on it based on uniqueness, and if you do, then timing attacks are possible etc.
Sorry for replying so late. I'll try to elaborate a bit more on 2.
The idea is to rename timestamp
to uuid
or tag
or use the same name we used in the Noise pairing spec, i.e. messageNametag
.
The need for a random messageNametag
in addition to the payload
is due to the fact we can have cases where the same payload/message is sent simultaneously in the same topic (e.g. two users that say "hello" at the same time).
messageNametag
will contain random data either set by the user or deterministically generated by sender/receiver (in order to identify received encrypted messages without doing trial decryption - more context in https://github.com/waku-org/nwaku/issues/1081).
The idea is to identify messages with a key derived as messageKey = sha256(messageNametag, payload, [otherWakuMessageFields] )
. Then:
messageNametag
but different payloads
, their messageKey
will differ and will be correctly stored as distinct messages. However, since it would be unfeasible to randomly generate the same random bytes independently, this could raise a warning that the user is forging the tag/not following the protocol.payload
(and eventually same otherWakuMessageFields
) but different messageNametag
, their messageKey
will differ and will be correctly stored as distinct messages;payload
(and eventually same otherWakuMessageFields
) and messageNametag
, their messageKey
will be equal and the last arrived message will be correctly discarded. Note that there is no information loss, since an exact copy of the message was previously stored.Note that one of the benefits of this solution is that is already implemented in Noise (requires some minor tweaks). Indeed, there, the field messageNametag
enters the decryption routine for payload
as Additional Data, implying that the messageNametag
field cannot be changed without invalidating the decryption. In other words, the field messageNametag
is authenticated and its integrity can be verified by the recipient during decryption (but not by store nodes).
In the unauthenticated case, instead, an attacker in-the-middle could change the tag and the payload to whatever it wants, but this is expected when cryptography is not employed. Collisions in the messageNametag
field will happen only if users will not follow the protocol, but in any case no message will be lost (except identical copies) even if the tag/payload are set maliciously.
@fryorcraken I'd suggest we close this issue as part of the 10K epic, given that deterministic message hashing solved the immediate requirement. Future plans to include a Message UID in messages, also for distributed store sync, is tracked in https://github.com/waku-org/pm/issues/9
Let's close. MUID logic has been defined and implemented and used in store implementation to remove dupe messages. Future work is tracked.
In the status-go we need an unique identifier for each
WakuMessage
we receive, to determine whether we have seen this message before or not to attempt to decrypt it and store it into the DB. Currently it's being calculated as this:This ends up generating a 32 bytes hash that could be used as an ID, however, @cammellos pointed out a fundamental flaw with this approach. Protobuffer serialization is not deterministic (It's not part of the protocol):
In WakuV1 envelopes (aka the protocol messages), this is not a problem since the IDs are calculated only once after the message is dispatched, and if we need to retrieve this message from a mailserver, we can be sure that the ID will be the same because the mailserver returns the raw bytes of the envelope it received, so calculating the sha256 over these raw bytes will generate the same hash obtained when sending the messages.
In WakuV2 this can be a problem, since the hash of the protobuffer at the moment of sending the message might not match the hash of the protobuffer returned by the storenodes, which is a real possibility as we have already seen before differences between Nim / JS and Go protobuffer encoding.
To solve this issue I see some approaches we could follow, and would like your feedback on these and additional ideas or solutions
1. Calculate the ID based on the content of the message:
This is the easiest solution. It could make the protocol not upgradable, in case a field is added or removed. However, this might not be a big problem as I wouldn't expect the
WakuMessage
protobuffer to get rid of these fields anytime soon, but it is still a possibility to take into consideration2. Generate a UUID. In the
WakuMessage
we could add an optionalUUID
attribute. This is generated by the author of theWakuMessage
and should be inserted in the storenode database along with the other fields. An advantage of having theUUID
field defined (for waku users in general, not only for status) could be that if noTimestamp
is defined, we could require that anUUID
value to be defined, and that way we ensure that there is no duplicated messages stored in the database. @s1fr0 can expand on this idea as he is the original author.2. Have the store, lightpush and filter protocol use raw message bytes instead Currently our protocols handles
WakuMessages
like this:WakuMessage
object, and then inserts the attributes of this message into the database. When a client does a history request, the storenode executes DB query, and createsWakuMessage
instances for each of records returned by the query, and send them back as a response.WakuMessage
, then create aMessagePush
with thisWakuMessage
before sending it to the subscribersWakuMessage
before broadcasting it via relay.Since in all these protocols we are doing unmarshalling of the serialized
WakuMessage
, we are potentially creating different results when we marshal them again, either due to difference between protobuffer encoding across programming languages, or because theWakuMessage
protobuffer version might be different across nodes. This happens with RLN: We are not storing the fields related to the RLN proof in the database; and when using filter full nodes or lightpush nodes that do not have RLN enabled, they will not forward the proof of the WakuMessages to the subscribers, or broadcast it via relay. This is a real problem we saw while working with js-rln when some nodes were running zerokit and others were running kilic/rln (@fryorcraken and @s1fr0 probably remember this case).I believe that nodes should not modify the WakuMessages: the bytes of a message should be the same across all peers, and the way the protocols work right now allow the messages to be modified either by ignoring attributes due to difference of version / features, or due to marshalling. With this goal in mind, I propose the following solution:
In Store protocol:
payload
column that stores only thepayload
attribute of theWakuMessage
, to instead have a BLOBdata
column which would store the full raw bytes of theWakuMessages
as they are received in the storenode without them being decoded. This has a cost in space of course.WakuMessages
, we returnbytes
, which would be obtained from the newdata
column.This also has the advantage of moving the marshalling responsibility from the storenodes to the clients. This has been identified as a bottleneck in WakuV1 before (See here), and I'd expect the same to happen with V2, so with this change there should be a performance improvement
In Filter protocol:
waku_message
in theMessagePush
protobuffer also tobytes
. This will contain the raw waku message as it is received in relay, without decoding it.In Lightpush protocol:
message
in thePushRequest
protobuffer also tobytes
. This will contain the raw waku message as it is received in the lightpush request, without decoding it.Any of these solutions could work for the problem we have in status-go, as it would be able to deterministically calculate the message ID, however I believe that solution 3 should be implemented, specially now that newer versions of Store https://github.com/vacp2p/waku/pull/10/ and Filter protocol https://github.com/vacp2p/rfc/pull/562 are being worked on, since doing these changes mean that we shouldn't have to worry about nodes having different versions of
WakuMessage
protobuffer that could introduce or remove attributes, since we'd be handling the raw bytes of the message.cc: @fryorcraken, @jm-clius, @LNSD, @cammellos