waku-org / pm

Project management, admin, misc

Waku message UID #9

Open LNSD opened 1 year ago

LNSD commented 1 year ago

Goal

Definition of a Waku message uniqueness identifier that can be used to deduplicate Waku messages across the Waku platform.

Proposal

Proposal document: Waku_MessageUID(v2).pdf. Initial proposal document: Waku_Message_UID.pdf

Related issues:

Implementation issues:

LNSD commented 1 year ago

From discord thread (@jm-clius):

Thanks a lot for this! In general I think replacing the timestamp with some UID is a sensible approach that will allow a globally consistent way for applications to deduplicate their own messages. If I understand your proposal correctly, the chosen UID format will be up to application (as long as they understand the requirements and implications of what they choose). This field then will be "mandatory if Waku Archive functionality is desired", with ordering and deduplication taken care of. Will be interested to hear if Status app thinks this is a feasible solution to the challenges they currently have deduplicating. 🙌

I am also interested in @cammellos's opinion about this, and more so based on this comment on the RFC issue:

This looks fragile since it's user-set, so how do you handle duplicates (malicious) etc, it becomes a bit meaningless to have a UUID that anyone can set to anything, since you can't make any real decision on it based on uniqueness, and if you do, then timing attacks are possible etc.

@cammellos I am interested in which issues you see with having user-set UIDs. Could you elaborate on what kind of "timing attacks" you are thinking of?

cammellos commented 1 year ago


@cammellos I am interested in which issues you see with having user-set UIDs. Could you elaborate on what kind of "timing attacks" you are thinking of?

This has been addressed by @s1fr0 in his response https://github.com/vacp2p/rfc/issues/563#issuecomment-1379156621

Basically, a uuid as described initially would be a public field; anyone could tap into the network and push a competing message with the same uuid, potentially causing a DoS whereby messages from a peer are constantly raced by a malicious actor. That's assuming deduplication is done solely based on the uuid, as described in the initial post.

In the reply, it was elaborated that deduplication is done by calculating a key from a combination of fields:

The idea is to identify messages with a key derived as messageKey = sha256(messageNametag, payload, [otherWakuMessageFields] )

That would not cause the issue and is a solid solution.

In general, I'd suggest keeping id as something that uniquely identifies a message. A uuid as initially proposed would not, since we can't guarantee uniqueness when the information is public and subject to timing attacks.

For example, this looks ok as it guarantees uniqueness (replay is possible of course, but that's just in the nature of the protocol and it's actually beneficial :) )

id = sha256(messageNameTag, payload, [otherWakuMessageFields])

but we should not name messageNameTag as uuid as that leads to confusion.
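
As a minimal sketch of that derivation (the field names, the length-prefixing, and the extra content-topic field below are illustrative assumptions, not the proposal's exact encoding):

import hashlib

# Deterministic message key: every node that sees the same fields derives the
# same key, so an attacker cannot choose the key independently of the content.
def derive_message_key(message_nametag: bytes, payload: bytes, *other_fields: bytes) -> bytes:
    h = hashlib.sha256()
    for part in (message_nametag, payload, *other_fields):
        # Length-prefix each field so the concatenation is unambiguous
        # (("ab", "c") and ("a", "bc") must not collide).
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()

key = derive_message_key(b"nametag-123", b"hello waku", b"/waku/2/default-content/proto")
print(key.hex())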

I also have some questions about messageNameTag and selectively pulling data from the mailserver, since there are a few exceptions I am a bit curious about but find difficult to articulate in writing (https://github.com/waku-org/nwaku/issues/1081), so in case @s1fr0 has some time at some point, I would not mind asking a few questions (though this is only tangentially related).

With regard to the app, the messageNameTag solution as proposed would not currently be beneficial, as the messages are signed and the id is calculated from payload+signature; equal payloads result in the same message, and that's something we always handle.

But it's a perfectly fine solution as far as I can see, at the expense of slightly more data. If the benefits of selectively decrypting messages are there, then it would certainly be worth it (this is where I have some questions, since I can see how it would work for existing sessions; but if it does not work in some instances, we might still have to try to decrypt messages in some cases, reducing the effectiveness, but that's a longer conversation :) ).

LNSD commented 1 year ago

Also, from the discord thread (@jm-clius):

One thing to still think about is the suitability of such a field for relay/gossipsub deduplication. I understand that there is a low probability of a collision even if applications use different schemes, but I'm not sure that low-layer routing should have such a reliance on the application. Applications that fail to populate this field or, for example, continue populating this field with a timestamp, would see their (original) messages dropped at random after being marked as "duplicate". This would make the gossipsub deduplication also perform a form of accidental validation, which may have some unintended consequences.

There is one reason behind the intent of unifying the message uniqueness criteria between the Waku Relay and Waku Archive functionalities and building this idea of "durable streams":

Using the same "global" UID opens the possibility of extending Gossipsub's message cache for long-term message durability. Currently, the message cache is limited to 5 heartbeat windows (configurable but limited).

This is an original idea that @Menduist and I discussed a couple of months ago, and it has the potential to cover the history synchronization requirements. But I don't have a strong opinion at this moment.

LNSD commented 1 year ago

In the proposal, I suggest that the UID attribute content should be application-specific. As a consequence, it is up to the application to specify a schema that is "timing attack resistant".

I suggest four example schemas. I advocate for the last of the four options:

  • Application-specific schema (e.g., SHA-256 signature, encrypted meta-info):
    • PRO: Negligible collision probability, contains metadata, non-traceable (looks like random data).
    • CON: Highest complexity, not-so-performant generation (hashing, encryption), might not be sortable at archive query time.

@s1fr0's messageKey idea is a possible schema used by an application for this UID field, and it falls in the fourth category.

LNSD commented 1 year ago

Talking with @s1fr0, I noticed that this conversation could drift into "which UID schema to use?". That is a different question from the objective of the proposal.

The objective is to cover the following use cases:

  • Message deduplication in the network (Waku Relay/Gossipsub).
  • Message deduplication in the Waku Archive backend (e.g., in a shared backend setup).
  • Bandwidth-efficient node history synchronization (based on fetching only UIDs).

And here, the proposal is to provide a new Waku Message attribute that supersedes the timestamp attribute and has the following properties:

Globally unique identifier with negligible collision probability

fryorcraken commented 1 year ago

Considering that the timing attack could exploit the initial proposal to censor messages on the network, should this work be moved to Vac - Secure Messaging first?

Cc @oskarth @kaiserd

LNSD commented 1 year ago

Update after a discord conversation with @cammellos:

kaiserd commented 1 year ago

@fryorcraken

Considering that the timing attack could exploit the initial proposal to censor messages on the network, should this work be moved to Vac - Secure Messaging first?

Generally, I'd say yes. But currently, SeM does not have enough resources :sweat_smile: (working on https://github.com/vacp2p/research/issues/154), and message ordering is crucial for achieving the store requirements for the MVP.

Still, I will include this in the SeM roadmap and go through it / provide feedback asap. Regarding attacks against privacy/anonymity, we have to compromise for now if we want to achieve the MVP goal.

kaiserd commented 1 year ago

Note: add label track:data-sync so that the issue can be tracked in the SeM data-sync view on the Vac research project board.

fryorcraken commented 1 year ago

Looking into more details on the original issue: the Waku Filter protocol was not mentioned, but I assume that one of the side effects of using the unique identifier to prevent duplication of messages over Waku Relay would be preventing duplicates in messages served over Waku Filter too, right?

LNSD commented 1 year ago

For the record:

I am working on formally describing the timing attack pointed out by @cammellos, and on coming up with a hybrid UID schema to solve/mitigate that security issue.

LNSD commented 1 year ago

Looking into more details on the original issue: the Waku Filter protocol was not mentioned, but I assume that one of the side effects of using the unique identifier to prevent duplication of messages over Waku Relay would be preventing duplicates in messages served over Waku Filter too, right?

Yes, if messages are deduplicated at the dissemination network level, there is no need to add this mechanism to the Waku Filter "last-mile delivery" mechanism.

Deduplication at the Waku Archive level is still necessary, since we are considering supporting the shared backend (PostgreSQL) use case.
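
A minimal sketch of that archive-level deduplication, assuming a hypothetical table layout and using sqlite3 so the example is self-contained (a shared PostgreSQL backend would express the same idea with INSERT ... ON CONFLICT (muid) DO NOTHING):

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (muid BLOB PRIMARY KEY, payload BLOB, stored_at INTEGER)")

# Returns True if the message was stored, False if another store node (or a
# retransmission) already persisted a message with the same UID.
def archive_message(muid: bytes, payload: bytes, stored_at: int) -> bool:
    cur = db.execute(
        "INSERT OR IGNORE INTO messages (muid, payload, stored_at) VALUES (?, ?, ?)",
        (muid, payload, stored_at),
    )
    return cur.rowcount == 1

print(archive_message(b"\x01" * 64, b"hello", 1))  # True: first insert
print(archive_message(b"\x01" * 64, b"hello", 2))  # False: duplicate UID ignored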

LNSD commented 1 year ago

After reviewing in depth the Gossipsub specification and the principal implementations (Go, Rust, and Nim), I have updated the proposal with the feedback received from @cammellos and @richard-ramos.

The new Message UID (MUID) proposal can be found here: Waku_MessageUID(v2).pdf

NB: This is not an RFC. This is an ADR.

fryorcraken commented 1 year ago

Might have been easier if the PDF was a Markdown file. I think it's fine if we want to have an ADR folder in this repo.


muid: [u8; 64] = concat(checksum, metadata)

The metadata part is an application-specific value extracted from the Waku Message's meta attribute.

Why concatenate checksum and metadata if the checksum is already generated from meta?

LNSD commented 1 year ago

Why concatenate checksum and metadata if the checksum is already generated from meta?

The checksum is a signature, and if it matches the ID, it indicates the meta+payload has not been tampered with. For example, a Waku Relay validator could use the concatenated meta field to decide whether a message should be relayed.
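
To make the layout concrete, here is a minimal sketch of muid = concat(checksum, metadata) and the relay-side integrity check, assuming a 32-byte SHA-256 checksum over meta+payload and a 32-byte metadata part (the exact sizes and the fields covered by the checksum are assumptions, not the PDF's definition):

import hashlib

def build_muid(meta: bytes, payload: bytes) -> bytes:
    checksum = hashlib.sha256(meta + payload).digest()  # 32-byte integrity part
    metadata = meta.ljust(32, b"\x00")[:32]             # 32-byte application part
    return checksum + metadata                          # 64 bytes in total

# Relay-side validator: recompute the checksum and compare it against the first
# half of the MUID; a mismatch means meta or payload was tampered with.
def validate(muid: bytes, meta: bytes, payload: bytes) -> bool:
    expected = hashlib.sha256(meta + payload).digest()
    return muid[:32] == expected and muid[32:] == meta.ljust(32, b"\x00")[:32]

meta, payload = b"app-meta", b"hello waku"
muid = build_muid(meta, payload)
assert validate(muid, meta, payload)
assert not validate(muid, meta, b"tampered payload")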

LNSD commented 1 year ago

@fryorcraken I tried to push the document in markdown format to this repo, but I couldn't due to this:

git push origin adr-muid
ERROR: Permission to waku-org/pm.git denied to LNSD.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Can I get permission to push to this repo and create a PR?

fryorcraken commented 1 year ago

@LNSD you should have permissions now on the repo :+1:


Waku Relay validator could use the concatenated meta field to decide whether a message should be relayed

I assume you have specific validation logic in mind here (maybe the Status Communities signature one?). I think a concrete example would help me understand. I will wait to see how the validation logic we are defining first uses the ID, to better understand.


Looking at solving duplicate messages over Waku Relay.

Waku Relay: deduplication and integrity

Based on the low collision probability of some of the schemas described above, this MUID could be used as the key for the message and seen caches. A message that reuses the same ID with a different payload within the Message Cache window won't be relayed. In the same way, if it is replayed within the Seen Cache window, it won't be received by subscribers. Additionally, as all nodes can compute the checksum part of a message ID, a validator can be integrated to guarantee Waku Message integrity.
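
A minimal sketch of a seen cache keyed by the MUID, as the quoted paragraph describes; the window length and eviction policy below are illustrative, not gossipsub's actual parameters:

import time
from collections import OrderedDict

class SeenCache:
    def __init__(self, window_seconds: float = 120.0):
        self.window = window_seconds
        self._seen = OrderedDict()  # muid -> first-seen timestamp

    # Returns True if the MUID is new (message is relayed/delivered), False if
    # it was already seen inside the window and is treated as a duplicate.
    def check_and_add(self, muid: bytes) -> bool:
        now = time.monotonic()
        # Evict entries older than the window (oldest entries are at the front).
        while self._seen and next(iter(self._seen.values())) < now - self.window:
            self._seen.popitem(last=False)
        if muid in self._seen:
            return False
        self._seen[muid] = now
        return True

cache = SeenCache()
print(cache.check_and_add(b"\x01" * 64))  # True: first time seen
print(cache.check_and_add(b"\x01" * 64))  # False: duplicate within the window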

I thought the current issue is that the same message is received twice by the node, at an interval greater than the gossipsub message seen window, meaning the message is then relayed again.

But this document seems to imply that the current issue is that the seen cache key changes for each message? Or is there no seen cache in nwaku?

fryorcraken commented 1 year ago

https://github.com/waku-org/pm/pull/18 answered my questions :+1: