jm-clius opened this issue 1 month ago
> if a validating node is subscribed only to a subset of shards it may not have recorded a previous use of a nullifier on an unsubscribed shard, thereby incorrectly forwarding a violating message as valid.
That's bad, but it can be solved by adding the shard to the hash computation of each message and only allowing shard-specific knowledge for message validation.

This would mean that RLN is no longer a GLOBAL rate limit. Is that OK?

Also, option 3 only solves the problem IF it is economically viable to run a node that checks ALL messages and slashes offenders. It also becomes less and less viable as the number of shards and the traffic increase.
Comment from @alrevuelta on Discord:
> Not sure "to cache nullifiers per shard" is needed. I mentioned the timestamp as an example, but we already have two examples, the content topic and the message hash. https://github.com/waku-org/nwaku/blob/c5a825e206c1cb3e6e0cf8c01410527804cb76c4/waku/waku_rln_relay/rln_relay.nim#L291-L292
>
> My idea is to bind the message to a given shard. This would also fix the meta problem of replay attacks across shards, which I think we have. A receiver will check the proof by constructing a signal with its local shards (due to consistent hashing, it can be taken from the content topic). If the message was meant for another shard, it won't verify.
>
> I think this fixes both replay attacks and gives determinism to message validation, since it will no longer depend on the shards a node is subscribed to. As a side effect, it makes the rate limit a shard rate limit rather than a global rate limit.
>
> Re slashing: agree it can fix the issue, but I never liked the fact that going +1 message over the rate limit can kick you out, especially if we have other punishment options. But that's just an opinion.
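To make the shard-binding idea concrete, here is a minimal Python sketch. Everything in it is illustrative: `NUM_SHARDS`, the SHA-256 stand-in for the real hash, and the exact field layout are assumptions, not nwaku's actual implementation.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count, for illustration only

def shard_for(content_topic: str) -> int:
    # Consistent hashing from content topic to shard, as the comment assumes;
    # Waku's real autosharding derivation differs in detail.
    digest = hashlib.sha256(content_topic.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def rln_signal(payload: bytes, content_topic: str, shard: int) -> bytes:
    # Bind the message to a shard by hashing it into the signal, alongside
    # fields the thread says are already hashed (content topic, message hash).
    h = hashlib.sha256()
    h.update(payload)
    h.update(content_topic.encode())
    h.update(shard.to_bytes(2, "big"))  # the proposed addition
    return h.digest()

# A receiver reconstructs the signal with *its* shard (derived locally from
# the content topic) and verifies the proof against it. A message meant for
# another shard produces a different signal, so the proof fails to verify.
msg, topic = b"hello", "/app/1/chat/proto"
local_signal = rln_signal(msg, topic, shard_for(topic))
replayed_signal = rln_signal(msg, topic, (shard_for(topic) + 1) % NUM_SHARDS)
assert local_signal != replayed_signal  # cross-shard replay won't verify
```

The key design point is that the receiver never trusts a shard claimed by the sender: it derives the shard itself and reconstructs the signal locally.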
Thanks for your comments, @alrevuelta and @SionoiS. I realise there might be something fundamental that I'm missing. 😅
I can see how the message and proof can be tied to a shard by including it in the RLN signal. However, I don't see how this affects the fact that a publisher is guaranteed to generate a nullifier that has been generated before if it exceeds the shard-agnostic rate limit. In other words, you are guaranteed to generate a double nullifier if you generate a proof with the same `sk` and `epoch` more than `rate-limit` times. This is true no matter how you modify the RLN signal. My understanding of the RLN signal is simply to further prove that this nullifier was indeed computed for the unique message and field values hashed into the signal (so that proofs cannot be reused but are tied to specific messages). In other words, a validating node may receive a message that doubles a nullifier on some shard, which under RLN rules should count as a violation. I can see how we can choose to ignore it by determining that these double nullifiers were computed for two messages on two different shards, but this would be the wrong way to go IMO.
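A toy sketch of why this holds, assuming an RLN-v2-style scheme where each proof carries a `message_id` constrained to be below the rate limit. SHA-256 stands in for the Poseidon hash used in the real circuit, and all names are illustrative:

```python
import hashlib

def nullifier(sk: bytes, epoch: int, message_id: int) -> bytes:
    # Toy model of the RLN nullifier. The real scheme computes this inside
    # the zk circuit, but the dependency structure is the same: it is a
    # function of (sk, epoch, message_id) only, never of the signal.
    return hashlib.sha256(
        sk + epoch.to_bytes(8, "big") + message_id.to_bytes(4, "big")
    ).digest()

RATE_LIMIT = 3
sk, epoch = b"publisher-secret", 42

# Publishing RATE_LIMIT + 1 messages in one epoch: valid message_ids are
# 0..RATE_LIMIT-1, so by pigeonhole at least one nullifier must repeat,
# regardless of which shards the messages target or what the signals contain.
seen = set()
for i in range(RATE_LIMIT + 1):
    n = nullifier(sk, epoch, i % RATE_LIMIT)  # the extra message reuses an id
    if n in seen:
        print("double nullifier detected")  # the violation described above
    seen.add(n)
```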
In my mind, the first step of RLN message validation is to check that the hash indeed matches the message; then you check the zk proof.

In the context of local-only verifiers, they cannot know that a message was sent on other shards, BUT they can know that a message was sent multiple times on their shard if we include this information as part of the message hash. This prevents verifiers from spreading invalid messages on their shard.

As for the global rate limit, it is impossible for a local verifier to know it was violated. If we want to punish that, slashing should be implemented, which may incentivize global verifiers.

As a side note, this problem is why one should be careful when using a logically centralized system (blockchains) as part of a design: it clashes with the other, decentralized parts.
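A hedged sketch of that validation order, with the zk verification stubbed out and all types and names invented for illustration:

```python
from dataclasses import dataclass
from collections import defaultdict
import hashlib

@dataclass
class Message:
    payload: bytes
    shard: int

@dataclass
class Proof:
    signal_hash: bytes
    nullifier: bytes
    # zk proof bytes elided; verification is stubbed below

def compute_signal(msg: Message) -> bytes:
    # Shard included in the hash, per the proposal in this thread.
    return hashlib.sha256(msg.payload + msg.shard.to_bytes(2, "big")).digest()

def verify_zk(proof: Proof) -> bool:
    return True  # stand-in for the real circuit verification

nullifier_log: dict[int, set[bytes]] = defaultdict(set)

def validate(msg: Message, proof: Proof) -> str:
    # Step 1: the signal must hash to exactly the message we received.
    if compute_signal(msg) != proof.signal_hash:
        return "invalid"
    # Step 2: check the zk proof itself.
    if not verify_zk(proof):
        return "invalid"
    # Step 3: per-shard spam check. A verifier subscribed only to this shard
    # can detect nullifier reuse *here*, but stays blind to reuse elsewhere.
    if proof.nullifier in nullifier_log[msg.shard]:
        return "spam"
    nullifier_log[msg.shard].add(proof.nullifier)
    return "valid"
```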
> In my mind, the first step of RLN message validation is to check that the hash indeed matches the message; then you check the zk proof.
Indeed. This is my understanding too.
> they can know that a message was sent multiple times
The issue of duplication on a single shard is generally covered by:
- short term deduplication in gossipsub (~2 mins)
- timestamp validation - messages cannot be more than 20 seconds old compared to local time.
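For illustration, the timestamp check could look like the following minimal sketch (assuming nanosecond timestamps as used in `WakuMessage`; the 20-second window comes from the list above, and the function name is invented):

```python
import time

MAX_AGE_SECONDS = 20  # freshness window from the list above

def timestamp_ok(msg_timestamp_ns: int) -> bool:
    # Reject any message whose timestamp is more than 20 seconds away from
    # local time (covers both stale and future-dated messages).
    now_ns = time.time_ns()
    return abs(now_ns - msg_timestamp_ns) <= MAX_AGE_SECONDS * 1_000_000_000
```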
> As for the global rate limit, it is impossible for a local verifier to know it was violated. If we want to punish that, slashing should be implemented, which may incentivize global verifiers.
I think I agree with this statement. The only other alternative I can see is to have a membership for each shard, the shard being the smallest unit that can be monitored by validators.
> As for the global rate limit, it is impossible for a local verifier to know it was violated. If we want to punish that, slashing should be implemented, which may incentivize global verifiers.
Also agree with the first part. That's why I'm suggesting applying per-shard rate limits, so that everyone is on the same page.

I also agree that slashing would fix this, but I'm unsure whether the other things slashing introduces are okay.
> short term deduplication in gossipsub (~2 mins)
But this protects within the shard. A cross-shard replay attack won't be protected by gossipsub's cache?
> they can know that a message was sent multiple times

> The issue of duplication on a single shard is generally covered by:
> - short term deduplication in gossipsub (~2 mins)
> - timestamp validation - messages cannot be more than 20 seconds old compared to local time.
Sorry, poor wording. I meant multiple times in the context of RLN, i.e. spam.

We should also be careful, because "the same message" is dictated by what's included in the hash. If we add the shard to the hash, "the same message" can no longer be sent on multiple shards, which it currently can:
| Hash computation | Result (message published on shard 4) |
| --- | --- |
| no shard in hash | msg is valid on all shards |
| shard included in hash | msg is valid only on shard 4 |
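A tiny sketch of the two table rows, under the same illustrative hashing assumptions as the earlier sketch (SHA-256 and a 2-byte shard field are stand-ins, not the real scheme):

```python
import hashlib
from typing import Optional

def msg_hash(payload: bytes, shard: Optional[int]) -> bytes:
    # With shard=None the hash ignores the shard (today's behaviour);
    # otherwise the shard is mixed in (the proposed behaviour).
    h = hashlib.sha256(payload)
    if shard is not None:
        h.update(shard.to_bytes(2, "big"))
    return h.digest()

payload = b"the same message"

# Row 1: no shard in hash -> identical hash everywhere, valid on all shards.
assert msg_hash(payload, None) == msg_hash(payload, None)

# Row 2: shard 4 included -> a copy replayed on shard 5 hashes differently
# and fails validation there.
assert msg_hash(payload, 4) != msg_hash(payload, 5)
```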
Right! Thanks for all the clarifications. I think we have identified two issues here: (1) is related to duplicating messages across shards, and (2) is about validators with a "partial shards view" when the same publisher (`sk`) uses more than one shard in an epoch.

Afaics the only solution for (2) would be slashing. We can continue tracking this problem in this issue. For (1), encoding the shard into the signal/hash (and validating it) would solve the issue. I've created a separate issue to track this: https://github.com/waku-org/research/issues/110
## Background
Waku currently uses RLN rate-limiting to prevent publishers from overwhelming the network with high-rate publishing. The current approach is summarised in the Waku Network specification. Of particular interest is how RLN proofs are validated: instead of slashing rate-limit violators' memberships, the validating RLN Relay nodes simply penalise peers forwarding messages that exceed the rate limit by lowering their gossipsub score (which will eventually lead to disconnection). This incentivises all RLN Relay nodes to validate and drop violating messages or else risk their gossipsub score falling to the point where they lose all connectivity.
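As a rough illustration of this scoring disincentive (a toy model only: the real gossipsub scoring function and thresholds are more involved, and every constant and name below is invented):

```python
GRAYLIST_THRESHOLD = -100.0  # hypothetical disconnection threshold
INVALID_MSG_PENALTY = -10.0  # hypothetical per-violation penalty

peer_scores: dict[str, float] = {}

def disconnect(peer_id: str) -> None:
    print(f"disconnecting {peer_id}: score {peer_scores[peer_id]}")

def penalize(peer_id: str) -> None:
    # Each forwarded rate-limit-violating message lowers the peer's score;
    # enough violations eventually push it below the disconnection threshold.
    peer_scores[peer_id] = peer_scores.get(peer_id, 0.0) + INVALID_MSG_PENALTY
    if peer_scores[peer_id] <= GRAYLIST_THRESHOLD:
        disconnect(peer_id)

for _ in range(10):
    penalize("peer-abc")  # 10th violation triggers disconnection
```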
## Problem
An RLN membership allows a publisher to publish messages to multiple shards. Validating nodes record nullifiers included in the RLN proofs over an epoch and detect rate violators by detecting a repeat of a nullifier (see this description). However, if a validating node is subscribed only to a subset of shards it may not have recorded a previous use of a nullifier on an unsubscribed shard, thereby incorrectly forwarding a violating message as valid. This could lead to its gossipsub score being lowered and its disconnection from the network.
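A small simulation of this blind spot, with an illustrative nullifier that depends only on `(sk, epoch)` (i.e. a rate limit of 1 per epoch) and SHA-256 standing in for the real hash:

```python
import hashlib

def nullifier(sk: bytes, epoch: int) -> bytes:
    # Toy nullifier: a function of (sk, epoch) only, as in the problem above.
    return hashlib.sha256(sk + epoch.to_bytes(8, "big")).digest()

# A publisher with rate limit 1/epoch spends the same budget on two shards.
sk, epoch = b"spammer", 7
publications = [(1, nullifier(sk, epoch)), (2, nullifier(sk, epoch))]

def violations(subscribed: set[int]) -> int:
    # Count nullifier repeats visible to a node following `subscribed` shards.
    seen, count = set(), 0
    for shard, n in publications:
        if shard not in subscribed:
            continue
        if n in seen:
            count += 1
        seen.add(n)
    return count

assert violations({1, 2}) == 1  # full view: the violation is detected
assert violations({2}) == 0     # partial view: the violation is missed,
                                # and the message is forwarded as valid
```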
## Possible solutions
### Option 1: Membership per shard (discarded)
Discarded for not being scalable or economical.
### Option 2: Include shard in RLN proof and rate-limit per shard
Include the shard in the RLN proof and apply the rate limit per shard, similar to the suggestion to include the `timestamp` in https://github.com/waku-org/nwaku/issues/2972

Immediate comments on the above: for the `timestamp` it makes sense to include it in the proof, as the `timestamp` is a spoofable field in the `WakuMessage`. Publishers cannot "fake" the shard on which they publish a message, so if validators want to validate per-shard, they can already do so by simply considering the shard on which each message arrives?

### Option 3: Implement slashing
Remove gossipsub scoring as the primary disincentivisation mechanism and implement slashing. If even one validating node is subscribed to all shards used by a rate-violating publisher, the publisher will be slashed and its membership removed.
cc @alrevuelta