waku-org / research

Waku Protocol Research
MIT License

RLN rate-limiting over multiple shards #109

Open jm-clius opened 1 month ago

jm-clius commented 1 month ago

Background

Waku currently uses RLN rate-limiting to prevent publishers from overwhelming the network with high-rate publishing. The current approach is summarised in the Waku Network specification. Of particular interest is how RLN proofs are validated: instead of slashing rate-limit violators' memberships, the validating RLN Relay nodes simply penalise peers forwarding messages that exceed the rate limit by lowering their gossipsub score (which will eventually lead to disconnection). This incentivises all RLN Relay nodes to validate and drop violating messages, or else risk their gossipsub score falling to the point where they lose all connectivity.
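As a rough illustration of this penalty mechanism (the class, constants, and penalty values below are invented for the sketch and are not nwaku's actual gossipsub scoring parameters):

```python
# Hypothetical sketch of score-based disincentivisation: a node lowers the
# score of any peer that forwards a message failing RLN validation, and
# disconnects the peer once its score drops below a threshold.
DISCONNECT_THRESHOLD = -100   # assumed value for illustration
INVALID_MESSAGE_PENALTY = -25 # assumed value for illustration

class PeerScore:
    def __init__(self):
        self.scores = {}        # peer_id -> score
        self.connected = set()

    def connect(self, peer_id):
        self.scores.setdefault(peer_id, 0)
        self.connected.add(peer_id)

    def on_invalid_message(self, peer_id):
        """Penalise a peer that forwarded a rate-limit-violating message."""
        self.scores[peer_id] += INVALID_MESSAGE_PENALTY
        if self.scores[peer_id] <= DISCONNECT_THRESHOLD:
            self.connected.discard(peer_id)  # score too low: disconnect

ps = PeerScore()
ps.connect("peer-a")
for _ in range(4):                 # repeated violations reach the threshold
    ps.on_invalid_message("peer-a")
assert "peer-a" not in ps.connected
```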

Problem

An RLN membership allows a publisher to publish messages to multiple shards. Validating nodes record the nullifiers included in RLN proofs over an epoch and identify rate violators by detecting a repeated nullifier (see this description). However, if a validating node is subscribed only to a subset of shards it may not have recorded a previous use of a nullifier on an unsubscribed shard, thereby incorrectly forwarding a violating message as valid. This could lead to its gossipsub score being lowered and its disconnection from the network.
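The partial-view failure mode can be seen in a toy model (the field names and return values are assumptions for the sketch, not Waku's wire format):

```python
# A validator only records nullifiers for shards it subscribes to, so a
# rate violation spread across two shards goes undetected by a node that
# subscribes to only one of them.
def make_validator(subscribed_shards):
    seen = set()  # nullifiers observed this epoch
    def validate(msg):
        if msg["shard"] not in subscribed_shards:
            return None                    # never sees this message
        if msg["nullifier"] in seen:
            return False                   # repeated nullifier: violation
        seen.add(msg["nullifier"])
        return True
    return validate

# Same membership publishes twice in one epoch (same nullifier) on shards 1 and 2.
msgs = [{"shard": 1, "nullifier": "n0"}, {"shard": 2, "nullifier": "n0"}]

full_view = make_validator({1, 2})
partial_view = make_validator({2})

assert [full_view(m) for m in msgs] == [True, False]    # violation detected
assert [partial_view(m) for m in msgs] == [None, True]  # violation missed
```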

Possible solutions

Option 1: Membership per shard (discarded)

Discarded for not being scalable or economic.

Option 2: Include shard in RLN proof and rate-limit per shard

  1. Include the shard in the RLN signal, similar to what was proposed for the timestamp in https://github.com/waku-org/nwaku/issues/2972
  2. Update validation rules to cache nullifiers per shard and only consider a message invalid if double-signalling is detected for a nullifier on the same shard. In other words, the rate limit remains fixed for a single membership valid on all shards, but it becomes a per-shard rate limit.
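A minimal sketch of this per-shard validation rule, assuming illustrative field names:

```python
# Cache nullifiers keyed by (shard, nullifier): a repeat only counts as a
# violation when it occurs on the SAME shard.
seen = set()

def validate_per_shard(msg):
    key = (msg["shard"], msg["nullifier"])
    if key in seen:
        return False      # double-signalling on the same shard: invalid
    seen.add(key)
    return True           # a repeat on a different shard stays valid

assert validate_per_shard({"shard": 1, "nullifier": "n0"}) is True
assert validate_per_shard({"shard": 2, "nullifier": "n0"}) is True   # other shard: allowed
assert validate_per_shard({"shard": 1, "nullifier": "n0"}) is False  # same shard: violation
```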

Immediate comments on the above:

  1. On second thought, I'm not exactly sure why (1) is necessary. In the case of the timestamp it makes sense to include it in the proof because the timestamp is a spoofable field in the WakuMessage. Publishers cannot "fake" the shard on which they publish a message, so if validators want to validate per shard they can already do so by simply considering the shard on which each message arrives.
  2. Afaiu adding the shard to the RLN signal will not alter the resultant nullifier. In other words, for messages generated by the same membership, in the same epoch, that violate the rate limit (regardless of destination shard), a double nullifier will exist (on some shard). This means the RLN verification rule as described here will be violated. We can of course decide that these messages are not invalid, as they "only" double a nullifier on another shard, a condition we now consider valid. However, is it wise to demote RLN verification from being the primary rate-limiting check by adding a conditional validation rule on top of it? Presumably the publisher will simultaneously generate enough key shares for its secret key to be reconstructed (which would lead to slashing, if we had it implemented).

Option 3: Implement slashing

Remove gossipsub scoring as the primary disincentivisation mechanism and implement slashing instead. If even one validating node is subscribed to all shards used by a rate-violating publisher, the publisher will be slashed and its membership removed.

cc @alrevuelta

SionoiS commented 1 month ago

if a validating node is subscribed only to a subset of shards it may not have recorded a previous use of a nullifier on an unsubscribed shard, thereby incorrectly forwarding a violating message as valid.

That's bad, but it can be solved by adding the shard to the hash computation of each message and only allowing shard-specific knowledge for message validation.

This would mean that RLN is no longer a GLOBAL rate limit. Is that OK?

Also, option 3 only solves the problem IF it is economically viable to run a node that checks ALL messages and slashes offenders. It also becomes less and less viable as shards and traffic increase.

jm-clius commented 1 month ago

Comment from @alrevuelta on Discord:

Not sure "to cache nullifiers per shard" is needed. I mentioned the timestamp as an example, but we already have two examples: the content topic and the message hash. https://github.com/waku-org/nwaku/blob/c5a825e206c1cb3e6e0cf8c01410527804cb76c4/waku/waku_rln_relay/rln_relay.nim#L291-L292

My idea is to bind the message to a given shard. This would also fix the meta-problem of replay attacks across shards, which I think we have. A receiver will check the proof by constructing a signal with its local shard (due to consistent hashing, it can be taken from the content topic). If the message was meant for another shard, it won't verify.

I think this fixes both replay attacks and gives determinism to message validation, since the outcome will no longer depend on the shards a node is subscribed to. As a side effect, it makes the rate limit a per-shard rate limit rather than a global one.

Re slashing: agree it can fix the issue, but I never liked the fact that going +1 message over the rate limit can kick you out, especially if we have other punishment options. But that's just an opinion.
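The shard-binding idea above could be sketched as follows. This is a hedged toy model: a plain hash stands in for the zk proof, and the verifier here uses the secret key directly, whereas a real RLN verifier checks a zero-knowledge proof without ever learning it.

```python
# Bind the RLN signal to the shard; the receiver reconstructs the signal
# from its OWN shard, so a message replayed on a different shard fails
# verification deterministically.
import hashlib

def signal(message: bytes, shard: int) -> bytes:
    # Shard is hashed into the signal alongside the message.
    return hashlib.sha256(message + shard.to_bytes(4, "big")).digest()

def make_proof(sk: bytes, message: bytes, shard: int) -> bytes:
    # Stand-in for an RLN proof over the signal; a real proof is a zkSNARK.
    return hashlib.sha256(sk + signal(message, shard)).digest()

def verify(proof: bytes, sk: bytes, message: bytes, local_shard: int) -> bool:
    # The receiver derives the shard locally (e.g. from the content topic).
    return proof == hashlib.sha256(sk + signal(message, local_shard)).digest()

sk = b"secret"
proof = make_proof(sk, b"hi", shard=3)
assert verify(proof, sk, b"hi", local_shard=3)       # on the intended shard
assert not verify(proof, sk, b"hi", local_shard=5)   # replayed cross-shard
```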

jm-clius commented 1 month ago

Thanks for your comments, @alrevuelta and @SionoiS. I realise there might be something fundamental that I'm missing. 😅

I can see how the message and proof can be tied to a shard by including it in the RLN signal. However, I don't see how this affects the fact that a publisher is guaranteed to generate a nullifier that has been generated before if it exceeds the shard-agnostic rate limit. In other words, you are guaranteed to generate a double nullifier if you generate a proof while

  1. using the same sk
  2. over the same epoch
  3. more than the rate-limit number of times

This is true no matter how you modify the RLN signal. My understanding is that the RLN signal simply further proves that the nullifier was indeed computed for the unique message and field values hashed into the signal (so that proofs cannot be reused but are tied to specific messages). In other words, a validating node may receive a message that doubles a nullifier on some shard, which under RLN rules should count as a violation. I can see how we could choose to ignore it by determining that the double nullifiers were computed for two messages on two different shards, but this would be the wrong way to go IMO.
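To make the argument concrete, here is a toy model (sha256 standing in for the Poseidon hash that RLN actually uses): the nullifier is derived only from the secret key and the epoch, so nothing added to the signal can prevent a repeat within an epoch.

```python
# Simplified nullifier derivation: a function of (sk, epoch) only.
import hashlib

def nullifier(sk: bytes, epoch: int) -> bytes:
    return hashlib.sha256(sk + epoch.to_bytes(8, "big")).digest()

sk, epoch = b"secret", 42
# Two messages in the same epoch with the same sk, on two different shards:
n1 = nullifier(sk, epoch)   # message published on shard 1
n2 = nullifier(sk, epoch)   # message published on shard 2
assert n1 == n2             # double nullifier, whatever the signal contains
```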

SionoiS commented 1 month ago

In my mind, the first step of RLN message validation is to check that the hash indeed matches the message; then you check the zk proof.

In the context of local-only verifiers: they cannot know that a message was sent on other shards, BUT they can know that a message was sent multiple times on their shard if we include this information as part of the message hash. This prevents verifiers from spreading invalid messages on their shard.

As for the global rate limit it is impossible for a local verifier to know it was violated. If we want to punish that, slashing should be implemented which may incentivize global verifiers.

As a side note, this problem is why one should be careful when using a logically centralised system (a blockchain) as part of a design: it clashes with the other, decentralised parts.

jm-clius commented 1 month ago

In my mind, the first step of RLN message validation is to check that the hash indeed matches the message; then you check the zk proof.

Indeed. This is my understanding too.

they can know that a message was sent multiple times

The issue of duplication on a single shard is generally covered by:

  1. short term deduplication in gossipsub (~2 mins)
  2. timestamp validation - messages cannot be more than 20 seconds old compared to local time.
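A minimal sketch of check (2), using the 20-second threshold from the comment above (field types are assumptions for the sketch):

```python
# Reject messages whose timestamp is more than 20 seconds behind local time.
MAX_AGE_SECONDS = 20

def timestamp_valid(msg_timestamp: float, local_time: float) -> bool:
    return local_time - msg_timestamp <= MAX_AGE_SECONDS

now = 1_000_000.0
assert timestamp_valid(now - 5, now)        # recent message: accepted
assert not timestamp_valid(now - 30, now)   # older than 20 s: rejected
```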

As for the global rate limit it is impossible for a local verifier to know it was violated. If we want to punish that, slashing should be implemented which may incentivize global verifiers.

I think I agree with this statement. The only other alternative I can see is a membership per shard, a shard being the smallest unit that can be monitored by validators.

alrevuelta commented 1 month ago

As for the global rate limit it is impossible for a local verifier to know it was violated. If we want to punish that, slashing should be implemented which may incentivize global verifiers.

Also agree with the first part. That's why I'm suggesting applying per-shard rate limits, so that everyone can be on the same page.

Agree also that slashing would fix this, but I'm unsure if the other things slashing introduces are okay.

short term deduplication in gossipsub (~2 mins)

But this only protects within the shard. A cross-shard replay attack won't be prevented by gossipsub's cache?

SionoiS commented 1 month ago

they can know that a message was sent multiple times

The issue of duplication on a single shard is generally covered by:

  1. short term deduplication in gossipsub (~2 mins)
  2. timestamp validation - messages cannot be more than 20 seconds old compared to local time.

Sorry, poor wording. I meant "multiple" in the context of RLN, i.e. spam.

We should also be careful, because "the same message" is dictated by what's included in the hash. If we add the shard to the hash, "the same message" can no longer be sent on multiple shards, which is not true ATM.

| Case | Hash construction | Result |
| --- | --- | --- |
| 1 | No shard in hash | Msg is valid on all shards |
| 2 | Shard included in hash | Msg is valid only on shard 4 |

jm-clius commented 1 month ago

Right! Thanks for all the clarifications. I think we have identified two issues here: (1) duplicating (replaying) messages across shards, and (2) validators with a "partial shard view" when the same publisher (sk) uses more than one shard in an epoch.

Afaics the only solution for (2) would be slashing; we can continue tracking that problem in this issue. For (1), encoding the shard into the signal/hash (and validating it) would solve the issue. I've created a separate issue to track this: https://github.com/waku-org/research/issues/110