alrevuelta opened 2 months ago
Just to state the obvious: this assumes that we collect fees, otherwise it can't work long term.
we need some kind of pseudo-identity to sybil-protect the voting
Do you think it would be economically viable to register 2/3 of ALL possible memberships and control the vote to get all the fees?
A protection shall be added to discard "lazy nodes", meaning that they don't participate in relay but manage to get a valid Merkle root.
Waku Sync could be used to get the correct Merkle root without using Relay at all. I'm not sure we can prevent that. :thinking:
a supermajority of 2/3 (66%) is used to decide which root to accept.
What happens when there is no majority? Do fees just accumulate?
Note that a node operator that does not vote during SUPEREPOCHS_PENALTY can be slashed by anyone
To me it feels unnecessary: is the lack of reward not penalty enough? It would allow a superepoch to have no votes and rewards to accumulate. It would be good to have a mechanism that allows skipping a superepoch when the fees are not high enough, but maybe that makes the system too gameable?
Maybe instead of voting, let people gamble on the correct Merkle root? Stake/Vote -> Bet, Reward -> odds of winning. No idea how to determine the truth though... :thinking: :shrug:
Do you think it would be economically viable to register 2/3 of ALL possible memberships and control the vote to get all the fees?
This is one of the issues that "1. Node registration" should address. In Ethereum this was solved by requiring a minimum number of operators to stake first. Without this minimum number of operators, MIN_GENESIS_ACTIVE_VALIDATOR_COUNT, the protocol could not launch. Since participation was open to anyone, and the stake was considerable, the diversity was quite good, but at some point an entity controlled around 30%.
Waku Sync could be used to get the correct Merkle root without using Relay at all. I'm not sure we can prevent that. 🤔
Perhaps with a rate limit. It's okay to get some messages via Waku Sync (eg if you are offline for some time), but it shouldn't be possible to purely rely on Waku Sync. Another option (largely vague) is to mix in service incentivization here, meaning store sync would only work if you pay a small fee (see service incentivization).
What happens when there is no majority? Do fees just accumulate?
No majority can happen in two scenarios: i) nodes strongly disagreeing on the root, or ii) a bunch of nodes not voting. For ii) we can heavily penalize that, since it hinders consensus. For i), tbh, to be defined.
To me it feels unnecessary, is the lack of reward not penalty enough? It would allow superepoch to not have votes and for rewards to accumulate.
I would say penalizing here mitigates ii) from the previous point.
Looks good to me. Very exciting.
Yes, there are some unknown and risks. Let's do the work so we can confirm viability (or not) of such a protocol.
One potential issue is ephemeral messages. They should probably be discarded from the Merkle tree as it is not possible to use store/store sync to double check none were missed.
A few thoughts regarding gaming this by relying on store sync:
If the superepoch is longer than the retention period of most stores, it would not be possible for someone to step in and backtrack far enough to get the Merkle root.
Hence, one would need to use store sync on a regular basis to sync all messages (instead of using RLN Relay). I wonder if purely by having some sane rate limits on store, this can be prevented.
Also, I believe store currently does not store RLN proofs. So there may be a potential issue here: when doing store sync, how can you be certain that the messages you are given are legitimate?
Maybe getting the RLN proof with store queries could be a more expensive service, app users should not need that, but those participating in this protocol would want to be sure the message should be included.
I like the overall approach, as it closely resembles the pattern used in a previous project that I worked on. I am wondering if we could leverage and integrate with that project (or reuse modules/research already done), which might help us, as they are solving/have solved similar problems.
A few observations.
In other words, if 10 users paid an RLN membership for a given superepoch then all the generated fees are distributed among the node operators that participated in this superepoch.
Wondering if pro-rating of rewards would be required, as node operators can join the network at any point in time.
For each superepoch, node operators construct a Merkle tree with all the messages they have seen.
Since message ordering is not guaranteed in the relay network, what would this Merkle tree look like, as the root would keep changing based on the order of insertion? Instead of a Merkle tree, could we use the sync state data structure used by RBSR (as it is resilient to ordering)?
As explained before, node operators are registered in the contract a priori, so the total amount of votes is known.
this essentially means the total votes to be considered for counting and the majority would change per superepoch, right?
Hence, one would need to use store sync on a regular basis to sync all messages (instead of using RLN Relay). I wonder if purely by having some sane rate limits on store, this can be prevented.
We could avoid using the hash of the actual message for sync, and instead use a different hash that is not used by the consensus mechanism.
@fryorcraken
One potential issue is ephemeral messages. They should probably be discarded from the Merkle tree as it is not possible to use store/store sync to double check none were missed.
Good catch. Yep, they should be discarded from the Merkle tree.
@chaitanyaprem
Powerloom
Will check, interesting.
Wondering if pro-rating of rewards would be required, as node operators can join the network at any point in time.
Haven't thought much about it, but most likely yes. I'm afraid though that these calculations may consume too much gas. Ideally billing periods should go together with the superepoch, but if someone registers the last day before the superepoch finishes, what happens? Does the user have to pay for the full period? No. Does the user pay for just 1 day and then the next superepoch(s) will match the billing period? Maybe yes.
Since message ordering is not guaranteed in the relay network, what would this Merkle tree look like, as the root would keep changing based on the order of insertion? Instead of a Merkle tree, could we use the sync state data structure used by RBSR (as it is resilient to ordering)?
I was thinking about ordering the leaves according to some criteria, eg by timestamp and by hash (alphabetically). In that case the order would be deterministic and so would the root. You may receive an out-of-order message that you can insert wherever its place is. Does the data structure you are referring to contain some kind of representation of all the content (eg like a Merkle root)?
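To make the idea concrete, here is a minimal sketch of such a deterministic tree (Python; sha256 and the (timestamp, payload) message shape are assumptions for illustration, not what Waku actually uses):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(messages):
    """Root over leaves sorted by (timestamp, hash) so the order is deterministic."""
    leaves = [h(payload) for _, payload in
              sorted(messages, key=lambda m: (m[0], h(m[1])))]
    if not leaves:
        return h(b"")
    while len(leaves) > 1:
        if len(leaves) % 2:               # odd level: duplicate the last leaf
            leaves.append(leaves[-1])
        leaves = [h(leaves[i] + leaves[i + 1])
                  for i in range(0, len(leaves), 2)]
    return leaves[0]

# Arrival order doesn't matter: both nodes derive the same root.
node_a = [(10, b"m1"), (12, b"m2"), (11, b"m3")]
node_b = [(11, b"m3"), (10, b"m1"), (12, b"m2")]
assert merkle_root(node_a) == merkle_root(node_b)
```

Because the leaf order is a pure function of the message set, two honest nodes that saw the same messages derive the same root regardless of arrival order.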
this essentially means the total votes to be considered for counting and the majority would change per superepoch, right?
Yep. New operators can be added (+) or operators can be slashed (-) or leave (-). Slightly related: Ethereum has a queue mechanism so that even if you have lots of capital to deploy (spawn new operators), you can't do it instantly. You have to wait in an entry (and exit) queue. It's a protection.
I was thinking about ordering the leaves according to some criteria, eg by timestamp and by hash (alphabetically). In that case the order would be deterministic and so would the root. You may receive an out-of-order message that you can insert wherever its place is.
That's basically what Prolly trees are.
I have been giving more thought to fee distribution to operators and the approach we are considering above. Dropping some more thoughts here; not sure if we need a solution to start with or whether some of these should be thought about at a later stage, but writing them down anyway.
Waku's aim is to build a p2p network where users and operators can operate in adversarial conditions (different from other p2p networks like ETH, which are currently designed for ideal conditions, i.e. nodes always online, not censorship resistant, etc). This means that the conditions are not ideal for either users or node operators (i.e. the network may always be available, but a node may not be, for various reasons, which means it may not be able to function at 100% all the time).
By designing a reward system for operators who validate and relay all the messages that pass through the network, we are expecting all operators to be available 100% of the time relaying messages. If we take this approach, will we end up hurting the operators who run nodes from home, where stability is not as guaranteed as for nodes running on a cloud service provider? Will we be forcing operators away from decentralization, as this would push operators to run nodes on cloud providers only?
Should we consider an approach where operators are rewarded based on the percentage of messages they were able to relay? I am not saying we should reward someone who relays only 1% of messages, but rather than expecting 100% relaying, should we look at an approach where incentives are given in another manner? Again, this goes against my point above, because by incentivizing in this model we may be pushing operators towards utilizing more centralized infra.
Alternatively, how about a mechanism which randomly verifies that an operator is validating and relaying messages? Maybe similar to the mechanism the Filecoin network uses for Proof-of-Spacetime. In Proof-of-Spacetime, random checks are done at frequent intervals for parts of the data that are expected to be stored, proving that the data is still being stored, rather than checking all of the stored data all the time. Can we think of a mechanism along similar lines that verifies a node is validating and relaying messages through some sort of partial verification, rather than verifying that the node has relayed/validated all messages in the network?
Thanks for the issue write-up!
I think your "open problems" summarise what our next research focus should be. In particular, a potential no-go would be if we can't find an elegant solution for:
Handle lazy nodes, defined as entities not participating in relay but somehow getting a valid Merkle root.
A node wouldn't even need to use Sync to be able to track the Merkle root without actually contributing - you could simply run as a relay "sink" (with a few in connections and no out) without actually relaying messages or providing other services. I like the direction that @chaitanyaprem suggests - random checks to verify that a node is actually contributing to the network. It may help here to consider Sync as an integral part of the routing fabric of the network. In other words, we may expect operators participating in the fee rewards to run the trifecta of RLN + Relay + Sync - at the very least they would have to respond to Sync requests (as it may be difficult to "prove" that a node is actually relaying and not just storing messages).
Should we consider an approach where operators are rewarded based on the percentage of messages they were able to relay?
Presumably, if we consider Sync an essential element here, we could expect of nodes to know about all messages. Perhaps a random check to show that expected messages are stored combined with a Merkle proof?
@chaitanyaprem
different from other p2p networks like ETH, which are currently designed for ideal conditions, i.e. nodes always online, not censorship resistant, etc
Note that eth is designed for very adversarial environments and nodes are not expected to be online. The incentives make them stay online, but the network can handle eg 50% nodes offline. Some non-finality period will start and penalties will kick out nodes that are not attesting.
By designing a reward system for operators who validate and relay all the messages that pass through the network, we are expecting all operators to be available 100% of the time relaying messages.
Good point. I would say it's not 100% that's required, since you can be 90% up and get what you are missing via store sync (eg after a restart for an update). But I get the point.
Should we consider an approach where operators are rewarded based on the percentage of messages they were able to relay?
I'm open to this, but not sure how it can be technically done. How can you prove that you have relayed "some" messages? I think HOPR has something related to this, proof of relay.
Alternatively, how about a mechanism which randomly verifies that an operator is validating and relaying messages? Maybe similar to the mechanism the Filecoin network uses for Proof-of-Spacetime
The random checks idea is interesting, will explore.
@jm-clius
A node wouldn't even need to use Sync to be able to track the Merkle root without actually contributing - you could simply run as a relay "sink" (with a few in connections and no out) without actually relaying messages or providing other services
Aren't we protected with gossipsub scoring from this?
I like the direction that @chaitanyaprem suggests - random checks to verify that a node is actually contributing to the network
Sounds interesting. Any idea on which challenge to randomly run? imho the hard thing here is that if the challenge implies "being right on something", that's tricky, since we don't have consensus, hence it's not possible to verify that it is correct.
Note that eth is designed for very adversarial environments and nodes are not expected to be online. The incentives make them stay online, but the network can handle eg 50% nodes offline. Some non-finality period will start and penalties will kick out nodes that are not attesting.
Right, but doesn't that mean validators are expected to be always online, i.e. 100%?
I'm open to this, but not sure how it can be technically done. How can you prove that you have relayed "some" messages? I think HOPR has something related to this, proof of relay.
Proof of relay was an interesting read, thanks!
Instead, since gossipsub already has a very detailed scoring mechanism that penalizes nodes for ineffective participation while they are online, I was wondering if we can somehow leverage that. Will think more on this problem though; this is just an initial thought.
What if each node somehow indicates, based on gossipsub score, that a peer was available and relaying messages effectively? And if such indications are received from some n number of nodes m times in a week/superepoch or some other duration we can think of, can we consider that the node was able to do its job of relaying (maybe not 100%, but to some degree; I was thinking some sort of thresholds on scores could be used in determining this)? This also acts as an indirect random check for the node.
Maybe this could somehow also be added to the node's reputation, apart from service-protocol-specific reputation? wdyt @s-tikhomirov
I think HOPR has something related to this, proof of relay.
From what I understand, HOPR is similar to Lightning in that it uses onion routing and hashlocks to make payment conditional on delivery of data. AFAIU, this scheme is suitable for a point-to-point communication, where first a path is established, and then the message is onion-encrypted using public keys of nodes along the path. Our use case, in contrast, has a broadcast communication pattern, therefore I'm not sure proof-of-relay is well-suited here. I'd be happy to be proven wrong of course.
I've also been preparing a more detailed reply to this discussion, will post it as a separate message.
if the challenge implies "being right on something" that's tricky, since we don't have consensus
No design in my head, but if we can reach some consensus on the message hash root for a superepoch (using your proposal), I guess a node could potentially be challenged to prove that they're storing random messages in this superepoch with a Merkle proof to the (consensus) root.
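For illustration, a hedged sketch of how such a spot-check could be verified (Python; the (sibling, side) proof format and sha256 are assumptions, not an existing Waku protocol):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_inclusion(leaf, proof, root):
    """Walk the proof up to the root; each step is (sibling_hash, 'L' or 'R')."""
    acc = leaf
    for sibling, side in proof:
        acc = h(sibling + acc) if side == "L" else h(acc + sibling)
    return acc == root

# Tiny hand-built 4-leaf tree to exercise the verifier.
leaves = [h(m) for m in (b"m0", b"m1", b"m2", b"m3")]
n01, n23 = h(leaves[0] + leaves[1]), h(leaves[2] + leaves[3])
root = h(n01 + n23)
# Prove m2 is under the root: sibling m3 sits on the right, then n01 on the left.
proof = [(leaves[3], "R"), (n01, "L")]
assert verify_inclusion(h(b"m2"), proof, root)
```

The challenged node can only produce the message and a valid proof if it actually stores messages belonging to the consensus root.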
Aren't we protected with gossipsub scoring from this?
True. I misstated a problem we had with Relay "leeches" earlier - nodes that are not publicly reachable and only make a few outgoing Relay connections (the number of which they control). They observe gossipsub rules on these connections, but essentially provide nothing to the network since they only consume connections. In other words, they don't enrich overall connectivity by being discoverable and allowing incoming connections. Hence my entertaining the idea of combining this with at least some verification that the nodes provide a public service that contributes to the routing infrastructure (i.e. Sync in this case).
This ties into another difficulty: the overall aim of a resilient relay infrastructure is to disperse the infrastructure as much as possible and increase the number of node operators. Everyone participating in a reward sharing scheme, however, is in essence incentivised to keep the topology as constrained as possible (fewer nodes = more reward). It is possible, as explained above, to be a perfectly good gossipsub node, but artificially limit the connectivity in the network to keep new operators from joining.
No design in my head, but if we can reach some consensus on the message hash root for a superepoch (using your proposal), I guess a node could potentially be challenged to prove that they're storing random messages in this superepoch with a Merkle proof to the (consensus) root.
I see. This addendum can fix the "lazy operator" issue, and moreover it prevents nodes from relaying without storing the messages. But this overlaps relay and store protocols. I mean, this proposal aimed to incentivize relay, not store. So not sure if we are mixing things? Or, well, maybe relay and store can go together?
As a side note, another problem is how to act upon the challenge. NodeA challenges NodeB. Is the challenge result stored onchain? (Probably not, since that would consume too much resources.) Or does this information affect gossipsub scoring?
This ties into another difficulty: the overall aim of a resilient relay infrastructure is to disperse the infrastructure as much as possible and increase the number of node operators. Everyone participating in a reward sharing scheme, however, is in essence incentivised to keep the topology as constrained as possible (fewer nodes = more reward). It is possible, as explained above, to be a perfectly good gossipsub node, but artificially limit the connectivity in the network to keep new operators from joining.
I think there is more nuance here. As a node, I prefer to have a larger node set as that increases the size of the pie of rewards. However, I want these nodes to be poorly connected, not to be able to vote for the "correct" merkle root and claim rewards.
Some ideas and comments:
@mart1n-xyz Thanks for the comments!
As a node, I prefer to have a larger node set as that increases the size of the pie of rewards
afaik that's not the case. More node operators in the network imply less rewards for each one. Rewards come from users registering RLN memberships, which are a different actor than node operators.
Re: your idea of "sub-merkle-trees".
It can actually be thought of as the same tree, but going n levels down from the root. I mean, if you want to divide the superepoch in 2, you get 2 leafs (the children of the root). With superepoch/4 you get the children of those children, etc. So it's the same Merkle tree split into subtrees.
But not sure the complexity is justified. What's the intention behind it? To allow late joiners to participate? This could be fixed with a smaller superepoch size, so that if you miss some part of it, it's not a lot (eg a few minutes). And, well, store sync also allows late joiners to sync up.
I may have to reconsider the superepoch size, because that's how frequently nodes reach consensus on the root. Right now the epoch is 10 minutes, so a superepoch will surely be greater than that. And that may be too high.
I think we should also bring identity into this discussion
Indeed. This is very important for "1. Node registration" and maybe can be reused for other stuff.
But this overlaps relay and store protocols.
Indeed. Well, I would say it overlaps Relay and Sync, which seems to reflect the model in which routing can in fact take place via either relay or sync. To me it seems Sync may be a required component in any case for nodes to have a reasonable chance to reach consensus on the Merkle root. That said, I would lean towards a Relay-only solution, but I'm not sure how solvable the lazy node and anti-connectivity problems are if we don't consider Relay and Sync as an integrated routing protocol. Or, at least, this is a possible solution that may give us more angles from which to approach the problem.
@alrevuelta Thanks for explaining. Makes sense, merkle trees are not my forte :)
What's the intention behind it? To allow late joiners to participate?
To allow nodes launching within a superepoch to get partial rewards if they are not able to derive the Merkle root for the complete set of messages but can prove a subset.
I started writing down my thoughts on this issue, and while the result may sound a bit theoretical, I hope it will be helpful.
In the big picture, what are we aiming to achieve?
We want the network to function effectively. We need to incentivize nodes to act in the best interest of the network. At the same time, we want rewards to be distributed fairly.
It's important to clearly define the terms we're using. I find it helpful to think in terms of metrics.
Imagine we could measure anything in the Waku network. What global metric would we aim for? What do we mean by "functioning effectively"?
In simple terms, a functioning network means messages are propagated quickly and reliably. This implies several independent metrics:
- a message should reach the nodes within T seconds of its first propagation;
- at least N percent of nodes should receive a message.

For example, consider using "infection" terminology: nodes that receive a new message are "infected." There could be a trade-off between:
- infecting 50% of nodes in 1 second, and another 40% over 9 seconds (total of 10 seconds to infect 90% of nodes);
- infecting 50% of nodes in 3 seconds, and another 40% over 5 seconds (total of 8 seconds to infect 90% of nodes).

Which option is preferable?
Question: What global metric should we optimize?
The network consists of independent nodes whose behavior we want to incentivize. We need to determine what local behaviors contribute to optimizing our global metric. Informally, for a node to maximize message propagation, it should receive new messages as early as possible and forward them promptly to as many peers as possible.
This behavior needs a formal definition, as informal rules can be exploited. For example, if we define ideal behavior as "forward messages to as many peers as possible," forwarding to already "infected" peers may not help overall propagation.
Question: What local behavior optimizes the global metric?
We should define a local metric as a proxy for the desired local behavior. This metric must be measurable and ideally provable. The best metric would ensure that the only way for a node to score well is by honestly performing the desired behavior.
Question: What local metric best reflects the ideal local behavior?
Finally, we need to figure out how to measure the chosen local metric. If nodes are motivated to increase their score, self-reporting could be unreliable. However, some metrics are only known to the nodes involved (e.g., how many messages they actually relayed).
Question: How do we reliably measure the local metric in practice?
Based on the framework above, the proposal under discussion seems to be as follows: node operators vote on the Merkle root of all messages seen during a superepoch, and those who voted for the supermajority root share the RLN membership fees.
Key assumptions:
1. A single correct root hash exists;
2. The correct root hash can be determined through voting;
3. Knowing the root hash implies knowledge of all underlying messages;
4. Knowing all messages implies that all messages were relayed;
5. Relaying all messages ensures optimal network performance.
We need to examine each assumption to determine how often they hold true. Assumptions rarely hold in every case, so the question is whether they hold often enough for our purposes.
A few additional points worth exploring:
How important is it to incentivize decentralization? Maximizing performance metrics might favor large, centralized nodes in data centers over smaller, home-run nodes. Do we want to encourage not only performance but also decentralization?
The proposal assumes that each message belongs to a single superepoch. Without full consensus, issues could arise if a message is issued exactly at the boundary between epochs, leading to disagreement on which epoch it belongs to and, consequently, on the correct root hash.
What happens if no candidate root hash secures a supermajority? What should be the protocol in such cases?
This section may be somewhat abstract, but I think it's important to mention.
I’m still unclear on the relationship between Waku and consensus. On the one hand, Waku is part of the blockchain ecosystem, where consensus is crucial for ensuring correctness across transactions. In financial systems, users want to ensure no double-spending occurs, requiring a global awareness of all transactions. This leads to the creation of consensus algorithms like PoW or PoS, with strong guarantees.
In contrast, consensus may not be as critical in a messaging network like Waku, where users generally care only about messages sent to them. While synchronization protocols are important, they may not need the same level of finality guarantees as blockchains.
However, by introducing financial incentives, we invite potential exploitation. In blockchains, consensus algorithms prevent cheating around rewards to a large extent (although even then things like selfish mining do exist), but Waku lacks such a robust consensus mechanism. Specifically, the root-hash-based proposal assumes a globally agreed-upon correct root hash without a full consensus mechanism to enforce it. My concern is that attackers may find ways to game the system by exploiting differences between Waku and consensus-based protocols.
This all boils down to two key points:
- I would suggest to define more precisely which metric we're trying to optimize and which local behavior to incentivize;
- I think it's worth considering to which extent we can make consensus assumptions without a fully-fledged consensus mechanism.
@s-tikhomirov Thanks for the comments.
To clarify, imho the Waku protocol has different behaviors to incentivize, but not all of them are achieved via this root voting suggestion. I see two layers of incentives: i) low-level incentives, already handled by gossipsub scoring, and ii) high-level crypto incentives, which is what this root voting proposal addresses.
Infecting 50% of nodes in 1 second, and another 40% over 9 seconds (total of 10 seconds to infect 90% of nodes); infecting 50% of nodes in 3 seconds, and another 40% over 5 seconds (total of 8 seconds to infect 90% of nodes). Which option is preferable? Question: What global metric should we optimize?
I would say this is too low-level for the "high-level crypto incentives". This kind of belongs to the gossipsub domain and is already (if not exactly) dealt with by gossipsub scoring.
Key assumptions:
1. A single correct root hash exists;
2. The correct root hash can be determined through voting;
3. Knowing the root hash implies knowledge of all underlying messages;
4. Knowing all messages implies that all messages were relayed;
5. Relaying all messages ensures optimal network performance.
1: yes. 2: yes, assuming 2/3 are honest. 3: not really, but we are thinking about a challenge, proposed in other comments, to address this (eg a lazy node using store sync, building the tree and discarding the messages). 4: maybe not, but there are other ways to ensure messages are relayed. 5: yes, but there is more to it than that.
Maximizing performance metrics might favor large, centralized nodes in data centers over smaller, home-run nodes
It depends on where you set the threshold. If I reward anyone that solves 2+2 and give them 5 seconds, I'm not favoring anyone, since that's a commodity. Anyone can solve that, no matter their resources. The result from a supercomputer is worth the same as the one from a RaspPi.
The proposal assumes that each message belongs to a single superepoch. Without full consensus, issues could arise if a message is issued exactly at the boundary between epochs, leading to disagreement on which epoch it belongs to and, consequently, on the correct root hash.
Edge cases should be dealt with deterministically in the implementation, but I would say that's not a problem. RLN binds messages to an epoch. And superepochs are just n epochs together.
What happens if no candidate root hash secures a supermajority? What should be the protocol in such cases?
It's one of the open problems. Will update it once I have more. The challenge here is solving it simply. An option could be to allow a second round, where nodes are expected to use store sync under the hood to try to reach consensus. Not simple though.
In contrast, consensus may not be as critical in a messaging network like Waku, where users generally care only about messages sent to them. While synchronization protocols are important, they may not need the same level of finality guarantees as blockchains.
I see consensus as important because of two things: i) it unlocks distributing RLN fees to node operators (incentives), and ii) it allows proving that a given message was sent through the network, which is useful for light clients and service incentivization.
I would suggest to define more precisely which metric we're trying to optimize and which local behavior to incentivize;
I see this root proposal as more generic than that. The low-level metrics you are referring to can be (and some already are) handled in gossipsub scoring. This proposal just tries to map that scoring to something onchain that we can use to reward operators. I see the root voting as a kind of high-level summary.
I think it's worth considering to which extent we can make consensus assumptions without a fully-fledged consensus mechanism.
No plan to have a fully-fledged consensus. For that, we have smart contracts with an underlying blockchain infrastructure providing that. imho we need some kind of consensus for messages as a prerequisite for incentives, which ideally can be simple.
Thanks for the reply @alrevuelta !
The idea (and assumption) is that if you do ok in the low-level incentives, you will have the root and will be rewarded.
Sure, but IMO the interesting question is how to ensure that the inverse statement holds. In other words: one should not be able to claim rewards without doing OK at the lower level (i.e., without actually relaying messages).
assuming 2/3 are honest
By the way, where does the 2/3 number come from? I know it is a common security assumption of BFT and PoS protocols, but as long as we have neither of those, do we have to use 2/3 too? Perhaps we could set the limit to, say, 90%, and that would discourage some splitting attacks? (Again, subject to further investigation if we decide to go along this route.)
I see consensus as important
Not disagreeing with you here, but in my view the key question is "can we afford consensus", not "would it be nice to have it". Of course it would be nice, but to implement consensus properly, blockchain protocol designers spend lots of R&D effort, and considerable resource commitment is required from protocol participants (hashing in PoW, locked-up capital in PoS). My feeling is that such things would be overkill for a message relay network. We can, however, look for a clever way to piggy-back on existing consensus like we do with the RLN contract already.
This proposal just tries to map that scoring to something onchain that we can use to reward operators.
I now realize that this part wasn't clear to me. Do we assume, among other things, that gossipsub score reflects the node's network participation, and that attackers won't fake their scores (perhaps using multiple colluding nodes that rank each other highly without relaying the messages)?
No plan to have a fully-fledged consensus. For that, we have smart contracts with an underlying blockchain infrastructure providing that. imho we need some kind of consensus for messages
By fully-fledged consensus I mean consensus for messages with strong security guarantees (what else can the consensus be for?). I agree though that we should leverage existing blockchain infrastructure as much as possible.
Sure, but IMO the interesting question is how to ensure that the inverse statement holds. In other words: one should not be able to claim rewards without doing OK at the lower level (i.e., without actually relaying messages).
Totally agree. This is the line of work behind the "lazy operator" problem. Largely undefined as of now.
By the way, where does the 2/3 number come from?
Open to discussing it. Took it from PoS-like systems, but ofc it's tunable.
key question is "can we afford consensus", not "would it be nice to have it"
I would say this question will be answered as the outcome of these discussions. I'm pushing towards having it, since I think it's important, but we may have to accept that it is overkill for Waku.
We can, however, look for a clever way to piggy-back on existing consensus like we do with the RLN contract already.
Open to ideas here. Can you be more specific? Not sure I see any consensus in RLN. I mean, everyone that paid X and called register gets it, which is quite a universal truth (thanks to the underlying consensus of the blockchain we use).
I now realize that this part wasn't clear to me. Do we assume, among other things, that gossipsub score reflects the node's network participation, and that attackers won't fake their scores (perhaps using multiple colluding nodes that rank each other highly without relaying the messages)?
Note that gossipsub reputation is local to each node, based on what it sees from interacting with other nodes. Meaning that node_a can have different reputation values in different nodes. Not sure how an attacker can fake its score, because the score is something a node sets based on what it sees (performance).
We can, however, look for a clever way to piggy-back on existing consensus like we do with the RLN contract already. Open to ideas here. Can you be more specific?
No concrete ideas yet, just a general thought that blockchain consensus i.e. smart contract provides us with a source of truth (just like the RLN contract is the source of truth about who has the right to relay messages).
Note that gossipsub reputation is local to each node
Sure, I understand that. But I'm not sure I understand how gossipsub scores are used in this proposal. Do we only allow nodes with score higher than X (as reported by... whom?) to vote on the root hash?
Sure, I understand that. But I'm not sure I understand how gossipsub scores are used in this proposal. Do we only allow nodes with score higher than X (as reported by... whom?) to vote on the root hash?
I mean, if your node forwards invalid messages (according to our gossipsub validation) or, eg, you don't send messages to any peers (just receive them), this will lower your score to a point where no nodes would want to connect to you. If this happens: no peers = no messages = no valid Merkle root.
Do we only allow nodes with score higher than X (as reported by... whom?) to vote on the root hash?
Anyone can vote on the root hash as long as you have posted some collateral. The right to vote is not connected to gossipsub scoring.
Thanks everyone for the feedback.
As part of the conversations, the following challenges were spotted: i) lazy nodes that get a valid Merkle root without participating in relay, ii) the network not reaching consensus during a voting round, and iii) a superepoch that is too long. Proposing a solution for each.

Problem: Lazy nodes can get a valid Merkle root (eg via store sync) without participating in relay.
Solution: Modify store-sync to only serve messages with a given delay STORE_SYNC_DELAY_SECONDS. Only serve messages up to current_timestamp - STORE_SYNC_DELAY_SECONDS. Only nodes participating in relay will know the messages within the [current_timestamp - STORE_SYNC_DELAY_SECONDS, current_timestamp] window.
window.The whole proposal for distributing RLN fees works under the assumtion that knowing the root implies participating in relay, which is the behaviour we want to incentivize. However, we can have lazy nodes that don't run relay at all, but manage to get a valid Merkle root via store sync at the expense of others.
There are two variants of lazy nodes:
To solve this problem we introduce a limit on the messages that store-sync serves. It shall serve messages up to current_timestamp - STORE_SYNC_DELAY_SECONDS
, meaning that
only relay nodes will known the messages during the STORE_SYNC_DELAY_SECONDS
window.
On the other hand, we introduce a challenge. Its result is included in the gossipsub scoring, more particularly in the P5 score, aka the "Application-Specific" score.

The P5 score increases when the challenge is answered correctly and decreases when it is not. It's designed in a way where a few occasional invalid responses don't affect the score, and the score decays over time to a point where old results no longer matter. But when a node fails multiple challenges in a row, its score lowers to a point where a disconnection is triggered.
This challenge is run periodically by every peer against its connected peers, verifying that they know the content of a message given its hash within this STORE_SYNC_DELAY_SECONDS window. Since only nodes running relay will know this, a peer failing to reply to this challenge will be descored.

Note that a PROPAGATION_TIME_SECONDS window shall be added as a safety margin, in order to ensure that the messages have propagated through the network. Running the challenge before the message is propagated would lead to erroneous results.
The challenge: pick a random message with a timestamp between current_timestamp - PROPAGATION_TIME_SECONDS - STORE_SYNC_DELAY_SECONDS and current_timestamp - PROPAGATION_TIME_SECONDS, and ask the peer for its content given the message hash. If the peer replies with the correct content, raise its P5 score; lower it otherwise. Since it's in the best interest of every node to keep the network clean of lazy peers, all nodes will challenge each other and lazy peers will end up without peers to connect to. The score decrease shall be done in a way where it wouldn't be feasible for a node to swing from vine to vine (hopping to new peers after being descored).
The cost of this solution is:
- Delayed store-sync: due to the STORE_SYNC_DELAY_SECONDS window, store-sync will go a bit behind relay. Note that this does not apply to filter.
- Increased bandwidth: the challenge introduces an extra communication, taking negligible bandwidth in the challenger (message hash) and moderate bandwidth in the challenged (message content).

Problem: The network may fail to reach consensus on a root during a voting round.
Solution: Introduce voting rounds; nodes keep voting until a supermajority is reached.
The initial proposal didn't take into account the possibility of nodes not reaching consensus during a voting round. To address this, we introduce unbounded voting rounds, where nodes keep voting until a supermajority is reached. This should guarantee to some extent that the Merkle root will eventually converge, if not in one round, then in a finite number of rounds.
With an example:
- Let node_0 .. node_(n-1) be the n node operators registered and participating in consensus with a valid vote.
- Let CONSENSUS_THRESHOLD be the minimum required % of votes to accept a root as valid. Let's say 2/3.
- Let roots be an array with the root that each node voted, [root_0 ... root_(n-1)].
- While no single root gets more than CONSENSUS_THRESHOLD, enter a new round of voting until consensus is reached (a sketch of this tally follows below).
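A minimal sketch of the round-based tally under these definitions (Python; root identifiers are illustrative strings, and the "at least 2/3 of all registered operators" acceptance rule follows the claim example later in this issue):

```python
from collections import Counter
from fractions import Fraction

CONSENSUS_THRESHOLD = Fraction(2, 3)

def tally(votes, n_operators):
    """Return the accepted root if one reached the supermajority, else None."""
    counts = Counter(v for v in votes if v is not None)
    for root, count in counts.items():
        if Fraction(count, n_operators) >= CONSENSUS_THRESHOLD:
            return root
    return None  # no supermajority: enter a new voting round

# Round 1: 6 registered operators, no root reaches 2/3 -> vote again.
assert tally(["r1", "r1", "r2", "r2", None, None], 6) is None
# Round 2: after nodes sync up, 4 of 6 (exactly 2/3) agree -> consensus.
assert tally(["r1", "r1", "r1", "r1", "r2", None], 6) == "r1"
```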
Note that, as an implementation decision, when no consensus is reached in the first round, nodes can be more proactive to ensure that consensus is achieved in the second. Some strategies: connect to more relay peers, and trigger store sync more aggressively. This should reduce the probability of a network partition and of the node missing messages. Nodes should always be well connected and using store sync regardless, but the lack of consensus during a round could be used as a signal to be more aggressive during a short time window.
Some reasons why the network may not reach consensus: a network partition, nodes missing messages, or a significant share of nodes not voting.

Problem: The superepoch matches the billing period (about 30 days). However, this is too long.
Solution: Discard the superepoch and reach consensus every epoch. The fees distributed for each epoch are proportional to the ones gathered during the billing period.
are proportional to the ones gathered during the billing period.The initial proposal wanted to introduce the concept of superepoch
matching the billing period and with 30 days in mind.
However, reaching consensus and distributing fees every 30 days might be a lot.
In order to reach consensus more often, the superepoch
idea is discarded. Consensus is now reached every epoch
.
If a billing period contains n
amount of epoch
then the fees distributed for each would be /n
.
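For illustration, with assumed numbers: a 30-day billing period and the current 10-minute epoch give n = 30 × 24 × 6 = 4320 epochs, so if memberships generated 1000 tokens over the billing period, each epoch would distribute 1000/4320 ≈ 0.23 tokens among the operators that voted the accepted root.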
Thank you for this thorough analysis and systematically addressing concerns! I have two parts to my response below. One part is to ask some clarifications on your proposed solutions to the open problems, the second is to summarise some further open questions that I think we should consider. I'll try to be as structured as possible. 😅
First, a couple of questions on your proposed solutions:
just fetching messages via store sync
Is this necessarily a bad thing from network behaviour POV? If a node uses only store sync it would still help distribute messages and could in theory participate in services such as filter/store. From the node's POV this would seem a bit silly as RLN Relay would be a cheaper way of tracking the root than relying on store sync.
Modify store-sync to only serve messages with a given delay
Store Sync is designed to improve "real-time" message routing reliability over a time window longer than what gossipsub caches can provide. Since real-time delivery has some jitter, we do already introduce a short delay before messages form part of the sync range. For this proposed mechanism to work, though, I assume we would have to delay by at least one epoch period so that the RLN Relay nodes will be the only ones able to commit the correct root for the finished epoch? My concern would be that this reduces the real-time value of Store Sync or am I missing something with my assumptions?
reach consensus every epoch
IIUC this means that every participating node would have to commit a root for every epoch? How expensive would this be compared to the resulting reward share, even on a cheap L2? For example, even if we max out the message rate in the RLN contract as it stands now (unlikely), the entire contract value would be $8000 over 6 months (currently < $2 per node per month in TWN). This translates to a very tiny amount available for sharing per day and minuscule amounts per epoch. We may hope that the value of memberships and total message rates will increase significantly in future, but it may be a while before RLN generates enough income for reward sharing to be profitable. This is at least partly because RLN was not designed as a mechanism to generate income, but as a rate-limiting mechanism we need to keep as cheap as possible to allow max app/client participation (which would make Service Incentivisation more viable as a road to sustainability).
I'll add my more general questions in a next comment.
I structure this under three meta questions that are less about technical feasibility than about what our long-term vision of Waku should be. Certainly open for discussion here, as Waku's definition and role as infrastructure in the p2p world is up to all of us. These questions are mostly just summaries of what has been discussed before, but I thought it important to gather my thoughts a bit more systematically. I apologise for the length. 😄 Happy to discuss in person or clarify any points.
Do we really want strong consensus in Waku?
I guess this question covers two points: whether we need on-chain consensus and whether we can afford it. Thus far we've considered the low-latency and ephemeral quality of messaging in Waku as one of our strongest draws - Waku is useful for the type of decentralised signalling use cases that are kept off-chain precisely because it tends to be incompatible with the restrictions of on-chain consensus. Consensus is expensive, at least in terms of complexity and overhead, and the long-term benefits of consensus to Waku as infrastructure are not entirely clear to me.
Some possible benefits that consensus may bring (other than RLN fee distribution) and a brief thought about each:
Provable message distribution: Consensus could arguably bring provable "reliable" distribution to Waku. The cost here though is quite high and, again, I'm not sure it adds much value for end users. For example, this wouldn't translate into end-to-end reliability for apps - Waku is just one transport component in the end-to-end app protocol stack and each app would still need to implement reliability protocols as they do now. Consensus will introduce costs for end users and service nodes alike (whether via rising RLN prices, added latency/complexity, a less accessible network, etc.). I can see a role for the local, off-chain consensus-like model that Waku Sync introduces, which would provide a similar level of reliability with a much simpler model.
Provable service provision: This is actually just another aspect of the previous point. For paid services related to message delivery (Store, Filter, Lightpush) a message distribution consensus mechanism could help the service or client to prove/disprove that a paid service has been provided. I can see this being especially compatible with Lightpush (which pays for distribution); for filter/store I imagine more work would be necessary to prove that queries have been fulfilled by the service node. We would also need to find something beyond message consensus if Waku were to be a general service marketplace for services unrelated to message delivery. In either case, I think some form of reputation (although admittedly also complex) is more likely to be compatible with the often fuzzy requirements we have for "good" services (a subjective combination of reliability, availability, latency and a host of service-specific quality measurements). I know this is an intricate separate discussion, so I'll abbreviate my comments on this point.
I think after these analyses we may be in a better position to ask and answer the meta questions again.
What are we really incentivising?
This model assumes that we want to only incentivise node participation in RLN Relay. Perhaps we also want to incentivise selfless participation in RLN Relay - that is, we should encourage nodes to become richly connected for the network to become as diverse and dispersed as possible. It seems to me that nodes will be incentivised to keep the topology as centralised and small as possible. They could limit their connections to the absolute minimum needed to still track the root, refuse new nodes, etc. - there seems to be no mechanism that would prevent this, IIUC? GossipSub's tit-for-tat mechanism does a great job of encouraging nodes to be useful while benefitting from the network themselves. A reward distribution would incentivise nodes to try and break this mechanism.
Similar to @jm-clius comments:
- STORE_SYNC_DELAY_SECONDS: will it be possible to set a value high enough that it has the intended effect for this protocol, and low enough to enable reliability protocols to function with acceptable latency?
- A superepoch is too large. However, an epoch seems to be too small. We may need to define a middle ground here, appropriate to the on-chain cadence and fee.

@jm-clius Thanks! Answers to the first comment: Clarifications to the proposal.
Is this necessarily a bad thing from network behaviour POV? If a node uses only store sync it would still help distribute messages and could in theory participate in services such as filter/store
I thought about that while writing the proposal. Good point, maybe it's okay that store-sync nodes can vote for the root. And it would simplify the solution because it simplifies the threat model, since these nodes are allowed. No need then to have this STORE_SYNC_DELAY_SECONDS window.
My concern would be that this reduces the real-time value of Store Sync or am I missing something with my assumptions?
It indeed reduces the real-time value of store-sync. But re: previous point maybe we don't need to add this artificial delay if we are ok with pure store-sync nodes voting for the root. So no real-time limitations then.
IIUC this means that every participating node would have to commit a root for every epoch?
Yes. I don't have the gas estimate. It won't be crazy high, but the commit-reveal-claim involves at least 3? (I would say rather cheap) transactions. Regarding the numbers, I would approach it from a different angle. Instead of starting with eg $8000 splitting the cost top-down to what the operator will get, I would do it bottom-up. Meaning starting with the expected gas cost, expected infra cost, expected tx per month, add a % and come up with the expected membership cost to make it sustainable. Will it be prohibitive? Don't have the answer but indeed I need to provide an estimation on the cost of gas for this.
@jm-clius Answers to the comment: Technical feasibility.
Do we really want strong consensus in Waku?
You have done a good summary of what consensus can bring: i) RLN fee distribution, aka incentives for nodes, and ii) the ability to prove a wide variety of things (distribution and provision).
I would say we do need it, because of what it unlocks. However, the cost of it may outweigh the benefits and make it non-viable.
Do we really need RLN fee distribution?
I see this question as very coupled with "do we need consensus?".
And re:
nodes participate in RLN Relay because it allows them to be good service nodes for paid services
Agree on this, but how to know what's "good" without consensus? And how to have consensus without some incentives? So IMHO consensus unlocks incentives and vice-versa. And then this unlocks service incentivization, because you can prove everything and know what's "good".
For now, RLN fee distribution is the best candidate, since it's the only entry point of $ into the system.
What are we really incentivising?
Not sure I agree with this statement.
It seems to me that nodes will be incentivised to keep the topology as centralised and small as possible
Any node (registered operator or not) can introduce a message into the network. And you don't know who it is. If you keep yourself connected to only a few nodes, you increase the probability of missing a message, and hence not getting the valid root that will get you rewarded.
I would say what we are incentivizing here is that you i) follow the root and ii) know what's underneath the root (leaves + content), via the challenge. By doing so, we ensure that nodes are up to date, get rewarded for that, and reach consensus, unlocking proving anything.
Whether Waku can afford this extra complexity and cost, still not sure myself.
A relatively minor idea regarding this:
Increased bandwidth: the challenge introduces an extra communication, taking negligible bandwidth in the challenger (message hash) and moderate bandwidth in the challenged (message content).
We can implement the challenge-response protocol in such a way that it doesn't involve transferring the whole message, for example like this (m is the message, H = h(m) is the message hash):
- the challenger sends H and a random salt;
- the challenged party replies with h(m || salt).
The second step only requires transferring one hash instead of one message.
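A tiny sketch of that exchange (Python; sha256 and a 16-byte salt are arbitrary illustrative choices):

```python
import hashlib, os

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# Challenger: knows m, sends (H, salt) where H = h(m) identifies the message.
m = b"some waku message"
H, salt = h(m), os.urandom(16)

# Challenged peer: can only answer if it knows the preimage m behind H.
def respond(message: bytes, challenge_salt: bytes) -> bytes:
    return h(message + challenge_salt)

# Challenger verifies the single-hash reply against its own copy of m.
assert respond(m, salt) == h(m + salt)
```

The random salt prevents the challenged peer from caching or forwarding a previously seen answer; it must recompute the hash over the full message each time.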
TLDR: Proposal on how to distribute RLN membership fees to the node operators that submit the correct Merkle root of the messages sent during a superepoch.

Problem statement
Waku has chosen Rate Limiting Nullifiers to rate-limit users, ensuring they make fair usage of the network resources. In order to do so, it requires users to pay for a so-called RLN membership, which entitles the holder to send a given amount of messages per unit of time while it is active.
However, at the time of writing, the collected fees from RLN memberships don't have a designated purpose. In order to ensure the sustainability of the network and reward node operators for their work, one option would be to distribute them among node operators.
We define node operators as entities that participate in the network, running nodes and relaying traffic. They are the ones that make the network work. A node operator is also a registered entity known by the Waku protocol; see more information below.
This issue proposes how to distribute the rewards generated by RLN memberships to node operators. It does so by rewarding the node operators that vote on the correct Merkle root for each superepoch. The root agreed on by a 2/3 supermajority is considered the correct one, and the nodes that voted on it get a reward. This reward is a fraction of the fees that were generated from membership registrations during that superepoch. Since gossipsub scoring punishes lazy peers and incentivizes peers to behave properly, we assume that knowing the valid Merkle root implies having participated in the relay protocol, which is the behaviour we want to incentivize.

Proposal
This proposal introduces the following concepts:
1. Node registration: node operators stake an amount to register, which sybil-protects the voting and makes them eligible for rewards.
2. A superepoch that matches the RLN membership renovation cycle.
3. A messages Merkle tree: during each superepoch, node operators construct a Merkle tree with all the messages they have seen.
4. Commit-reveal: after each superepoch each operator can submit their local Merkle root of the Merkle tree containing all messages, using a commit-reveal mechanism to prevent lazy nodes from copying honest ones.
5. Claim: rewards are distributed among the operators that voted the majority root for each superepoch, and every node is given 1/n of the rewards from the subscription fees of that superepoch, where n is the amount of nodes that matched the majority.
Below, each point is elaborated.
1. Node registration
Since anyone would be able to submit a vote on the Merkle root, we need some kind of pseudo-identity to sybil-protect the voting. Otherwise the same entity knowing a valid Merkle root, defined as the one that represents a Merkle tree constructed from all messages within a superepoch, could vote multiple times, getting more rewards than it should.

A suggested mechanism is to require a STAKE_AMOUNT for node operators, which provides economic sybil-protection. When an operator stakes this amount, it becomes eligible to participate in the voting and, hence, eligible for rewards. When the operator no longer wants to participate, the stake is returned. The stake can be slashed if they don't vote in SUPEREPOCHS_PENALTY consecutive superepochs. It's yet to be defined whether there would be other slashing conditions.

2. Superepoch
A superepoch is the time window used for voting. During each superepoch nodes construct a Merkle tree with all seen messages and, once it is finished, the Merkle root is calculated.

The superepoch matches the "billing period", see comment. In other words, if 10 users paid an RLN membership for a given superepoch then all the generated fees are distributed among the node operators that participated in this superepoch.

3. Messages Merkle Tree
For each superepoch, node operators construct a Merkle tree with all the messages they have seen.

4. Commit-reveal
Since all onchain activity is public and anyone can see it, we propose a commit-reveal mechanism to prevent lazy nodes from sniffing the blockchain and copying the Merkle root of honest nodes.
Once the superepoch finishes, a window of COMMIT_WINDOW_SLOTS opens, during which all registered nodes can submit their vote on which root they consider to be valid for that superepoch. What the node operator commits to (aka the commitment) is a secret, in this case the root. The commitment is hash(root + salt). That would be public, but since getting the preimage of that hash is not possible, an attacker won't be able to reveal it in the next phase.

Once the COMMIT_WINDOW_SLOTS is due, no more votes are accepted for that superepoch and a REVEAL_WINDOW_SLOTS window is opened. During this window node operators can reveal their secret. If the root provided during the reveal phase matches hash(root + salt), then their vote is accepted.
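A minimal sketch of this commit-reveal flow (Python; sha256 and simple byte concatenation stand in for whatever the contract would actually use):

```python
import hashlib, os

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# Commit phase (within COMMIT_WINDOW_SLOTS): only the commitment goes onchain.
root = bytes.fromhex("ab" * 32)  # the operator's local Merkle root
salt = os.urandom(32)            # secret salt, kept by the operator
commitment = h(root + salt)      # hash(root + salt): safe to publish

# Reveal phase (within REVEAL_WINDOW_SLOTS): publish (root, salt); the vote
# counts iff it matches the earlier commitment.
def valid_reveal(commitment: bytes, root: bytes, salt: bytes) -> bool:
    return h(root + salt) == commitment

assert valid_reveal(commitment, root, salt)
```

Since the salt is secret during the commit window, a lazy node that copies an honest commitment onchain cannot produce a matching (root, salt) pair in the reveal phase.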
Finally, once REVEAL_WINDOW_SLOTS is due, eligible node operators can claim their share of the reward. See the next section.

5. Claim
After REVEAL_WINDOW_SLOTS the commit-reveal phase is done and votes are counted. An example:
- node0..4 (5) voted on root root1.
- node5..24 (20) voted on root root2.
- node25 (1) voted on root root3.
- node26..29 (4) did not vote.

As explained before, node operators are registered in the contract a priori, so the total amount of votes is known. The previous votes can be written as [num_votes, root_n], or [5, root1], [20, root2], [1, root3], [4, NaN]. With the votes, a supermajority of 2/3 (66%) is used to decide which root to accept.

The weights would be:
- root1: 5/30 = 16%
- root2: 20/30 = 66%
- root3: 1/30 = 3%

This means that root2 is accepted onchain as the Merkle root of all the messages seen during the superepoch. Nodes that voted for that root are eligible for rewards. The amount to be distributed is calculated as the amount of memberships * the price paid per membership per superepoch. If 100 memberships were active and each one paid 10 tokens/superepoch, then the amount to split is: 100 * 10 = 1000 tokens.

Membership fees would be split among the nodes that voted the agreed root, aka root2, meaning 1000/20 = 50 tokens each for node5..24.
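As a quick check of the arithmetic, a small sketch of the split (Python; node names follow the example above):

```python
def rewards(votes, accepted_root, fees):
    """Split the superepoch fees equally among voters of the accepted root."""
    winners = [op for op, voted in votes.items() if voted == accepted_root]
    share = fees // len(winners)
    return {op: share for op in winners}

# 5 nodes voted root1, 20 voted root2; fees = 100 * 10 = 1000 tokens.
votes = {f"node{i}": "root1" for i in range(0, 5)}
votes.update({f"node{i}": "root2" for i in range(5, 25)})
payout = rewards(votes, "root2", 1000)
assert len(payout) == 20 and all(v == 50 for v in payout.values())
```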
Note that a node operator that does not vote during SUPEREPOCHS_PENALTY consecutive superepochs can be slashed by anyone, as this information is accessible onchain by anyone. The whistleblower will get WHISTLEBLOWER_REWARD, which is <= STAKE_REQUIREMENT, and the node will no longer be part of the active node operators.

In order to make it gas efficient, each node operator's rewards can be claimed at any time. For each superepoch the claimable amount keeps accumulating, so that the operator can claim their rewards for multiple superepochs at once.

Consensus

An interesting side effect of this proposal is that it introduces an onchain consensus on the messages sent during each superepoch. This would allow light clients (eg filter) to verify that a given message was indeed sent through the network, since the onchain root can be used.

Open problems
This proposal has some open problems:
- A protection shall be added to discard "lazy nodes", meaning that they don't participate in relay but manage to get a valid Merkle root.
- It is yet to be defined what happens when no root reaches the supermajority in a voting round.
- A single entity can pay the STAKE_AMOUNT for multiple node operators. This will allow it to vote multiple times but it's only doing the work of one node.