Open stubbsta opened 3 weeks ago
Could you try to reproduce it with just registering new memberships? Would be great to reproduce it with less moving parts.
./go-waku-light \ --eth-endpoint=http://foundry:8545 \ --contract-address=0xCf7Ed3AccA5a467e9e704C703E8D87F634fB0Fc9 \ register \ --priv-key=REPLACE_YOUR_PRIV_KEY \ --user-message-limit=200
Just registering one account at a time doesn't reliably induce the issue, but adding
--amount=10
to the register command does make invalid messages appear.
I tested using:
-harbor.status.im/wakuorg/nwaku:v0.30.0
-block-time=3
I will do some more tests using quay.io/wakuorg/nwaku-pr:2871
and different block-times
does make invalid messages appear.
Is the invalid reason the same as before? "invalid message: provided root does not belong to acceptable window of roots"?
yes, same reason
I also get the same issue using quay.io/wakuorg/nwaku-pr:2871
and block-time=3
I haven't seen it using that same image and block-time=60, but then I often get this error when the running the registration:
time="2024-07-04T10:22:59Z" level=fatal msg="timeout expired waiting for tx to be validated txHash: 0xaa137f6d0833f2f1cc03d546df2277435ccaa58121bb4cac574b0fde0e42cdb3: context deadline exceeded"
I haven't seen it using that same image and block-time=60, but then I often get this error when the running the registration:
this is ok. i) its a go-waku-light error not nwaku and ii) that's a block time that i don't expect to see in reality. can be trivially fixed (increase timeout) but not worth.
NUM_NWAKU_NODES=30
on this machineIn both cases the anvil block-time was 12s.
On another machine where the average proof generation time was 100ms, this config did not produce any invalid messages.
This is interesting. perhaps we should increase the delay between each message being sent? will reproduce
I think we discussed the message rate somewhere, but as a reminder in case it is relevant, the message rate using the waku-simulator rest-traffic script is the time taken to send the message (including proof generation) for each node plus the TRAFFIC_DELAY_SECONDS
. In the case of 1 node and TRAFFIC_DELAY_SECONDS=0
the delay in publishing is roughly equal to the proof generation duration
so the problem here is that some nodes are processing the new membership faster than others? so during this short time there is some kind of fork in the network? so the sender picked up the new membership (root_a) but the received rejects it because it didnt pick up it yet (root_b)
it will happen rarely, but we have to properly fix this, perhaps non trivial? as a random idea, we could use the underlying blockchain as a clock, that relate to rln epochs. eg rln-epoch 1000 maps to slot 1000. if you get a new message with epoch 1000 but you haven't yet processed slot 1000 (which may contain insertions) you put it on hold, first process that slot and then continue the validation of that message. with that you ensure that you have always processed the blockchain event of the rln epoch you are at.
perhaps the "onchain merkle tree" (once we remove the event sync mechanisim) this would be fixed as well?
Opened this with the rootcause: https://github.com/waku-org/nwaku/issues/2928
Problem
During RLN testing with different network sizes and RLN config, it was noticed that sometimes when there are many consecutive RLN membership registrations occurring and many messages published by other already registered nodes, then some messages in the network are invalid. Logging shows:
invalid message: provided root does not belong to acceptable window of roots
Impact
If these messages are not invalid they should be relayed.
To reproduce
Using the waku-simulator https://github.com/waku-org/waku-simulator:
Use the multiaddress in this script:
Expected behavior
These messages should be valid.
nwaku version/commit hash
v0.30.0