waku-org / nwaku

Waku node and protocol.
Other
200 stars 52 forks source link

bug: Noticing that nwaku node is not able to honor IWANTs in status fleet #2702

Open chaitanyaprem opened 5 months ago

chaitanyaprem commented 5 months ago

Problem

Noticed while debugging the issue https://github.com/waku-org/go-waku/issues/1098 in status. Noticed logs in status-app indicating that store nodes were not able to honor some of the IWANT requests at gossip level due to which they were being penalized by the status-desktop nodes.

Impact

Over a period of time, if store nodes gets penalized the local gossipsub may disconnect the peers.

To reproduce

Noticed this on regular log observation in status desktop. geth.log

Expected behavior

A clear and concise description of what you expected to happen.

Screenshots/logs

Details available in discord discussion here https://discord.com/channels/1110799176264056863/1238425045710147645

INFO [05-10|10:09:45.469|go.uber.org/zap/sugar.go:168] peer 16Uiu2HAmKBd6crqQNZ6nKCSCpHCAwUPn3DUDmkcPSWUTyVXpxKsW didn't follow up in 2 IWANT requests; adding penalty per @richard-ramos In the logs for the bridge, I see it happen with: 16Uiu2HAm3xVDaz6SRJ6kErwC21zBJEZjavVXg7VSkoWzaV1aMA3F 16Uiu2HAmB7Ur9HQqo3cWDPovRQjo57fxWWDaQx27WxSzDGhN4JKg 16Uiu2HAmKBd6crqQNZ6nKCSCpHCAwUPn3DUDmkcPSWUTyVXpxKsW 16Uiu2HAmAUdrQ3uwzuE4Gy4D56hX6uLKEeerJAnhKEHZ3DxF1EfT

I have also noticed similar logs in my status-desktop as well but all are only related to store nodes.

nwaku version/commit hash

v0.24.0

Additional context

Add any other context about the problem here.

Ivansete-status commented 5 months ago

Thanks for submitting this @chaitanyaprem ! I changed the log level to TRACE in shards.staging-Amsterdam nodes. Let's use a fixed peerId so that we can track how the iwant messages are handled in those nodes.

Ivansete-status commented 4 months ago

@chaitanyaprem - After further analysis, the reason why your node is not being replied to is that your node got a very low score ( < 0 .) Once that happens, all the iwant requests are ignored with iwant: ignoring low score peer.

I will add more logs in the scoring GossipSub module to properly determine why the score is getting that low. One possible option is that the mesh is full (8 members) but I need to confirm that better.

Ivansete-status commented 4 months ago

We will get better insight after having the following PR merged: https://github.com/vacp2p/nim-libp2p/pull/1107

chaitanyaprem commented 4 months ago

@chaitanyaprem - After further analysis, the reason why your node is not being replied to is that your node got a very low score ( < 0 .) Once that happens, all the iwant requests are ignored with iwant: ignoring low score peer.

I will add more logs in the scoring GossipSub module to properly determine why the score is getting that low. One possible option is that the mesh is full (8 members) but I need to confirm that better.

Oh, it would be good to know why the score has been reduced?

Ivansete-status commented 4 months ago

Blocked until the v0.29.0 is deployed where we have better logs to debug the issue.

Ivansete-status commented 1 month ago

@chaitanyaprem - after revisiting the issue again with further nwaku logs it seems it is not related to a low peer score.

Nevertheless, as of what I've seen in status-go , the following doesn't make much sense because the iwant follow up time seems to be set to 3 seconds and therefore I can't quite understand the following where the latest iwant was sent few minutes back:

DEBUG[08-23|17:23:25.966|go.uber.org/zap/sugar.go:198]                                                               IWANT: Sending 1 messages to 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r 
DEBUG[08-23|17:23:26.027|go.uber.org/zap/sugar.go:198]                                                               IWANT: Sending 8 messages to 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r 
INFO [08-23|17:23:32.579|go.uber.org/zap/sugar.go:204]                                                               peer 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r didn't follow up in 1 IWANT requests; adding penalty 
chaitanyaprem commented 1 month ago

@chaitanyaprem - after revisiting the issue again with further nwaku logs it seems it is not related to a low peer score.

Nevertheless, as of what I've seen in status-go , the following doesn't make much sense because the iwant follow up time seems to be set to 3 seconds and therefore I can't quite understand the following where the latest iwant was sent few minutes back:

DEBUG[08-23|17:23:25.966|go.uber.org/zap/sugar.go:198]                                                               IWANT: Sending 1 messages to 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r 
DEBUG[08-23|17:23:26.027|go.uber.org/zap/sugar.go:198]                                                               IWANT: Sending 8 messages to 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r 
INFO [08-23|17:23:32.579|go.uber.org/zap/sugar.go:204]                                                               peer 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r didn't follow up in 1 IWANT requests; adding penalty 
DEBUG[08-23|17:23:26.027|go.uber.org/zap/sugar.go:198]                                                               IWANT: Sending 8 messages to 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r 
INFO [08-23|17:23:32.579|go.uber.org/zap/sugar.go:204]                                                               peer 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r didn't follow up in 1 IWANT requests; adding penalty 

These logs show that IWANt was sent about 6 seconds ago which did not have a followup. where is the IWANT follow-up time set?

Ivansete-status commented 1 month ago

@chaitanyaprem - my bad! you are absolutely right. Only ~6 seconds back.

The max time the go-waku node waits for iwant replies is defined in: https://github.com/libp2p/go-libp2p-pubsub/blob/b8a6a868adce87101b3f61cb1b1a644db627c59f/gossipsub.go#L70

I wonder if we could increase that time to 10s?

On the other hand, I realized that in nwaku the iwant responses are sent with low priority: https://github.com/vacp2p/nim-libp2p/blob/18a2e79ce209d2f21d8c5db7a41659cc39dbe0b4/libp2p/protocols/pubsub/gossipsub.nim#L383-L384

chaitanyaprem commented 1 month ago

I wonder if we could increase that time to 10s?

what is the value set in nwaku for this? also, i have to see if there is a way to change this in go-waku/go-libp2p.

chaitanyaprem commented 1 month ago

On the other hand, I realized that in nwaku the iwant responses are sent with low priority:

what does that mean, do you think they are intentionally delayed?

Ivansete-status commented 1 month ago

On the other hand, I realized that in nwaku the iwant responses are sent with low priority:

what does that mean, do you think they are intentionally delayed?

Sorry, I didn't read the code properly. What is actually sent in that case are the iwant messages in a low priority queue. Those iwant msgs are sent after getting the ihave msg from a peer.

In the nim-libp2p implementation, the node doesn't establish a particular timeout for every iwant message. i.e., every iwant message is sent through a low-priority queue and, in case this queue reaches 1024 msgs, it disconnects the node.