Open chaitanyaprem opened 5 months ago
Thanks for submitting this @chaitanyaprem !
I changed the log level to TRACE
in shards.staging
-Amsterdam nodes.
Let's use a fixed peerId so that we can track how the iwant
messages are handled in those nodes.
@chaitanyaprem - After further analysis, the reason why your node is not being replied to is that your node got a very low score ( < 0
.)
Once that happens, all the iwant
requests are ignored with iwant: ignoring low score peer
.
I will add more logs in the scoring
GossipSub module to properly determine why the score is getting that low. One possible option is that the mesh is full (8 members) but I need to confirm that better.
We will get better insight after having the following PR merged: https://github.com/vacp2p/nim-libp2p/pull/1107
@chaitanyaprem - After further analysis, the reason why your node is not being replied to is that your node got a very low score (
< 0
.) Once that happens, all theiwant
requests are ignored withiwant: ignoring low score peer
.I will add more logs in the
scoring
GossipSub module to properly determine why the score is getting that low. One possible option is that the mesh is full (8 members) but I need to confirm that better.
Oh, it would be good to know why the score has been reduced?
Blocked until the v0.29.0 is deployed where we have better logs to debug the issue.
@chaitanyaprem - after revisiting the issue again with further nwaku
logs it seems it is not related to a low peer score.
Nevertheless, as of what I've seen in status-go
, the following doesn't make much sense because the iwant follow up time
seems to be set to 3 seconds and therefore I can't quite understand the following where the latest iwant
was sent few minutes back:
DEBUG[08-23|17:23:25.966|go.uber.org/zap/sugar.go:198] IWANT: Sending 1 messages to 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r
DEBUG[08-23|17:23:26.027|go.uber.org/zap/sugar.go:198] IWANT: Sending 8 messages to 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r
INFO [08-23|17:23:32.579|go.uber.org/zap/sugar.go:204] peer 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r didn't follow up in 1 IWANT requests; adding penalty
@chaitanyaprem - after revisiting the issue again with further
nwaku
logs it seems it is not related to a low peer score.Nevertheless, as of what I've seen in
status-go
, the following doesn't make much sense because theiwant follow up time
seems to be set to 3 seconds and therefore I can't quite understand the following where the latestiwant
was sent few minutes back:DEBUG[08-23|17:23:25.966|go.uber.org/zap/sugar.go:198] IWANT: Sending 1 messages to 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r DEBUG[08-23|17:23:26.027|go.uber.org/zap/sugar.go:198] IWANT: Sending 8 messages to 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r INFO [08-23|17:23:32.579|go.uber.org/zap/sugar.go:204] peer 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r didn't follow up in 1 IWANT requests; adding penalty
DEBUG[08-23|17:23:26.027|go.uber.org/zap/sugar.go:198] IWANT: Sending 8 messages to 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r
INFO [08-23|17:23:32.579|go.uber.org/zap/sugar.go:204] peer 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r didn't follow up in 1 IWANT requests; adding penalty
These logs show that IWANt was sent about 6 seconds ago which did not have a followup. where is the IWANT follow-up time set?
@chaitanyaprem - my bad! you are absolutely right. Only ~6 seconds back.
The max time the go-waku node waits for iwant
replies is defined in: https://github.com/libp2p/go-libp2p-pubsub/blob/b8a6a868adce87101b3f61cb1b1a644db627c59f/gossipsub.go#L70
I wonder if we could increase that time to 10s?
On the other hand, I realized that in nwaku
the iwant
responses are sent with low priority:
https://github.com/vacp2p/nim-libp2p/blob/18a2e79ce209d2f21d8c5db7a41659cc39dbe0b4/libp2p/protocols/pubsub/gossipsub.nim#L383-L384
I wonder if we could increase that time to 10s?
what is the value set in nwaku for this? also, i have to see if there is a way to change this in go-waku/go-libp2p.
On the other hand, I realized that in
nwaku
theiwant
responses are sent with low priority:
what does that mean, do you think they are intentionally delayed?
On the other hand, I realized that in
nwaku
theiwant
responses are sent with low priority:what does that mean, do you think they are intentionally delayed?
Sorry, I didn't read the code properly. What is actually sent in that case are the iwant
messages in a low priority queue. Those iwant
msgs are sent after getting the ihave
msg from a peer.
In the nim-libp2p implementation, the node doesn't establish a particular timeout for every iwant
message. i.e., every iwant
message is sent through a low-priority queue and, in case this queue reaches 1024 msgs, it disconnects the node.
Problem
Noticed while debugging the issue https://github.com/waku-org/go-waku/issues/1098 in status. Noticed logs in status-app indicating that store nodes were not able to honor some of the IWANT requests at gossip level due to which they were being penalized by the status-desktop nodes.
Impact
Over a period of time, if store nodes gets penalized the local gossipsub may disconnect the peers.
To reproduce
Noticed this on regular log observation in status desktop.
geth.log
Expected behavior
A clear and concise description of what you expected to happen.
Screenshots/logs
Details available in discord discussion here https://discord.com/channels/1110799176264056863/1238425045710147645
INFO [05-10|10:09:45.469|go.uber.org/zap/sugar.go:168] peer 16Uiu2HAmKBd6crqQNZ6nKCSCpHCAwUPn3DUDmkcPSWUTyVXpxKsW didn't follow up in 2 IWANT requests; adding penalty
per @richard-ramos In the logs for the bridge, I see it happen with: 16Uiu2HAm3xVDaz6SRJ6kErwC21zBJEZjavVXg7VSkoWzaV1aMA3F 16Uiu2HAmB7Ur9HQqo3cWDPovRQjo57fxWWDaQx27WxSzDGhN4JKg 16Uiu2HAmKBd6crqQNZ6nKCSCpHCAwUPn3DUDmkcPSWUTyVXpxKsW 16Uiu2HAmAUdrQ3uwzuE4Gy4D56hX6uLKEeerJAnhKEHZ3DxF1EfTI have also noticed similar logs in my status-desktop as well but all are only related to store nodes.
nwaku version/commit hash
v0.24.0
Additional context
Add any other context about the problem here.