Open churik opened 2 hours ago
cc @richard-ramos @chaitanyaprem any clues from logs why it might happen?
took a quick look at the logs and looks like store queries are timing out which might be what is causing messages shown in sending state. But
ERROR[10-24|10:38:12.425|github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:190] could not query storenode peerID=16Uiu2HAmAUdrQ3uwzuE4Gy4D56hX6uLKEeerJAnhKEHZ3DxF1EfT contentTopics="[/waku/1/0x950464c4/rfc26 /waku/1/0x5143d1fc/rfc26 /waku/1/0x43a43bd3/rfc26 /waku/1/0x8328c702/rfc26 /waku/1/0xb0e6c5a6/rfc26 /waku/1/0xc0086e00/rfc26 /waku/1/0x0cb06bba/rfc26 /waku/1/0x6c2cab7a/rfc26 /waku/1/0x8ee11cb2/rfc26 /waku/1/0x276cf54e/rfc26]" pubsubTopic=/waku/2/rs/16/32 from=1729763103353073000 to=1729763160259783000 error="context canceled" attempt=1
Another thing i had noticed is that about 11 peers are descored and messages are ignored from them and some of them seem like fleet nodes. So either the desktop node has some bad connectivity causing it to descore the peers or something else is causing this. This coupled with store queries failing must be causing messages not getting delivered.
DEBUG[10-24|10:38:33.323|go.uber.org/zap/sugar.go:198] IWANT: ignoring peer 16Uiu2HAmNTpGnyZ8W1BK2sXEmgSCNWiyDKgRU3NBR2DXST2HzxRU with score below threshold [score = -135.692805]
DEBUG[10-24|10:38:29.651|go.uber.org/zap/sugar.go:198] IHAVE: ignoring peer 16Uiu2HAm2M7xs7cLPc3jamawkEqbr7cUJX11uvY7LxQ6WFUdUKUT with score below threshold [score = -288.371036]
it would help debug further if we have more logs and messageHashes of msgs in the video. As of now logs are there for 30 seconds only which is very short duration.
@chaitanyaprem
there are more of logs here: https://drive.google.com/file/d/10z-cHiuDn_rt9ScL5sj_DGPv_tME9-GZ/view?usp=sharing
ONE MORE IMPORTANT NOTE: It is reproducible pretty easy for me on desktop-desktop as well - you need to be offline as the receiver, I'd expect that they marked as a "Sent" but not "Sending failed".
It is reproducible pretty easy for me on desktop-desktop as well - you need to be offline as the receiver, I'd expect that they marked as a "Sent" but not "Sending failed".
I'll try to reproduce it with this approach, and hopefully see anything interesting in the logs!
well actually seems it is my setup? story node? - but 2 ppl from Poland / Ukraine are not facing this issue.
Bug Report
Problem
Messages are marked as "not sent" and are sending for 3-4 mins, at least 10% of them are marked as not sent.
Expected behavior
messages are sent instantly
Actual behavior
some stuck in "not sent" and are sending for 3-4 mins
IMPORTANT NOTE: Receiver received messages in banch after getting from background in 30-40 sec, video from receiver: IOS: https://github.com/user-attachments/assets/c5276e46-ba73-4a70-9f35-889232ffa6d5 IOS logs: logs (18).zip
Also this bug is reproducible because the input is available for receiver once they received contact request, and for sender there is a delay in accepting.
And when receiver comes from background, messages are marked as "Received" on desktop side as well.
Reproduction
https://github.com/user-attachments/assets/6ae1e524-6907-4472-b70d-050fac0a4fc7
Log from desktop for this particular minute: geth12_37.log
Data: 30 peers / node is store01.ams.status.prod More logs/ video from previous hour. More video: https://drive.google.com/file/d/1wcd-Ya3-yAztm5B0Of216i1p_ribt5IU/view?usp=sharing Logs: https://drive.google.com/file/d/10z-cHiuDn_rt9ScL5sj_DGPv_tME9-GZ/view?usp=sharing
Additional Information