waku-org / js-waku

JavaScript implementation of Waku v2
https://js.waku.org
Apache License 2.0

bug: Filter subscriptions are not stable and missing messages #2139

Open weboko opened 1 month ago

weboko commented 1 month ago

This is a bug report

Problem

Full report: https://github.com/waku-org/js-waku/issues/2139#issuecomment-2399345230

Proposed Solutions

Notes

weboko commented 1 month ago

@danisharora099 to update description and share info about the problem and findings

danisharora099 commented 4 weeks ago

Further investigation led to the following error being observed:

waku:error:filter:v2 Error with receiving pipe +4s CodeError: stream reset
    at MplexStream.reset (index.js:25309:21)
    at MplexStreamMuxer._handleIncoming (index.js:25717:28)
    at MplexStreamMuxer.sink (index.js:25624:36)
    at async Promise.all (index 0)

It's possible that this is caused by Mplex not being able to handle the multiplexing load. Some background on Yamux versus Mplex:

Yamux natively supports flow control, it is better suited for applications that require the transfer of large amounts of data. Until recently, the reason mplex was still supported was compatibility with js-libp2p, which didn’t have yamux support. Now that js-libp2p has gained yamux support, mplex should only be used to provide backward-compatibility with legacy nodes.

Mplex does not have backpressure. This means that if you send more data on one stream than the other peer is able to receive, the stream will reset itself due to a buffer overflow (across streams, TCP backpressure still applies).

This would fit what we see: at the beginning, Filter works well. It only starts to error after it has received some data, and that is when we observe the stream reset errors.
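To make the suspected failure mode concrete, here is a toy model of the difference between a muxer without backpressure and one with flow control. It is only a sketch: the class names, `simulate`, and the buffer sizes are made up for illustration, and none of this is the actual mplex or yamux implementation or wire protocol.

```typescript
// Toy model only: hypothetical names, not the mplex or yamux protocol.

type SendResult = "accepted" | "reset" | "blocked";

interface ToyStream {
  send(bytes: number): SendResult;
  consume(bytes: number): void;
}

// Receiver with a fixed-size buffer and no way to slow the sender down.
// Overflowing the buffer resets the stream, as described for Mplex.
class NoBackpressureStream implements ToyStream {
  private capacity: number;
  private buffered = 0;
  private isReset = false;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  send(bytes: number): SendResult {
    if (this.isReset) return "reset";
    if (this.buffered + bytes > this.capacity) {
      this.isReset = true; // buffer overflow: the stream is torn down
      return "reset";
    }
    this.buffered += bytes;
    return "accepted";
  }

  // The application drains data it has finished processing.
  consume(bytes: number): void {
    this.buffered = Math.max(0, this.buffered - bytes);
  }
}

// Receiver with a window: the sender may only send what still fits.
// This is the general idea of flow control, not Yamux's actual framing.
class WindowedStream implements ToyStream {
  private capacity: number;
  private buffered = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  send(bytes: number): SendResult {
    // Window exhausted: the sender has to wait, but nothing is torn down.
    if (this.buffered + bytes > this.capacity) return "blocked";
    this.buffered += bytes;
    return "accepted";
  }

  // Consuming data reopens the window.
  consume(bytes: number): void {
    this.buffered = Math.max(0, this.buffered - bytes);
  }
}

// A fast sender pushes `count` messages of `size` bytes, while the
// receiver drains only one message for every `drainEvery` sends.
function simulate(
  stream: ToyStream,
  count: number,
  size: number,
  drainEvery: number
): SendResult[] {
  const results: SendResult[] = [];
  for (let i = 1; i <= count; i++) {
    results.push(stream.send(size));
    if (i % drainEvery === 0) stream.consume(size);
  }
  return results;
}

const withoutBackpressure = simulate(new NoBackpressureStream(30), 8, 10, 3);
const withFlowControl = simulate(new WindowedStream(30), 8, 10, 3);
console.log(withoutBackpressure.join(","));
console.log(withFlowControl.join(","));
```

In this model the stream without backpressure accepts the first few messages and is then reset for good once the receiver falls behind, which is the same shape as "works at the beginning, errors later". The windowed stream only stalls the sender and picks up again once the receiver has drained some data.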

danisharora099 commented 1 week ago

Updates:

When it works, it works. There is definitely room for improvement in certain aspects, but the strange cases are the times when it doesn't work at all. Summarising my findings from the last few days:

I had hoped this would be fixed by https://github.com/waku-org/js-waku/pull/2137, but unfortunately that is not the case. I have been extensively testing the RC with mutex locks, and failures with LightPush and Filter still occur quite often. It may be better, but it is still a problem.

Image


An interesting observation: I opened two browser instances that simultaneously pushed sequences of messages with LightPush and received them with Filter, each with a bunch of nodes in its local peer cache. In the first window, everything kept going: Filter worked and LightPush worked (tested up to 3 sequences of 10 messages, and still running), although it did miss a few messages. The other tab started to fail with LightPush on the 2nd sequence and never recovered from it. Its Filter was still receiving messages sent by the other node (but not its own, as it wasn't able to push any). This looks like two problems at hand, or maybe a fusion of both into one:

Image Image

Few mins later:

Image
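For anyone trying to reproduce the two-window experiment, a small helper along these lines can make "missed a few messages" measurable. This is only a sketch: the `TestMessage` shape, `buildSequences`, and `findMissing` are hypothetical names for a test harness, not part of the js-waku API, and it assumes each test payload carries its sequence number and its index within that sequence.

```typescript
// Hypothetical payload shape for a test harness (not a js-waku type):
// each message records its sequence and its index within that sequence.
interface TestMessage {
  sequence: number;
  index: number;
}

// Builds `sequences` sequences of `perSequence` messages each.
function buildSequences(sequences: number, perSequence: number): TestMessage[] {
  const out: TestMessage[] = [];
  for (let s = 1; s <= sequences; s++) {
    for (let i = 0; i < perSequence; i++) {
      out.push({ sequence: s, index: i });
    }
  }
  return out;
}

// Returns, per sequence, the indices that were sent but never received.
function findMissing(
  sent: TestMessage[],
  received: TestMessage[]
): Map<number, number[]> {
  const key = (m: TestMessage): string => `${m.sequence}:${m.index}`;
  const seen = new Set<string>(received.map(key));
  const missing = new Map<number, number[]>();
  for (const m of sent) {
    if (seen.has(key(m))) continue;
    const list = missing.get(m.sequence) ?? [];
    list.push(m.index);
    missing.set(m.sequence, list);
  }
  return missing;
}

// Example: 3 sequences of 10 messages, two of which never arrive.
const sent = buildSequences(3, 10);
const received = sent.filter(
  (m) => !(m.sequence === 2 && (m.index === 4 || m.index === 7))
);
findMissing(sent, received).forEach((indices, sequence) => {
  console.log(`sequence ${sequence} is missing indices ${indices.join(", ")}`);
});
```

To keep the two suspected problems apart, `sent` should only contain messages whose LightPush call actually succeeded. Messages that failed to push are then counted as LightPush failures, and anything reported by `findMissing` is a message that was pushed but not delivered by Filter.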

weboko commented 6 days ago

Reopening since the issue is still present, though it might be fixed by https://github.com/waku-org/js-waku/issues/2158