Open thomaseizinger opened 1 year ago
Thanks for the report, was that happening with nim-libp2p as the dialer or receiver? We do have a strict dep on identify for now, it will be removed later, but should still be handled gracefully
EDIT: regarding point 3, we most of the time rely on multistream-select to know if a protocol is supported. Identify is just mandatory for legacy reasons, not that the values are actually required for libp2p
Thanks for the report, was that happening with nim-libp2p as the dialer or receiver?
As the dialer. Listener was working fine!
Turns out we do not have a hard dep on identify: https://github.com/status-im/nim-libp2p/blob/c45f9705ab292ac81d65feae03802d916c36e38d/libp2p/peerstore.nim#L206-L217 (read: if we can't negotiate the stream, just close it & keep on going)
Most likely, the actual issue is just a race condition which doesn't happen with identify enabled, because identify is delaying the connection upgrading process
Okay, thanks for debugging! Should identify be removed from the interop test then?
I would like to understand what is going on, but it seems no one has succeeded to reproduce the issue outside of CI at this point
I would like to understand what is going on
I think that would be good!
May I suggest the following:
This would unblock other implementations and allows you to iterate on the problem without any pressure. Perhaps long-term, we shouldn't even add it back given that we just want to test the ping protocol.
Removing identify on our side will not fix anything, the only reason why currently it works with identify is probably thanks to the extra round trip required for identify, which gives enough time to avoid the race condition
However, this test was working for months, and started to break when js-libp2p & rust-libp2p tweaked their Yamux implementation. While it's likely that the race condition is on our side, it's also possible that the recent changes broke something And wherever the bug is, it might cause issues in live networks
I apologize for the delay caused in rust & js PRs being merged, but we evidently at this point have no idea what is going on, and I'd rather stay on the safe side. I'm looking into this actively
Okay, I had assumed there was no race condition with identify disabled :)
There is no rush from our end, I am happy to merge the PR in rust-libp2p with identify added and remove it later once the race condition is resolved.
Let me know if you need any help debugging rust-yamux
.
Hi,
I did some work on the
rust-yamux
implementation lately and when trying to integrate it withrust-libp2p
, I ran into interop errors withnim-libp2p
: https://github.com/libp2p/rust-libp2p/pull/3013After some debugging and comparing of before and after outputs of our logs, I suspect that
nim-libp2p
has a hard dependency on the identify protocol. That is, if the other node does not speak the identify protocol,nim-libp2p
will refuse to open streams of other protocols too.In my testing, I observed that
nim-libp2p
would open a first stream with the identify protocol and in previous versions then follow-up with a stream for the ping protocol.There also seems to be a race condition here. My guess is that my work on the
rust-yamux
implementation reduced the latency and thus we answer the initialmultistream-select
handshake for the identify protocol faster than before.In summary, the behaviour seems to be:
Note that outbound pings from the
rust-libp2p
binary were working fine this entire time.For now, the problem is resolved by adding the identify protocol to our interop test binary. I do think that there are multiple bugs in
nim-libp2p
here:multistream-select
handshake will tell whether a protocol is actually supported.