Closed: iulianpascalau closed this issue 4 years ago.
Hello, got this weird issue once in a hundred hours of intensive libp2p usage, on a single machine out of a couple of hundred:
The go-msgio version we use is v0.0.2.
Looking at the commits after this version, I found that some improvements were made (including in the nextMsgLen func), but this panic should not have occurred even in v0.0.2. I'm asking if there is a slight possibility of encountering this problem in a setting where the reader R might have been used by multiple goroutines, for example reading/writing on the same stream from multiple goroutines? Thanks
I'm not sure if that's causing this issue, but that's definitely going to cause problems if you're reading from multiple goroutines at the same time. msgio needs to atomically read the length and then the message.
However, this particular bug indicates that R is claiming to have read more data than it did. We had a bug like this in secio, but that was fixed before 0.0.3.
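For illustration only, here is a minimal sketch of a length-prefixed read in the msgio style; the readMsg helper and the 4-byte big-endian prefix are assumptions for the example, not msgio's actual code. It shows why the length read and the payload read have to happen as one atomic step per message: a second goroutine reading the same stream can consume the other's length prefix or payload bytes.

package example

import (
    "encoding/binary"
    "io"
)

// readMsg reads a 4-byte big-endian length, then exactly that many payload
// bytes. Concurrent callers on the same reader would interleave these two
// reads, so a real implementation must serialize them per message.
func readMsg(r io.Reader) ([]byte, error) {
    var lenBuf [4]byte
    if _, err := io.ReadFull(r, lenBuf[:]); err != nil {
        return nil, err
    }
    msgLen := binary.BigEndian.Uint32(lenBuf[:])

    msg := make([]byte, msgLen)
    if _, err := io.ReadFull(r, msg); err != nil {
        return nil, err
    }
    return msg, nil
}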
What's the underlying reader?
I do not know what to answer here. We'll try to update the libp2p libraries. Meanwhile, I can say that reading from a peer is done on only one goroutine (same as in go-libp2p-pubsub), and we made writing concurrency-safe by using a mutex for each peer.
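For context, a per-peer write lock of the kind described above might look roughly like the following; peerWriter is an invented name and this is only a sketch of the pattern, not the reporter's actual code.

package example

import (
    "io"
    "sync"
)

// peerWriter serializes writes to a single peer's stream so that two
// goroutines never interleave their message bytes on the wire.
type peerWriter struct {
    mu sync.Mutex
    w  io.Writer
}

func (p *peerWriter) Write(msg []byte) (int, error) {
    p.mu.Lock()
    defer p.mu.Unlock()
    return p.w.Write(msg)
}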
What transport are you using?
TCP
Actually, looking at this a bit, we lock internally anyways so it shouldn't be an issue. I'm not sure what's going on here. It may be that the bug was exposed by a fix in 0.0.2 (we now truncate msg on short reads). But the underlying reader must be reporting the wrong length.
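To make that concrete, here is a hypothetical sketch (not msgio's actual code) of a truncate-on-short-read path. The point is that the slicing step trusts the count returned by Read, so it only blows up once an underlying reader reports the wrong length, which would match the idea that the 0.0.2 truncation fix exposed an existing bug lower in the stack.

package example

import "io"

// readAndTruncate trusts the count returned by Read and slices the buffer
// down to it. If the underlying reader violates the io.Reader contract and
// reports n > len(buf), the buf[:n] expression panics with a slice bounds
// error.
func readAndTruncate(r io.Reader) ([]byte, error) {
    buf := make([]byte, 64)
    n, err := r.Read(buf)
    if err != nil && err != io.EOF {
        return nil, err
    }
    return buf[:n], nil
}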
Do you have relay enabled, by any chance?
Yes. It is enabled by default:
opts := []libp2p.Option{
    libp2p.ListenAddrStrings(address),
    libp2p.Identity(p2pPrivKey),
    libp2p.DefaultMuxers,
    libp2p.DefaultSecurity,
    libp2p.ConnectionManager(conMgr),
    libp2p.DefaultTransports,
}
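If it helps narrow things down, one experiment (a suggestion, not something tried in this thread) would be to rebuild the host with circuit relay explicitly disabled and see whether the panic still reproduces. libp2p.DisableRelay() is the stock option for that; the rest mirrors the snippet above.

// Same options as above, but with circuit relay explicitly disabled so the
// relay transport can be ruled in or out as the misbehaving read layer.
opts := []libp2p.Option{
    libp2p.ListenAddrStrings(address),
    libp2p.Identity(p2pPrivKey),
    libp2p.DefaultMuxers,
    libp2p.DefaultSecurity,
    libp2p.ConnectionManager(conMgr),
    libp2p.DefaultTransports,
    libp2p.DisableRelay(),
}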
Hm. So it could be a bug in any of our stream muxers as well. However, I can't find any obvious bugs. It could also be memory corruption.
This is going to take a while to debug.
As stated, it is odd that only one of our early users encountered this. The machine is above average, with an i5, 16 GB RAM, and an SSD, running Win10 Pro. The other machines, around 138 AWS t3.medium instances running Ubuntu, never encountered this in the last 2 months. We will investigate further how to reproduce this error. Thanks.
Thanks! These kinds of bugs are honestly the most annoying for everyone involved. They hit when you don't expect them and leave no trace.
Tell me about it! :)
@Stebalien is there any update on this?
No. I've never seen this issue in the wild so it's a bit tricky to debug.
Unless this is a data race (quite possible), I'm guessing the setup is:
yamux (a) -> secio (b) -> msgio (c) -> circuit-relay -> {mplex/yamux} (d) -> secio (e) -> msgio -> tcp
Why? Well, the bug is showing up in msgio (c), which is below yamux/secio (a/b). However, the bug likely isn't in msgio (c), as msgio is just trusting the return value from io.ReadFull. The bug could be in the TCP transport (assuming we're not using circuit-relay), but then it would likely be in the go standard library.
It's not in circuit-relay as circuit-relay just forwards reads. It could be in the second secio (e) but I'd expect a panic in (d) if that's where the bug is.
Therefore, I'm pretty sure the bug is in (d) (mplex/yamux). The next step is to dig through those to try to find out how they might claim to have read more than the read buffer holds.
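One debugging aid, purely an invented sketch and not anything that exists in libp2p, would be to wrap each layer's stream in a reader that checks the io.Reader contract, so the first layer that claims to have read more than the buffer holds identifies itself.

package example

import (
    "fmt"
    "io"
)

// checkedReader wraps another reader and fails loudly if that reader ever
// reports more bytes than the buffer it was given can hold, which is the
// io.Reader contract violation suspected here.
type checkedReader struct {
    name  string
    inner io.Reader
}

func (c checkedReader) Read(p []byte) (int, error) {
    n, err := c.inner.Read(p)
    if n > len(p) {
        panic(fmt.Sprintf("%s: Read reported %d bytes into a %d-byte buffer", c.name, n, len(p)))
    }
    return n, err
}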
This sounds dangerously like the secio bug we fixed, though. Have we eliminated the possibility of an unpatched secio being involved?
It does, but secio 0.1.0 is being used (which has this fixed): github.com/libp2p/go-libp2p-secio@v0.1.0
But, looking at the stream implementations for yamux and mplex, I can't see how it could be either.
Closing as this hasn't been reproduced in a while. Please leave a comment if you're using the latest version of go-libp2p and hit this bug again.