sidenaio opened this issue 4 years ago
Knowns:
The underlying connection must be one of:
That means either:
We've had the first class of bug before in go-secio but that shouldn't be the case here.
go list -m all
github.com/ipfs/go-ipfs v0.4.22-0.20191119151441-b8ec598d5801
go list -m all output:
We are using go-ipfs as a library.
github.com/ipfs/go-ipfs v0.4.22-0.20191119151441-b8ec598d5801
I see. It looks like you're using a commit on master from November. I recommend updating to the latest master if possible.
Yes, we are using auto relays on our private network
Got it. This could be a bug in any of the stream multiplexers, or maybe in secio?
@vyzo this looks like https://github.com/libp2p/go-msgio/issues/16.
The bug reproduced on commit c9e8070082e2.
@vyzo could you try to reproduce this?
Any reproduction steps here? It failed for me too.
panic: runtime error: slice bounds out of range [:203] with capacity 4
goroutine 13461090 [running]:
github.com/libp2p/go-libp2p-pnet.(*pskConn).Read(0xc0034e37c0, 0xc003534dd0, 0x4, 0x4, 0x0, 0x0, 0x0)
D:/projects/go/pkg/mod/github.com/libp2p/go-libp2p-pnet@v0.1.0/psk_conn.go:43 +0x3e7
io.ReadAtLeast(0x7290198, 0xc0034e37c0, 0xc003534dd0, 0x4, 0x4, 0x4, 0x0, 0x0, 0x0)
C:/Go/src/io/io.go:310 +0x194
io.ReadFull(0x7290198, 0xc0034e37c0, 0xc003534dd0, 0x4, 0x4, 0x0, 0x0, 0x0)
C:/Go/src/io/io.go:329 +0xb1
github.com/libp2p/go-msgio.ReadLen(0x7290198, 0xc0034e37c0, 0xc003534dd0, 0x4, 0x4, 0x0, 0x0, 0x0)
D:/projects/go/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/num.go:27 +0x10e
github.com/libp2p/go-msgio.(*reader).nextMsgLen(0xc003534dc0, 0x0, 0x0, 0x0)
D:/projects/go/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/msgio.go:178 +0xed
github.com/libp2p/go-msgio.(*reader).NextMsgLen(0xc003534dc0, 0x0, 0x0, 0x0)
D:/projects/go/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/msgio.go:173 +0xcb
github.com/libp2p/go-libp2p-secio.(*etmReader).Read(0xc000bac0a0, 0xc0031a7830, 0xc, 0xc, 0x0, 0x0, 0x0)
D:/projects/go/pkg/mod/github.com/libp2p/go-libp2p-secio@v0.2.1/rw.go:163 +0x163
io.ReadAtLeast(0x6a49090, 0xc0016ae600, 0xc0031a7830, 0xc, 0xc, 0xc, 0x0, 0x0, 0x0)
C:/Go/src/io/io.go:310 +0x194
io.ReadFull(0x6a49090, 0xc0016ae600, 0xc0031a7830, 0xc, 0xc, 0x0, 0x0, 0x0)
C:/Go/src/io/io.go:329 +0xb1
github.com/libp2p/go-yamux.(*Session).recvLoop(0xc0032328f0, 0x0, 0x0)
D:/projects/go/pkg/mod/github.com/libp2p/go-yamux@v1.2.3/session.go:507 +0x15a
github.com/libp2p/go-yamux.(*Session).recv(0xc0032328f0)
D:/projects/go/pkg/mod/github.com/libp2p/go-yamux@v1.2.3/session.go:486 +0x32
created by github.com/libp2p/go-yamux.newSession
D:/projects/go/pkg/mod/github.com/libp2p/go-yamux@v1.2.3/session.go:129 +0x528
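The panic above can be reproduced in isolation: if an underlying connection's Read reports more bytes than the buffer can hold, any wrapper that then slices buf[:n] (the pattern used in psk_conn.go) trips the runtime's slice-bounds check. A minimal sketch, with a hypothetical misbehaving reader standing in for the real connection:

```go
package main

import (
	"fmt"
	"io"
)

// badReader simulates an underlying connection whose Read
// reports more bytes than the caller's buffer can hold,
// violating the io.Reader contract (n must be <= len(p)).
type badReader struct{}

func (badReader) Read(p []byte) (int, error) {
	return len(p) + 199, nil // lies: claims 203 bytes read into a 4-byte buffer
}

// xorRead mimics the pattern in psk_conn.go: read, then slice
// the buffer to the reported length before transforming it.
// The recover lets us observe the panic instead of crashing.
func xorRead(r io.Reader, buf []byte) (err error) {
	defer func() {
		if rec := recover(); rec != nil {
			err = fmt.Errorf("recovered: %v", rec)
		}
	}()
	n, _ := r.Read(buf)
	_ = buf[:n] // panics: slice bounds out of range [:203] with capacity 4
	return nil
}

func main() {
	fmt.Println(xorRead(badReader{}, make([]byte, 4)))
}
```

This matches the shape of the stack trace: the misreported length originates below pskConn.Read, but the bounds check only fires where the buffer is first sliced.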
I've checked read functions in.
Given that we're not seeing random memory corruptions everywhere, just here, it still looks like the underlying connection is returning more bytes read than it should. I just don't know how.
What transports are you using? QUIC? Websocket? TLS?
Here's our config:
{
"API": {
"HTTPHeaders": {}
},
"Addresses": {
"API": "/ip4/127.0.0.1/tcp/5001",
"Announce": [],
"Gateway": "/ip4/127.0.0.1/tcp/8080",
"NoAnnounce": [],
"Swarm": [
"/ip4/0.0.0.0/tcp/5555",
"/ip6/::/tcp/5555"
]
},
"Bootstrap": [
"/ip4/127.0.0.1/tcp/40405/ipfs/QmW6sqH4X46qSAUL6Fy7ovX42kWD5wp2n1VYZ3jnRFJGHg"
],
"Datastore": {
"BloomFilterSize": 0,
"GCPeriod": "1h",
"HashOnRead": false,
"Spec": {
"mounts": [
{
"child": {
"path": "blocks",
"shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
"sync": true,
"type": "flatfs"
},
"mountpoint": "/blocks",
"prefix": "flatfs.datastore",
"type": "measure"
},
{
"child": {
"compression": "none",
"path": "datastore",
"type": "levelds"
},
"mountpoint": "/",
"prefix": "leveldb.datastore",
"type": "measure"
}
],
"type": "mount"
},
"StorageGCWatermark": 90,
"StorageMax": "10GB"
},
"Discovery": {
"MDNS": {
"Enabled": true,
"Interval": 10
}
},
"Experimental": {
"FilestoreEnabled": true,
"Libp2pStreamMounting": false,
"P2pHttpProxy": false,
"PreferTLS": false,
"QUIC": false,
"ShardingEnabled": false,
"StrategicProviding": false,
"UrlstoreEnabled": false
},
"Gateway": {
"APICommands": [],
"HTTPHeaders": {
"Access-Control-Allow-Headers": [
"X-Requested-With",
"Range",
"User-Agent"
],
"Access-Control-Allow-Methods": [
"GET"
],
"Access-Control-Allow-Origin": [
"*"
]
},
"NoFetch": false,
"PathPrefixes": [],
"RootRedirect": "",
"Writable": false
},
"Identity": {
"PeerID": "QmZpRbcmAEJUFUijNdFENoJDgfbnQ5mVu59cUKhJyU1SEv"
},
"Ipns": {
"RecordLifetime": "",
"RepublishPeriod": "",
"ResolveCacheSize": 128
},
"Mounts": {
"FuseAllowOther": false,
"IPFS": "/ipfs",
"IPNS": "/ipns"
},
"Plugins": {
"Plugins": null
},
"Provider": {
"Strategy": ""
},
"Pubsub": {
"DisableSigning": false,
"Router": "",
"StrictSignatureVerification": false
},
"Reprovider": {
"Interval": "12h",
"Strategy": "all"
},
"Routing": {
"Type": "dht"
},
"Swarm": {
"AddrFilters": null,
"ConnMgr": {
"GracePeriod": "40s",
"HighWater": 50,
"LowWater": 30,
"Type": "basic"
},
"DisableBandwidthMetrics": true,
"DisableNatPortMap": false,
"DisableRelay": false,
"EnableAutoNATService": true,
"EnableAutoRelay": true,
"EnableRelayHop": true
}
}
@sidenaio you exposed your private key in your posted config file (I removed it for you and deleted the comment history). In the future, running ipfs config show will strip it out for you, or you can do it manually.
I'd recommend rotating your identity. AFAIK the easiest way to do this is to just create a new repo and copy over the identity components.
@aschmahmann It's a test config generated for my previous comment.
@Stebalien an idea: add a sanity check to libp2p-pnet, and print the type of the Conn if it fails.
Double checking: you're not putting these nodes on the public network, right? Putting a node with:
Will not end well if you have a lot of nodes in your network.
Otherwise, @Kubuxu is right.
Try to reproduce this with github.com/libp2p/go-libp2p-pnet@test/ipfs-panic. Instead of panicking, that should print an error and the connection type.
@Stebalien, yes, this is a private network. We use a swarm key. We will try to reproduce with this branch. Why are 30-40 connections not enough? What would you recommend for a network with 1k, 5k, 10k nodes?
@Stebalien We caught the error
ERROR[02-03|23:37:36.344] error while writing to stream id=QmYXBBqKeh9RtAkEcfYbGce5zRyCFttrsHBKK5TK4dSAyj err="expected to read <= 4 in private network adapter, read 201: *struct { *net.TCPConn; manet.maEndpoints }"
Ok, that's a TCP connection.
Ok, I thought this might be related to a concurrent write in go-multistream but, really, that seems very unlikely. I'd still try upgrading to the latest master (it includes a fix for that) to see if it helps, but that likely won't solve this.
At this point, I have to assume that we're sharing the buffer somehow.
This is really looking like some weird memory corruption.
Still, I have no idea how we could be reusing the buffer.
@sidenaio Could you try updating to the latest version of github.com/libp2p/go-libp2p-pnet@test/ipfs-panic and updating github.com/multiformats/go-multistream@v0.1.1? I want to check whether we're seeing this issue when the connection returns an error.
Specifically, https://github.com/golang/go/issues/24727. I see you're both using windows.
After upgrading to github.com/multiformats/go-multistream@v0.1.1 the error no longer reproduces.
@Stebalien so, we thought that issue was solved and removed the link to github.com/libp2p/go-libp2p-pnet@test/ipfs-panic in the public release. The issue came back again on Windows. The current multistream version is 0.1.1.
panic: runtime error: slice bounds out of range [:72] with capacity 4
goroutine 12706 [running]:
github.com/libp2p/go-libp2p-pnet.(*pskConn).Read(0xc002986b00, 0xc002546750, 0x4, 0x4, 0x406983, 0xc0037b7680, 0x4)
C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-libp2p-pnet@v0.2.0/psk_conn.go:43 +0x249
io.ReadAtLeast(0x46e73d0, 0xc002986b00, 0xc002546750, 0x4, 0x4, 0x4, 0x1c, 0x0, 0x1c)
C:/Users/travis/.gimme/versions/go1.13.9.windows.amd64/src/io/io.go:310 +0x8e
io.ReadFull(...)
C:/Users/travis/.gimme/versions/go1.13.9.windows.amd64/src/io/io.go:329
github.com/libp2p/go-msgio.ReadLen(0x46e73d0, 0xc002986b00, 0xc002546750, 0x4, 0x4, 0x432471, 0xbb24bc, 0xc002303c01)
C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/num.go:27 +0x87
github.com/libp2p/go-msgio.(*reader).nextMsgLen(0xc002546740, 0xc002303ce0, 0xc0030d0801, 0xc002303dc0)
C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/msgio.go:178 +0x7f
github.com/libp2p/go-msgio.(*reader).NextMsgLen(0xc002546740, 0x0, 0x0, 0x0)
C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/msgio.go:173 +0x91
github.com/libp2p/go-libp2p-secio.(*etmReader).Read(0xc00008f9a0, 0xc000d10700, 0xc, 0xc, 0x0, 0x0, 0x0)
C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-libp2p-secio@v0.2.1/rw.go:163 +0x108
io.ReadAtLeast(0x46e75d0, 0xc001b2be00, 0xc000d10700, 0xc, 0xc, 0xc, 0xc, 0x0, 0x0)
C:/Users/travis/.gimme/versions/go1.13.9.windows.amd64/src/io/io.go:310 +0x8e
io.ReadFull(...)
C:/Users/travis/.gimme/versions/go1.13.9.windows.amd64/src/io/io.go:329
github.com/libp2p/go-yamux.(*Session).recvLoop(0xc002016620, 0x0, 0x0)
C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-yamux@v1.3.5/session.go:526 +0xf0
github.com/libp2p/go-yamux.(*Session).recv(0xc002016620)
C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-yamux@v1.3.5/session.go:505 +0x32
created by github.com/libp2p/go-yamux.newSession
C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-yamux@v1.3.5/session.go:125 +0x38d
The problem continues to appear. We are still using a custom branch with error handling for go-libp2p-pnet to avoid panics. Any ideas?
goroutine 4149415 [running]:
runtime/debug.Stack(0x86, 0x0, 0x0)
C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/runtime/debug/stack.go:24 +0xa4
runtime/debug.PrintStack()
C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/runtime/debug/stack.go:16 +0x29
github.com/libp2p/go-libp2p-pnet.(*pskConn).Read(0xc00f13c400, 0xc007f8e768, 0x2, 0x2, 0xc01f6b5b30, 0x40ba9c, 0x19279e0)
C:/Users/travis/gopath/pkg/mod/github.com/idena-network/go-libp2p-pnet@v0.2.1-0.20200406075059-75d9ee9b85ed/psk_conn.go:47 +0x2ca
io.ReadAtLeast(0x5770d78, 0xc00f13c400, 0xc007f8e768, 0x2, 0x2, 0x2, 0x1000, 0x0, 0xc01f6b5ba8)
C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/io/io.go:310 +0x8e
io.ReadFull(...)
C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/io/io.go:329
github.com/libp2p/go-libp2p-noise.(*secureSession).readNextInsecureMsgLen(0xc007f8e6e0, 0x0, 0xc01f6b5be8, 0x464bfe)
C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-libp2p-noise@v0.1.1/rw.go:134 +0x93
github.com/libp2p/go-libp2p-noise.(*secureSession).Read(0xc007f8e6e0, 0xc0217b4000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-libp2p-noise@v0.1.1/rw.go:53 +0x1fa
bufio.(*Reader).fill(0xc02179afc0)
C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/bufio/bufio.go:100 +0x10a
bufio.(*Reader).ReadByte(0xc02179afc0, 0xc02179b0e0, 0xc01f6b5de0, 0xce4004)
C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/bufio/bufio.go:252 +0x40
encoding/binary.ReadUvarint(0x1fdbdc0, 0xc02179afc0, 0xc01f6b5dc4, 0x2, 0x2)
C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/encoding/binary/varint.go:110 +0x8c
github.com/libp2p/go-mplex.(*Multiplex).readNextHeader(0xc02165f000, 0xc01f6b5dc0, 0x2, 0x0, 0x0)
C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-mplex@v0.1.2/multiplex.go:548 +0x46
github.com/libp2p/go-mplex.(*Multiplex).handleIncoming(0xc02165f000)
C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-mplex@v0.1.2/multiplex.go:367 +0xe7
created by github.com/libp2p/go-mplex.NewMultiplex
C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-mplex@v0.1.2/multiplex.go:105 +0x204
What's the error message printed before the stack trace?
That is, the patch now includes the connection's type and error message.
However, I have good news: we've ruled out:
Because we've swapped all of these out. Left over are:
@ridenaio could you also post go list -m all so I can double-check versions?
We've done a bit of digging here and believe it might be a bug in Go's runtime. Specifically, Go's handling of cancellation in Windows async I/O may, if Windows does something unexpected, fail to update the "bytes read" length.
But we can't pin-point the issue or reproduce it reliably enough to debug.
Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.
I'm also getting this; reported here https://github.com/ipfs/kubo/issues/7194
Version information: go-ipfs version: b8ec598d5801
Description: IPFS node crashes. Stack trace: