ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Panic in ipfs node #6802

Open sidenaio opened 4 years ago

sidenaio commented 4 years ago

Version information: go-ipfs version: b8ec598d5801

Description: The IPFS node crashes. Stack trace: (posted as an image in the original report)

Stebalien commented 4 years ago

Knowns:

The underlying connection must be one of:

That means either:

We've had the first class of bug before in go-secio but that shouldn't be the case here.


ridenaio commented 4 years ago

Stebalien commented 4 years ago

> We are using go-ipfs as a library. github.com/ipfs/go-ipfs v0.4.22-0.20191119151441-b8ec598d5801

I see. It looks like you're using a commit on master from November. I recommend updating to the latest master if possible.

> Yes, we are using auto relays on our private network

Got it. This could be a bug in any of the stream multiplexers, or maybe secio?

@vyzo this looks like https://github.com/libp2p/go-msgio/issues/16.

sidenaio commented 4 years ago

The bug reproduced on the commit c9e8070082e2

Stebalien commented 4 years ago

@vyzo could you try to reproduce this?

ridenaio commented 4 years ago

Any steps here? Failed for me too

panic: runtime error: slice bounds out of range [:203] with capacity 4

goroutine 13461090 [running]:
github.com/libp2p/go-libp2p-pnet.(*pskConn).Read(0xc0034e37c0, 0xc003534dd0, 0x4, 0x4, 0x0, 0x0, 0x0)
    D:/projects/go/pkg/mod/github.com/libp2p/go-libp2p-pnet@v0.1.0/psk_conn.go:43 +0x3e7
io.ReadAtLeast(0x7290198, 0xc0034e37c0, 0xc003534dd0, 0x4, 0x4, 0x4, 0x0, 0x0, 0x0)
    C:/Go/src/io/io.go:310 +0x194
io.ReadFull(0x7290198, 0xc0034e37c0, 0xc003534dd0, 0x4, 0x4, 0x0, 0x0, 0x0)
    C:/Go/src/io/io.go:329 +0xb1
github.com/libp2p/go-msgio.ReadLen(0x7290198, 0xc0034e37c0, 0xc003534dd0, 0x4, 0x4, 0x0, 0x0, 0x0)
    D:/projects/go/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/num.go:27 +0x10e
github.com/libp2p/go-msgio.(*reader).nextMsgLen(0xc003534dc0, 0x0, 0x0, 0x0)
    D:/projects/go/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/msgio.go:178 +0xed
github.com/libp2p/go-msgio.(*reader).NextMsgLen(0xc003534dc0, 0x0, 0x0, 0x0)
    D:/projects/go/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/msgio.go:173 +0xcb
github.com/libp2p/go-libp2p-secio.(*etmReader).Read(0xc000bac0a0, 0xc0031a7830, 0xc, 0xc, 0x0, 0x0, 0x0)
    D:/projects/go/pkg/mod/github.com/libp2p/go-libp2p-secio@v0.2.1/rw.go:163 +0x163
io.ReadAtLeast(0x6a49090, 0xc0016ae600, 0xc0031a7830, 0xc, 0xc, 0xc, 0x0, 0x0, 0x0)
    C:/Go/src/io/io.go:310 +0x194
io.ReadFull(0x6a49090, 0xc0016ae600, 0xc0031a7830, 0xc, 0xc, 0x0, 0x0, 0x0)
    C:/Go/src/io/io.go:329 +0xb1
github.com/libp2p/go-yamux.(*Session).recvLoop(0xc0032328f0, 0x0, 0x0)
    D:/projects/go/pkg/mod/github.com/libp2p/go-yamux@v1.2.3/session.go:507 +0x15a
github.com/libp2p/go-yamux.(*Session).recv(0xc0032328f0)
    D:/projects/go/pkg/mod/github.com/libp2p/go-yamux@v1.2.3/session.go:486 +0x32
created by github.com/libp2p/go-yamux.newSession
    D:/projects/go/pkg/mod/github.com/libp2p/go-yamux@v1.2.3/session.go:129 +0x528
Stebalien commented 4 years ago

> I've checked read functions in.

Given that we're not seeing random memory corruptions everywhere, just here, it still looks like the underlying connection is returning more bytes read than it should. I just don't know how.

Stebalien commented 4 years ago

What transports are you using? QUIC? Websocket? TLS?

sidenaio commented 4 years ago

It's our config:

{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Announce": [],
    "Gateway": "/ip4/127.0.0.1/tcp/8080",
    "NoAnnounce": [],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/5555",
      "/ip6/::/tcp/5555"
    ]
  },
  "Bootstrap": [
    "/ip4/127.0.0.1/tcp/40405/ipfs/QmW6sqH4X46qSAUL6Fy7ovX42kWD5wp2n1VYZ3jnRFJGHg"
  ],
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": true,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": true,
      "Interval": 10
    }
  },
  "Experimental": {
    "FilestoreEnabled": true,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "PreferTLS": false,
    "QUIC": false,
    "ShardingEnabled": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "NoFetch": false,
    "PathPrefixes": [],
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "QmZpRbcmAEJUFUijNdFENoJDgfbnQ5mVu59cUKhJyU1SEv"
  },
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": "",
    "StrictSignatureVerification": false
  },
  "Reprovider": {
    "Interval": "12h",
    "Strategy": "all"
  },
  "Routing": {
    "Type": "dht"
  },
  "Swarm": {
    "AddrFilters": null,
    "ConnMgr": {
      "GracePeriod": "40s",
      "HighWater": 50,
      "LowWater": 30,
      "Type": "basic"
    },
    "DisableBandwidthMetrics": true,
    "DisableNatPortMap": false,
    "DisableRelay": false,
    "EnableAutoNATService": true,
    "EnableAutoRelay": true,
    "EnableRelayHop": true
  }
}
aschmahmann commented 4 years ago

@sidenaio you exposed your private key in your posted config file (I removed it for you and deleted the comment history). In the future, running ipfs config show will strip it out for you, or you can strip it manually.

I'd recommend rotating your identity. AFAIK the easiest way to do this is to just create a new repo and copy over the identity components.

sidenaio commented 4 years ago

@aschmahmann It's a test config generated for my previous comment.

Kubuxu commented 4 years ago

@Stebalien Idea: add a sanity check to libp2p-pnet, and print the type of the Conn if it fails.

Stebalien commented 4 years ago

Double checking: you're not putting these nodes on the public network, right? Putting a node with:

  1. AutoRelay enabled.
  2. RelayHop enabled (i.e., acting as a relay).
  3. A 30-40 connection limit.

will not end well if you have a lot of nodes in your network.
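If the nodes are meant to relay traffic for a large private network, the connection manager window likely needs to be far larger than 30-50. A sketch of raising the limits in the Swarm section of the config (the numbers are illustrative, not a recommendation from this thread):

```json
"Swarm": {
  "ConnMgr": {
    "Type": "basic",
    "GracePeriod": "40s",
    "LowWater": 600,
    "HighWater": 900
  }
}
```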


Otherwise, @Kubuxu is right.

Try to reproduce this with github.com/libp2p/go-libp2p-pnet@test/ipfs-panic. Instead of panicking, that branch should print an error along with the connection type.

sidenaio commented 4 years ago

@Stebalien, yes, this is a private network. We use a swarm key. We will try to reproduce with this branch. Why are 30-40 connections not enough? What would you recommend for a network with 1k, 5k, 10k nodes?

sidenaio commented 4 years ago

@Stebalien We caught the error

ERROR[02-03|23:37:36.344] error while writing to stream            id=QmYXBBqKeh9RtAkEcfYbGce5zRyCFttrsHBKK5TK4dSAyj err="expected to read <= 4 in private network adapter, read 201: *struct { *net.TCPConn; manet.maEndpoints }"
Stebalien commented 4 years ago

Ok, that's a TCP connection.

Stebalien commented 4 years ago

Ok, I thought this might be related to a concurrent write in go-multistream but, really, that seems very unlikely. I'd still try upgrading to the latest master (it includes a fix for that) to see if it helps, but that likely won't solve this.

Stebalien commented 4 years ago

At this point, I have to assume that we're sharing the buffer somehow.

Stebalien commented 4 years ago

This is really looking like some weird memory corruption.

  1. We take a lock before reading the next message length.
  2. We use an array embedded in the reader struct (msgio.go). The reader struct is returned by pointer and hidden behind an interface so we shouldn't be overwriting anything, ever.

Therefore, I have no idea how we could be reusing the buffer.


@sidenaio Could you try updating to the latest version of github.com/libp2p/go-libp2p-pnet@test/ipfs-panic and try updating github.com/multiformats/go-multistream@v0.1.1? I want to check to see if we're seeing this issue when the connection returns an error.

Specifically, https://github.com/golang/go/issues/24727. I see you're both using Windows.

ridenaio commented 4 years ago

After upgrading to github.com/multiformats/go-multistream@v0.1.1, the error no longer reproduces.

ridenaio commented 4 years ago

@Stebalien So, we thought the issue was solved and removed the link to github.com/libp2p/go-libp2p-pnet@test/ipfs-panic from the public release. The issue came back again on Windows. The current multistream version is 0.1.1.

panic: runtime error: slice bounds out of range [:72] with capacity 4

goroutine 12706 [running]:
github.com/libp2p/go-libp2p-pnet.(*pskConn).Read(0xc002986b00, 0xc002546750, 0x4, 0x4, 0x406983, 0xc0037b7680, 0x4)
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-libp2p-pnet@v0.2.0/psk_conn.go:43 +0x249
io.ReadAtLeast(0x46e73d0, 0xc002986b00, 0xc002546750, 0x4, 0x4, 0x4, 0x1c, 0x0, 0x1c)
  C:/Users/travis/.gimme/versions/go1.13.9.windows.amd64/src/io/io.go:310 +0x8e
io.ReadFull(...)
  C:/Users/travis/.gimme/versions/go1.13.9.windows.amd64/src/io/io.go:329
github.com/libp2p/go-msgio.ReadLen(0x46e73d0, 0xc002986b00, 0xc002546750, 0x4, 0x4, 0x432471, 0xbb24bc, 0xc002303c01)
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/num.go:27 +0x87
github.com/libp2p/go-msgio.(*reader).nextMsgLen(0xc002546740, 0xc002303ce0, 0xc0030d0801, 0xc002303dc0)
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/msgio.go:178 +0x7f
github.com/libp2p/go-msgio.(*reader).NextMsgLen(0xc002546740, 0x0, 0x0, 0x0)
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/msgio.go:173 +0x91
github.com/libp2p/go-libp2p-secio.(*etmReader).Read(0xc00008f9a0, 0xc000d10700, 0xc, 0xc, 0x0, 0x0, 0x0)
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-libp2p-secio@v0.2.1/rw.go:163 +0x108
io.ReadAtLeast(0x46e75d0, 0xc001b2be00, 0xc000d10700, 0xc, 0xc, 0xc, 0xc, 0x0, 0x0)
  C:/Users/travis/.gimme/versions/go1.13.9.windows.amd64/src/io/io.go:310 +0x8e
io.ReadFull(...)
  C:/Users/travis/.gimme/versions/go1.13.9.windows.amd64/src/io/io.go:329
github.com/libp2p/go-yamux.(*Session).recvLoop(0xc002016620, 0x0, 0x0)
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-yamux@v1.3.5/session.go:526 +0xf0
github.com/libp2p/go-yamux.(*Session).recv(0xc002016620)
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-yamux@v1.3.5/session.go:505 +0x32
created by github.com/libp2p/go-yamux.newSession
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-yamux@v1.3.5/session.go:125 +0x38d
ridenaio commented 3 years ago

The problem continues to appear. We are still using a custom branch with error handling for go-libp2p-pnet to avoid panics. Any ideas?

goroutine 4149415 [running]:
runtime/debug.Stack(0x86, 0x0, 0x0)
    C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/runtime/debug/stack.go:24 +0xa4
runtime/debug.PrintStack()
    C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/runtime/debug/stack.go:16 +0x29
github.com/libp2p/go-libp2p-pnet.(*pskConn).Read(0xc00f13c400, 0xc007f8e768, 0x2, 0x2, 0xc01f6b5b30, 0x40ba9c, 0x19279e0)
    C:/Users/travis/gopath/pkg/mod/github.com/idena-network/go-libp2p-pnet@v0.2.1-0.20200406075059-75d9ee9b85ed/psk_conn.go:47 +0x2ca
io.ReadAtLeast(0x5770d78, 0xc00f13c400, 0xc007f8e768, 0x2, 0x2, 0x2, 0x1000, 0x0, 0xc01f6b5ba8)
    C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/io/io.go:310 +0x8e
io.ReadFull(...)
    C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/io/io.go:329
github.com/libp2p/go-libp2p-noise.(*secureSession).readNextInsecureMsgLen(0xc007f8e6e0, 0x0, 0xc01f6b5be8, 0x464bfe)
    C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-libp2p-noise@v0.1.1/rw.go:134 +0x93
github.com/libp2p/go-libp2p-noise.(*secureSession).Read(0xc007f8e6e0, 0xc0217b4000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-libp2p-noise@v0.1.1/rw.go:53 +0x1fa
bufio.(*Reader).fill(0xc02179afc0)
    C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/bufio/bufio.go:100 +0x10a
bufio.(*Reader).ReadByte(0xc02179afc0, 0xc02179b0e0, 0xc01f6b5de0, 0xce4004)
    C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/bufio/bufio.go:252 +0x40
encoding/binary.ReadUvarint(0x1fdbdc0, 0xc02179afc0, 0xc01f6b5dc4, 0x2, 0x2)
    C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/encoding/binary/varint.go:110 +0x8c
github.com/libp2p/go-mplex.(*Multiplex).readNextHeader(0xc02165f000, 0xc01f6b5dc0, 0x2, 0x0, 0x0)
    C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-mplex@v0.1.2/multiplex.go:548 +0x46
github.com/libp2p/go-mplex.(*Multiplex).handleIncoming(0xc02165f000)
    C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-mplex@v0.1.2/multiplex.go:367 +0xe7
created by github.com/libp2p/go-mplex.NewMultiplex
    C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-mplex@v0.1.2/multiplex.go:105 +0x204
Stebalien commented 3 years ago

What's the error message printed before the stack trace?

Stebalien commented 3 years ago

That is, the patch now includes the connection's type and error message.

Stebalien commented 3 years ago

However, I have good news: we've ruled out:

Because we've swapped all of these out. Left over are:

  1. Some protocol at the top may be reading from multiple threads.
  2. Multistream.
Stebalien commented 3 years ago

@ridenaio could you also post go list -m all so I can double-check versions?

Stebalien commented 2 years ago

We've done a bit of digging here and believe it might be a bug in Go's runtime. Specifically, Go's handling of cancellation in Windows async I/O may, if Windows does something unexpected, fail to update the "bytes read" length.

But we can't pinpoint the issue or reproduce it reliably enough to debug.

github-actions[bot] commented 2 years ago

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

woss commented 1 year ago

I'm also getting this; reported here: https://github.com/ipfs/kubo/issues/7194