libp2p / go-libp2p

libp2p implementation in Go
MIT License

panic, possibly due to race condition #2291

Closed: marten-seemann closed this issue 1 year ago

marten-seemann commented 1 year ago

When running my Kubo node:

Daemon is ready
caught panic: runtime error: slice bounds out of range [73:30]
goroutine 1294031 [running]:
runtime/debug.Stack()
    runtime/debug/stack.go:24 +0x64
github.com/libp2p/go-yamux/v4.(*Session).sendLoop.func1()
    github.com/libp2p/go-yamux/v4@v4.0.0/session.go:520 +0x40
panic({0x1068f1400, 0x14008afcc78})
    runtime/panic.go:884 +0x204
internal/poll.(*FD).Write(0x140043edc80, {0x14009ffb0a0, 0x1e, 0x20})
    internal/poll/fd_unix.go:383 +0x3ac
net.(*netFD).Write(0x140043edc80, {0x14009ffb0a0?, 0x14009c55be8?, 0x1400a7bfbc0?})
    net/fd_posix.go:96 +0x28
net.(*conn).Write(0x1400a106340, {0x14009ffb0a0?, 0x1400036c0c8?, 0x106603e00?})
    net/net.go:195 +0x34
github.com/libp2p/go-libp2p/p2p/security/noise.(*secureSession).writeMsgInsecure(...)
    github.com/libp2p/go-libp2p@v0.27.1-0.20230509174602-b2a0553074c9/p2p/security/noise/rw.go:154
github.com/libp2p/go-libp2p/p2p/security/noise.(*secureSession).Write(0x14006df0000, {0x140024bd7a0, 0xc, 0x10})
    github.com/libp2p/go-libp2p@v0.27.1-0.20230509174602-b2a0553074c9/p2p/security/noise/rw.go:122 +0x234
github.com/libp2p/go-yamux/v4.(*Session).sendLoop(0x140092fa600)
    github.com/libp2p/go-yamux/v4@v4.0.0/session.go:626 +0x56c
github.com/libp2p/go-yamux/v4.(*Session).send(0x6?)
    github.com/libp2p/go-yamux/v4@v4.0.0/session.go:512 +0x20
created by github.com/libp2p/go-yamux/v4.newSession
    github.com/libp2p/go-yamux/v4@v4.0.0/session.go:163 +0x568

I suspect this is due to a misuse of the buffer pool.
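
For readers unfamiliar with this class of bug, here is a minimal sketch of what "misuse of the buffer pool" can look like in general. It is not the go-yamux or noise code, and it does not claim to reproduce the panic above. `tinyPool` and `aliasAfterPut` are hypothetical names invented for illustration. The sketch returns a buffer to a pool while the caller still holds it, so the next `Get` hands out the same backing array and the two users overwrite each other's data.

```go
package main

import "fmt"

// tinyPool is a hypothetical free-list buffer pool, used only to make the
// aliasing deterministic. It is NOT the pool used by go-libp2p or go-yamux.
type tinyPool struct {
	free [][]byte
}

// Get returns a buffer of length n, reusing a pooled one when it is big enough.
func (p *tinyPool) Get(n int) []byte {
	if l := len(p.free); l > 0 {
		b := p.free[l-1]
		p.free = p.free[:l-1]
		if cap(b) >= n {
			return b[:n]
		}
	}
	return make([]byte, n)
}

// Put hands a buffer back to the pool. The caller must not touch it afterwards.
func (p *tinyPool) Put(b []byte) {
	p.free = append(p.free, b[:0])
}

// aliasAfterPut reports whether a buffer that was put back too early ends up
// sharing memory with the next buffer handed out by the pool.
func aliasAfterPut() bool {
	p := &tinyPool{}

	a := p.Get(8)
	copy(a, "header-A")
	p.Put(a) // BUG (deliberate): a is still used below.

	b := p.Get(8) // Reuses the backing array of a.
	copy(b, "header-B")

	// a now shows the bytes written through b.
	return string(a) == "header-B"
}

func main() {
	fmt.Println(aliasAfterPut()) // prints true
}
```

In a concurrent program the same mistake means one goroutine can be writing a buffer to the network while another has already been handed that buffer and is modifying it, which is the kind of corruption the comment above suspects.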

marten-seemann commented 1 year ago

Here's another one:


panic: runtime error: slice bounds out of range [86:68]

goroutine 1303221 [running]:
internal/poll.(*FD).Write(0xc008b48180, {0xc003fa8002, 0x44, 0x200})
    /Users/marten/bin/go1.20ex/src/internal/poll/fd_unix.go:383 +0x49c
net.(*netFD).Write(0xc008b48180, {0xc003fa8002, 0x44, 0x200})
    /Users/marten/bin/go1.20ex/src/net/fd_posix.go:96 +0x48
net.(*conn).Write(0xc004d8c138, {0xc003fa8002, 0x44, 0x200})
    /Users/marten/bin/go1.20ex/src/net/net.go:195 +0x88
net.dnsPacketRoundTrip({_, _}, _, {{{0x5f, 0x64, 0x6e, 0x73, 0x61, 0x64, 0x64, ...}, ...}, ...}, ...)
    /Users/marten/bin/go1.20ex/src/net/dnsclient_unix.go:102 +0x88
net.(*Resolver).exchange(_, {_, _}, {_, _}, {{{0x5f, 0x64, 0x6e, 0x73, 0x61, ...}, ...}, ...}, ...)
    /Users/marten/bin/go1.20ex/src/net/dnsclient_unix.go:187 +0x3ec
net.(*Resolver).tryOneName(_, {_, _}, _, {_, _}, _)
    /Users/marten/bin/go1.20ex/src/net/dnsclient_unix.go:277 +0x40c
net.(*Resolver).lookup(_, {_, _}, {_, _}, _, _)
    /Users/marten/bin/go1.20ex/src/net/dnsclient_unix.go:44 +0x2ac
net.(*Resolver).goLookupTXT(0xc00d80f238?, {0x106e65c58, 0xc009e91e50}, {0xc00b0a1020, 0x22})
    /Users/marten/bin/go1.20ex/src/net/lookup.go:847 +0x78
net.(*Resolver).lookupTXT(...)
    /Users/marten/bin/go1.20ex/src/net/lookup_unix.go:123
net.(*Resolver).LookupTXT(0xc0001bbe30?, {0x106e65c58, 0xc009e91e50}, {0xc00b0a1020, 0x22})
    /Users/marten/bin/go1.20ex/src/net/lookup.go:630 +0x4c
github.com/multiformats/go-multiaddr-dns.(*Resolver).Resolve(0xc0087632c0?, {0x106e65c58, 0xc009e91e50}, {0x106e7e880, 0xc008d58780})
    /Users/marten/src/go/pkg/mod/github.com/multiformats/go-multiaddr-dns@v0.3.1/resolve.go:195 +0x328
github.com/libp2p/go-libp2p/p2p/net/swarm.(*Swarm).resolveAddrs(0xc0001fe000, {0x106e65c58, 0xc009e91e50}, {{0xc00c5d0cf0, 0x26}, {0xc00c1aaf60, 0x2, 0x2}})
    /Users/marten/src/go/src/github.com/libp2p/go-libp2p/p2p/net/swarm/swarm_dial.go:391 +0x3dc
github.com/libp2p/go-libp2p/p2p/net/swarm.(*Swarm).addrsForDial(0xc0001fe000, {0x106e65c58, 0xc009e91e50}, {0xc00c5d0cf0, 0x26})
    /Users:325 +0x48c
github.com/libp2p/go-libp2p/p2p/net/swarm.(*dialWorker).loop(0xc00573dd80)
    /Users/marten/src/go/src/github.com/libp2p/go-libp2p/p2p/net/swarm/dial_worker.go:97 +0x548
github.com/libp2p/go-libp2p/p2p/net/swarm.(*Swarm).dialWorkerLoop(0xc0001fe000, {0xc00c5d0cf0, 0x26}, 0xc0099c2d20)
    /Users/marten/src/go/src/github.com/libp2p/go-libp2p/p2p/net/swarm/swarm_dial.go:299 +0x228
created by github.com/libp2p/go-libp2p/p2p/net/swarm.(*dialSync).getActiveDial
    /Users/marten/src/go/src/github.com/libp2p/go-libp2p/p2p/net/swarm/dial_sync.go:83 +0x328

sukunrt commented 1 year ago

=== RUN   TestSingleCandidate
==================

WARNING: DATA RACE
Read at 0x00c0006bac70 by goroutine 306:
  github.com/quic-go/webtransport-go.(*sendStream).maybeSendStreamHeader()
      /Users/sukun/go/pkg/mod/github.com/quic-go/webtransport-go@v0.5.2/stream.go:58 +0x34
  github.com/quic-go/webtransport-go.(*sendStream).Write()
      /Users/sukun/go/pkg/mod/github.com/quic-go/webtransport-go@v0.5.2/stream.go:69 +0x34
  github.com/quic-go/webtransport-go.(*stream).Write()
      <autogenerated>:1 +0x58
  github.com/libp2p/go-libp2p/p2p/transport/webtransport.(*webtransportStream).Write()
      <autogenerated>:1 +0x64
  github.com/libp2p/go-libp2p/p2p/security/noise.(*secureSession).writeMsgInsecure()
      /Users/sukun/projects/go-libp2p/p2p/security/noise/rw.go:154 +0x104
  github.com/libp2p/go-libp2p/p2p/security/noise.(*secureSession).sendHandshakeMessage()
      /Users/sukun/projects/go-libp2p/p2p/security/noise/handshake.go:186 +0xd8
  github.com/libp2p/go-libp2p/p2p/security/noise.(*secureSession).runHandshake()
      /Users/sukun/projects/go-libp2p/p2p/security/noise/handshake.go:83 +0x478
  github.com/libp2p/go-libp2p/p2p/security/noise.newSecureSession.func1()
      /Users/sukun/projects/go-libp2p/p2p/security/noise/session.go:70 +0x4c

Previous write at 0x00c0006bac70 by goroutine 209:
  github.com/quic-go/webtransport-go.(*sendStream).maybeSendStreamHeader()
      /Users/sukun/go/pkg/mod/github.com/quic-go/webtransport-go@v0.5.2/stream.go:64 +0x98
  github.com/quic-go/webtransport-go.(*sendStream).Close()
      /Users/sukun/go/pkg/mod/github.com/quic-go/webtransport-go@v0.5.2/stream.go:89 +0x28
  github.com/quic-go/webtransport-go.(*stream).Close()
      <autogenerated>:1 +0x40
  github.com/libp2p/go-libp2p/p2p/transport/webtransport.(*webtransportStream).Close()
      <autogenerated>:1 +0x48
  github.com/libp2p/go-libp2p/p2p/security/noise.newSecureSession()
      /Users/sukun/projects/go-libp2p/p2p/security/noise/session.go:84 +0x638
  github.com/libp2p/go-libp2p/p2p/security/noise.(*SessionTransport).SecureOutbound()
      /Users/sukun/projects/go-libp2p/p2p/security/noise/session_transport.go:96 +0x104
  github.com/libp2p/go-libp2p/p2p/transport/webtransport.(*transport).upgrade()
      /Users/sukun/projects/go-libp2p/p2p/transport/webtransport/transport.go:263 +0x484
  github.com/libp2p/go-libp2p/p2p/transport/webtransport.(*transport).dialWithScope()
      /Users/sukun/projects/go-libp2p/p2p/transport/webtransport/transport.go:166 +0x358
  github.com/libp2p/go-libp2p/p2p/transport/webtransport.(*transport).Dial()
      /Users/sukun/projects/go-libp2p/p2p/transport/webtransport/transport.go:130 +0x220
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*Swarm).dialAddr()
      /Users/sukun/projects/go-libp2p/p2p/net/swarm/swarm_dial.go:509 +0x318
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*Swarm).dialAddr-fm()
      <autogenerated>:1 +0x64
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*dialLimiter).executeDial()
      /Users/sukun/projects/go-libp2p/p2p/net/swarm/limiter.go:219 +0x15c
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*dialLimiter).addCheckFdLimit.func1()
      /Users/sukun/projects/go-libp2p/p2p/net/swarm/limiter.go:169 +0x40

Goroutine 306 (running) created at:
  github.com/libp2p/go-libp2p/p2p/security/noise.newSecureSession()
      /Users/sukun/projects/go-libp2p/p2p/security/noise/session.go:69 +0x568
  github.com/libp2p/go-libp2p/p2p/security/noise.(*SessionTransport).SecureOutbound()
      /Users/sukun/projects/go-libp2p/p2p/security/noise/session_transport.go:96 +0x104
  github.com/libp2p/go-libp2p/p2p/transport/webtransport.(*transport).upgrade()
      /Users/sukun/projects/go-libp2p/p2p/transport/webtransport/transport.go:263 +0x484
  github.com/libp2p/go-libp2p/p2p/transport/webtransport.(*transport).dialWithScope()
      /Users/sukun/projects/go-libp2p/p2p/transport/webtransport/transport.go:166 +0x358
  github.com/libp2p/go-libp2p/p2p/transport/webtransport.(*transport).Dial()
      /Users/sukun/projects/go-libp2p/p2p/transport/webtransport/transport.go:130 +0x220
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*Swarm).dialAddr()
      /Users/sukun/projects/go-libp2p/p2p/net/swarm/swarm_dial.go:509 +0x318
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*Swarm).dialAddr-fm()
      <autogenerated>:1 +0x64
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*dialLimiter).executeDial()
      /Users/sukun/projects/go-libp2p/p2p/net/swarm/limiter.go:219 +0x15c
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*dialLimiter).addCheckFdLimit.func1()
      /Users/sukun/projects/go-libp2p/p2p/net/swarm/limiter.go:169 +0x40

Goroutine 209 (running) created at:
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*dialLimiter).addCheckFdLimit()
      /Users/sukun/projects/go-libp2p/p2p/net/swarm/limiter.go:169 +0x628
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*dialLimiter).addCheckPeerLimit()
      /Users/sukun/projects/go-libp2p/p2p/net/swarm/limiter.go:183 +0x494
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*dialLimiter).AddDialJob()
      /Users/sukun/projects/go-libp2p/p2p/net/swarm/limiter.go:194 +0x110
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*Swarm).limitedDial()
      /Users/sukun/projects/go-libp2p/p2p/net/swarm/swarm_dial.go:481 +0x224
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*Swarm).dialNextAddr()
      /Users/sukun/projects/go-libp2p/p2p/net/swarm/swarm_dial.go:417 +0xd0
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*dialWorker).loop()
      /Users/sukun/projects/go-libp2p/p2p/net/swarm/dial_worker.go:296 +0xe40
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*Swarm).dialWorkerLoop()
      /Users/sukun/projects/go-libp2p/p2p/net/swarm/swarm_dial.go:298 +0x248
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*Swarm).dialWorkerLoop-fm()
      <autogenerated>:1 +0x4c
  github.com/libp2p/go-libp2p/p2p/net/swarm.(*dialSync).getActiveDial.func2()
      /Users/sukun/projects/go-libp2p/p2p/net/swarm/dial_sync.go:83 +0x60
==================
    testing.go:1446: race detected during execution of test
--- FAIL: TestSingleCandidate (0.33s)

This is one race-detector failure; it goes through the noise SecureOutbound path.

dennis-tra commented 1 year ago

I experienced a panic today that I wanted to report, and came across this issue. It may or may not be related to the existing stack traces. Here's my stack trace:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x1014e9480]

goroutine 63423 [running]:
github.com/libp2p/go-libp2p/p2p/net/swarm.(*dialWorker).loop(0x140028eb680)
        /Users/dennistrautwein/.cache/go/pkg/mod/github.com/dennis-tra/go-libp2p@v0.0.0-20230512142639-41cfb265df11/p2p/net/swarm/dial_worker.go:181 +0x1440
github.com/libp2p/go-libp2p/p2p/net/swarm.(*Swarm).dialWorkerLoop(0x140003e96c0, {0x14002ebd4d0, 0x26}, 0x14000812480)
        /Users/dennistrautwein/.cache/go/pkg/mod/github.com/dennis-tra/go-libp2p@v0.0.0-20230512142639-41cfb265df11/p2p/net/swarm/swarm_dial.go:299 +0x34
created by github.com/libp2p/go-libp2p/p2p/net/swarm.(*dialSync).getActiveDial
        /Users/dennistrautwein/.cache/go/pkg/mod/github.com/dennis-tra/go-libp2p@v0.0.0-20230512142639-41cfb265df11/p2p/net/swarm/dial_sync.go:83 +0x334

I'm using my fork which is based on v0.27.1. The changes consist of exporting a few struct member fields.

marten-seemann commented 1 year ago

Closing, since we haven't seen this again after fixing https://github.com/quic-go/webtransport-go/issues/77.

sukunrt commented 1 year ago

@dennis-tra, the panic you reported was fixed in v0.27.5.