neo4j / neo4j-go-driver

Neo4j Bolt Driver for Go
Apache License 2.0
482 stars 68 forks source link

driver getting panicked with index out of range #574

Open jkoleti-uptycs opened 3 months ago

jkoleti-uptycs commented 3 months ago
panic: runtime error: index out of range [1] with length 0

goroutine 199034 [running]:
encoding/binary.bigEndian.PutUint16(...)
    /usr/local/go/src/encoding/binary/binary.go:144
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*chunker).send(0xc004441c20, {0x2b65db0, 0x4866120}, {0x2b55f80?, 0xc00200c928})
    /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/chunker.go:72 +0x39e
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*outgoing).send(0xc004441c20, {0x2b65db0, 0x4866120}, {0x2b55f80, 0xc00200c928})
    /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/outgoing.go:260 +0x55
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*messageQueue).send(...)
    /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/message_queue.go:134
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*bolt4).Close(0xc0032aad00, {0x2b65db0, 0x4866120})
    /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/bolt4.go:925 +0x1a5
created by github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*server).removeIdleOlderThan in goroutine 199032
    /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/server.go:187 +0x465

Our Database setup

We are using the neo4j go driver for connecting to our memgraph database with default config. One of our applications is getting panicked while running cypher queries. So built our application with the -race flag enabled and below is the stack trace for a panic.

==================
WARNING: DATA RACE
Write at 0x00c004441c50 by goroutine 198988:
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*chunker).send()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/chunker.go:107 +0x2ca
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*outgoing).send()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/outgoing.go:260 +0x54
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*messageQueue).send()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/message_queue.go:134 +0x1a4
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*bolt4).Close()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/bolt4.go:925 +0x125
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*Pool).unregUnlocked.func1.1()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/pool.go:391 +0x61

Previous write at 0x00c004441c50 by goroutine 198992:
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*chunker).beginMessage()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/chunker.go:47 +0x1ec
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*outgoing).begin()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/outgoing.go:45 +0xd6
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*outgoing).appendGoodbye()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/outgoing.go:244 +0xd5
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*messageQueue).appendGoodbye()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/message_queue.go:129 +0x124
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*bolt4).Close()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/bolt4.go:924 +0x108
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*server).removeIdleOlderThan.func1()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/server.go:187 +0x61

Goroutine 198988 (running) created at:
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*Pool).unregUnlocked.func1()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/pool.go:391 +0x104
  runtime.deferreturn()
      /usr/local/go/src/runtime/panic.go:477 +0x30
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*Pool).unreg()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/pool.go:385 +0x18f
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*Pool).Return()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/pool.go:460 +0x4e4
  github.com/neo4j/neo4j-go-driver/v5/neo4j.(*sessionWithContext).executeTransactionFunction.func1()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/session_with_context.go:441 +0x85
  runtime.deferreturn()
      /usr/local/go/src/runtime/panic.go:477 +0x30
  github.com/neo4j/neo4j-go-driver/v5/neo4j.(*sessionWithContext).runRetriable()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/session_with_context.go:416 +0x5ed
  github.com/neo4j/neo4j-go-driver/v5/neo4j.(*sessionWithContext).ExecuteWrite()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/session_with_context.go:365 +0x86
  github.com/neo4j/neo4j-go-driver/v5/neo4j.SessionWithContext.ExecuteWrite-fm()
      <autogenerated>:1 +0xa7
  github.com/prahaladd/gograph/neo.(*Neo4jConnection).ExecuteQuery()
      /root/go/pkg/mod/github.com/prahaladd/gograph@v0.2.0/neo/executor.go:139 +0x536
  ...
  ...
  ...

Goroutine 198992 (running) created at:
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*server).removeIdleOlderThan()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/server.go:187 +0x464
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*Pool).CleanUp()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/pool.go:146 +0x313
  github.com/neo4j/neo4j-go-driver/v5/neo4j.(*sessionWithContext).Close.func1()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/session_with_context.go:655 +0x77
==================
panic: runtime error: index out of range [1] with length 0

goroutine 199034 [running]:
encoding/binary.bigEndian.PutUint16(...)
    /usr/local/go/src/encoding/binary/binary.go:144
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*chunker).send(0xc004441c20, {0x2b65db0, 0x4866120}, {0x2b55f80?, 0xc00200c928})
    /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/chunker.go:72 +0x39e
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*outgoing).send(0xc004441c20, {0x2b65db0, 0x4866120}, {0x2b55f80, 0xc00200c928})
    /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/outgoing.go:260 +0x55
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*messageQueue).send(...)
    /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/message_queue.go:134
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*bolt4).Close(0xc0032aad00, {0x2b65db0, 0x4866120})
    /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/bolt4.go:925 +0x1a5
created by github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*server).removeIdleOlderThan in goroutine 199032
    /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/server.go:187 +0x465

This issue is happening randomly, the occurrence of this issue is not predictable.

Saw a similar issue that was closed here - https://github.com/neo4j/neo4j-go-driver/issues/525

kdescoteaux-uptycs commented 3 months ago

Note that this race happens when the ExecuteWrite pool Return invokes unreg() (because connection is old?) and this launches a go routine to Close the connection which cannot even begin to run until after the connection put in the Idle list. When ExecuteWrite returns the session could then be closed by the caller, which launches a Cleanup, which will close the old connections in the Idle List in more go routines. It is not safe to leave the Close to run asynchronously from the main flow of ExecuteWrite unless there is a lock in Close to prevent two simultaneous Closes from running.

StephenCathcart commented 3 months ago

Hey there!

Thanks for sharing the stack trace and your observations, it's very helpful. I've bumped up the priority on this one because it looks like it might be tied to the previous issue https://github.com/neo4j/neo4j-go-driver/issues/525 and we now have a bit more info to work with. I'll keep you in the loop with any updates or breakthroughs.

aniketkdm commented 1 month ago

Is there any update on this. We are seeing the same / similar behavior.

StephenCathcart commented 1 month ago

Hi @aniketkdm, are you seeing this problem using a Neo4j server, and if so which version? I'm just asking as the original issue was around Memgraph.

aniketkdm commented 1 month ago

Hello @StephenCathcart, yes we are seeing same / similar problem. We are using Neo4j aura through GCP and when we are running heavy load (reads and writes) we get %!v(PANIC=Format method: runtime error: invalid memory address or nil pointer dereference) randomly. We have tried adding recovery functions in our code but the panics are not getting recovered in our function. We are using github.com/neo4j/neo4j-go-driver/v5 v5.18.0. Even with the recovery functions, we are not getting stacktrace for these panics.

justinwalz commented 2 weeks ago

To chime in, we're also seeing this issue

StephenCathcart commented 2 weeks ago

Thanks for the extra information @aniketkdm and @justinwalz that narrows things down quite a bit.