neo4j / neo4j-go-driver

Neo4j Bolt Driver for Go
Apache License 2.0

panic: runtime error: makeslice: len out of range #590

Closed by mindaugasrukas 3 months ago

mindaugasrukas commented 4 months ago

We are getting panics with the message: runtime error: makeslice: len out of range. More details below.

Neo4j Version: 5 (Neo4j Aura)
Neo4j Mode: Single instance
Driver version: Go driver v5.22.0
Operating System: alpine:3.20 (Docker on K8s)

Steps to reproduce

Unknown yet

We assume the issue is related to the length of the path of a result. We don't see this issue on small paths.

Example of the Cypher query we execute:

MATCH (s1:Item {x_uuid: $x_uuid, eid: $eid1})
WITH s1
MATCH (s2:Item {x_uuid: $x_uuid, eid: $eid2})
WITH s1, s2
MATCH p = shortestPath((s1)-[r:NEXT*0..500]->(s2)),
    p1 = (before:Item)-[:NEXT*0..500]->(s1),
    p2 = (s2)-[:NEXT*0..500]->(after:Item),
    path = shortestPath((before)-[:NEXT*0..500]->(after))
WHERE
    NONE(n IN nodes(p1)[1..-1] WHERE n:Item)
    AND NONE(n IN nodes(p2)[1..-1] WHERE n:Item)
RETURN path

Explanation of the query:

  1. We have a graph of connected nodes, call them Items and NotItems. For example: (:Item)-[:NEXT]->(:NotItem)-[:NEXT]->(:NotItem)-[:NEXT]->(:Item)-[:NEXT]->(:Item)-[:NEXT]->(:NotItem)-[:NEXT]->(:NotItem)-[:NEXT]->(:Item)-...
  2. We find the shortest path between the Items s1 and s2.
  3. Then we expand the path by adding the closest Items at the beginning and the end (if they exist).
  4. The final path = shortestPath(...) is probably not needed; it is only there to make the results easier to parse later. We could use RETURN p1, p, p2 instead, but that makes it complicated to build the full end-to-end path.

So far, we observe that we get a panic when the final path is longer than roughly 100 nodes (we are not sure of the exact threshold). The largest of our paths might be around 1000 nodes long.

We have this uniqueness constraint (backed by an index):

CREATE CONSTRAINT FOR (n:Item) REQUIRE (n.eid, n.x_uuid) IS UNIQUE;

Expected behavior

no error

Actual behavior

The client fails to run the query and panics with runtime error: makeslice: len out of range. The stack trace is attached below.

Stack trace (I removed unrelated internal library names):

...handlers.PanicRecoveryMiddleware.func1.1.1
    ...go:801
runtime.gopanic
    /opt/hostedtoolcache/go/1.22.5/x64/src/runtime/panic.go:770
runtime.panicmakeslicelen
    /opt/hostedtoolcache/go/1.22.5/x64/src/runtime/slice.go:29
runtime.makeslice
    /opt/hostedtoolcache/go/1.22.5/x64/src/runtime/slice.go:102
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*hydrator).path
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/hydrator.go:691
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*hydrator).value
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/hydrator.go:460
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*hydrator).record
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/hydrator.go:423
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*hydrator).hydrate
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/hydrator.go:160
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*incoming).next
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/incoming.go:40
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*messageQueue).receiveMsg
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/message_queue.go:208
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*messageQueue).receive
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/message_queue.go:152
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*messageQueue).receiveAll
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/message_queue.go:145
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*bolt5).ForceReset
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/bolt5.go:807
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*bolt5).Reset
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/bolt5.go:795
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*Pool).Return
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/pool/pool.go:410
github.com/neo4j/neo4j-go-driver/v5/neo4j.(*sessionWithContext).executeTransactionFunction.func2
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/session_with_context.go:467
runtime.gopanic
    /opt/hostedtoolcache/go/1.22.5/x64/src/runtime/panic.go:770
runtime.panicmakeslicelen
    /opt/hostedtoolcache/go/1.22.5/x64/src/runtime/slice.go:29
runtime.makeslice
    /opt/hostedtoolcache/go/1.22.5/x64/src/runtime/slice.go:102
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*hydrator).path
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/hydrator.go:691
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*hydrator).value
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/hydrator.go:460
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*hydrator).record
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/hydrator.go:423
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*hydrator).hydrate
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/hydrator.go:160
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*incoming).next
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/incoming.go:40
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*messageQueue).receiveMsg
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/message_queue.go:208
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*messageQueue).receive
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/message_queue.go:152
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*bolt5).Next
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/internal/bolt/bolt5.go:672
github.com/neo4j/neo4j-go-driver/v5/neo4j.(*resultWithContext).advance
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/result_with_context.go:246
github.com/neo4j/neo4j-go-driver/v5/neo4j.(*resultWithContext).Collect
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/result_with_context.go:153
...handlers.func1
    ...handler.go:149
github.com/neo4j/neo4j-go-driver/v5/neo4j.(*sessionWithContext).executeTransactionFunction
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/session_with_context.go:494
github.com/neo4j/neo4j-go-driver/v5/neo4j.(*sessionWithContext).runRetriable
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/session_with_context.go:434
github.com/neo4j/neo4j-go-driver/v5/neo4j.(*sessionWithContext).ExecuteRead
    /home/runner/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.22.0/neo4j/session_with_context.go:370
...

mindaugasrukas commented 4 months ago

To add more context: We don't see any issues running that query on the Aura query UI. Attached query profile SVG: neo4j_query_plan_2024-7-24

StephenCathcart commented 3 months ago

Thank you for raising the issue with such detail @mindaugasrukas. I will get back to you once we've investigated on our end.

StephenCathcart commented 3 months ago

Quick update for you:

"We assume the issue is related to the length of the path of a result. We don't see this issue on small paths."

You're correct here: in hydrator.path we make a slice whose length depends on the path. If this length is too large (such as make([]int, math.MaxInt64)), the same error occurs; it's essentially an out-of-memory error.
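
A minimal sketch of that failure mode, not taken from the driver source: asking the Go runtime for a slice with an absurdly large length panics with the same message as in the stack trace. The names tryMake and hugeLen are hypothetical helpers for this illustration, and math.MaxInt stands in for math.MaxInt64 so the snippet also compiles on 32-bit platforms.

```go
package main

import (
	"fmt"
	"math"
)

// hugeLen is a length no allocator can satisfy (assumption for illustration).
const hugeLen = math.MaxInt

// tryMake attempts to allocate a slice of n ints and returns either a
// success message or the text of the runtime panic it recovered from.
func tryMake(n int) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	s := make([]int, n)
	return fmt.Sprintf("ok, len=%d", len(s))
}

func main() {
	fmt.Println(tryMake(3))       // a small length allocates normally
	fmt.Println(tryMake(hugeLen)) // a huge length panics inside runtime.makeslice
}
```

The length is passed through a variable on purpose, so that the check happens at run time in runtime.makeslice, which is the frame shown in the stack trace.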

Edit: I noticed your upper bounds (number of hops) are quite large at 500; we normally say anything over 10 is starting to become quite high. The following support page is a good read and has an example explaining the magnitude of increasing the upper bound from 4 to 14: https://support.neo4j.com/s/article/12667263111955-Specifying-Upper-Bounds-On-Relationship-Pattern-Matching-in-Cypher-Queries.
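
To give a rough feel for why the upper bound matters, here is a back-of-the-envelope sketch. It assumes an idealized tree in which every node has the same number of outgoing relationships; the branching factor of 3 and the function name candidatePaths are hypothetical, and the numbers are illustrative rather than taken from the linked article.

```go
package main

import "fmt"

// candidatePaths counts the paths of length 0..maxHops that start at a single
// node of a tree in which every node has `branching` outgoing relationships.
func candidatePaths(branching, maxHops int) int {
	total, level := 0, 1 // level holds branching^hop
	for hop := 0; hop <= maxHops; hop++ {
		total += level
		level *= branching
	}
	return total
}

func main() {
	for _, hops := range []int{4, 14} {
		fmt.Printf("upper bound %2d -> %d candidate paths\n", hops, candidatePaths(3, hops))
	}
	// A purely linear chain (branching factor 1) grows only linearly.
	fmt.Printf("linear chain, upper bound 500 -> %d candidate paths\n", candidatePaths(1, 500))
}
```

Under these assumptions the count goes from 121 paths at an upper bound of 4 to 7174453 at an upper bound of 14, while a strictly linear chain yields only 501 paths even at an upper bound of 500. How expensive a large bound is therefore depends heavily on how much the graph branches.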

mindaugasrukas commented 3 months ago

Thanks. Let me know how I can help debug the issue. Some additional notes: we have multiple disconnected graphs in the database. When we hit smaller graphs, we have no issues. When we hit bigger ones, we almost always get a panic. The panic seems to be related not to the size of the result but to the maximum possible path length in the graph. Does that ring any bells?

mindaugasrukas commented 3 months ago

@StephenCathcart I created a reproducible example: https://github.com/StencilFrame/neo4j-go-driver-panic

mindaugasrukas commented 3 months ago

@StephenCathcart any updates?

mindaugasrukas commented 3 months ago

This issue is blocking our future development. Could you please provide us with an update on the status of this issue? If there is something we need to adjust on our end, we would appreciate guidance on how to proceed. Given that this matter affects our development process, we would like to know whether we should wait for a resolution or explore alternative solutions.

StephenCathcart commented 3 months ago

Hi @mindaugasrukas, thank you for providing the reproducible example. We were struggling to reproduce this on our end, so it's a massive help. We're just discussing the best way to tackle this issue. At first, I assumed that we were trying to create a slice that was too large (due to the number of hops and the path size in your query), which would cause the error. However, it looks like we are trying to create a slice with a negative size, which is unusual and points to a possible bug in our hydrator/unpacker. I will let you know as soon as we have a fix for this.

StephenCathcart commented 3 months ago

@mindaugasrukas we've made some progress on the panic. It looks like a bug in the hydrator when it figures out the length of the path. It's intermittent because the panic only happens if the size of the path falls within a particular range (for example 128-255). We're just testing the fix now. It will be included in the next release, scheduled for the end of the month, but I'll see if we can get a patch release out earlier. I'll keep you posted.
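
The 128-255 range is suggestive: those are exactly the one-byte values that turn negative when read as a signed 8-bit integer. The sketch below illustrates that mechanism and shows that a negative length produces the same panic message as an oversized one. This is only one plausible explanation; the thread does not show the driver's code or the actual fix, and asSigned, asUnsigned, and allocate are hypothetical names.

```go
package main

import "fmt"

// asSigned is a hypothetical decoder that misreads a one-byte length
// as a signed 8-bit value, so 128..255 become -128..-1.
func asSigned(b byte) int { return int(int8(b)) }

// asUnsigned reads the same byte as an unsigned value (0..255).
func asUnsigned(b byte) int { return int(b) }

// allocate tries to build a slice of n ints and returns either a success
// message or the text of the runtime panic it recovered from.
func allocate(n int) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	s := make([]int, n)
	return fmt.Sprintf("allocated %d elements", len(s))
}

func main() {
	for _, b := range []byte{127, 128, 200, 255} {
		fmt.Printf("byte=%3d unsigned=%3d signed=%4d -> %s\n",
			b, asUnsigned(b), asSigned(b), allocate(asSigned(b)))
	}
}
```

In this sketch a length of 127 allocates normally, while 128, 200, and 255 all become negative and panic. That would match an intermittent failure that depends only on which range the length falls in.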

mindaugasrukas commented 3 months ago

Thank you for your prompt response. I appreciate it.

StephenCathcart commented 3 months ago

Hi @mindaugasrukas, a patch has been released (v5.23.1) which includes a fix for the above issue. Please let me know if this fixes the problem you're seeing, thanks!

mindaugasrukas commented 3 months ago

Yes, that solved our issue. Thanks, @StephenCathcart.