neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/
Other
597 stars 157 forks source link

randomWalk ArrayIndexOutOfBoundsException #176

Closed Things-Z closed 2 years ago

Things-Z commented 2 years ago

Describe the bug

GDS is a great project and has been clicked on star, but when running randomWalk, the program timed out.

I think the main problem is when I rebuild the graph again after deleting the last graph The node ID generated by neo4j was extremely large for me, but the total number of nodes was very small. As a result, when I specified sourceNodes in randomWalk, the id(node) passed in would be extremely large, exceeding the number of nodes, resulting in the abnormal operation of randomWalk

node-total

node_id

cypher-command:

match (n:MetaAttackPatternNode)
with collect(n) as sourceNodes
call gds.beta.randomWalk.stream(
    "project1",
    {
        sourceNodes:sourceNodes
    })
yield nodeIds, path
return nodeIds, path

Error log:

2022-03-30 08:58:08.192+0000 INFO  [neo4j.BoltWorker-5 [bolt] [/127.0.0.1:64690] ] RandomWalk :: Start
2022-03-30 08:58:08.193+0000 INFO  [neo4j.BoltWorker-5 [bolt] [/127.0.0.1:64690] ] RandomWalk :: create walks :: Start
Exception in thread "Thread-25" java.lang.ArrayIndexOutOfBoundsException: Index 1828 out of bounds for length 65
    at org.neo4j.gds.core.utils.paged.HugeIntArray$SingleHugeIntArray.get(HugeIntArray.java:277)
    at org.neo4j.gds.core.huge.CompressedAdjacencyList.degree(CompressedAdjacencyList.java:121)
    at org.neo4j.gds.core.huge.HugeGraph.degree(HugeGraph.java:322)
    at org.neo4j.gds.traversal.RandomWalk$RandomWalkTask.run(RandomWalk.java:221)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
    at org.neo4j.internal.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:110)
    Suppressed: java.lang.ArrayIndexOutOfBoundsException: Index 1829 out of bounds for length 65
        ... 10 more
    Suppressed: java.lang.ArrayIndexOutOfBoundsException: Index 1831 out of bounds for length 65
        ... 10 more
    Suppressed: java.lang.ArrayIndexOutOfBoundsException: Index 1830 out of bounds for length 65
        ... 10 more

To Reproduce

GDS version: 2.0.0 Neo4j version: 4.4.4 Operating system: MacOS-Monterey-12.3(Inter)

Steps to reproduce the behavior:

  1. create graph
  2. detach delete graph
  3. create graph
  4. run randomWalk

Expected behavior

Ids and PATH should be returned normally

Additional context

Things-Z commented 2 years ago

After I look at the source code, the code I had a problem with should look like the following, assert index < size

This assertion compares index with size of array

However, the received index is nodeId

So when I have a very large nodeId, greater than the array.size of nodes created by graph.project, throw an exception

image image

So what do we need to fix if this happens? This seems to be about all sourceNodes bug, other algorithms that need to pass in sourceNodes parameters might have this problem as well

😅😅😅

Things-Z commented 2 years ago

I tried to build the debugger from source code, but the JAR I built myself didn't work

image

If you can solve this problem, thank you very much

Things-Z commented 2 years ago

I tried to build the debugger from source code, but the JAR I built myself didn't work

image

If you can solve this problem, thank you very much

I changed settings.gradlew to replace 4.3 with 4.4 and now it's compiled and ready to run. But about ArrayIndexOutOfBoundsException problem, I haven't think of a solution

DarthMax commented 2 years ago

Hej @RGDZ-GZU, thank you for submitting the issue. You where indeed correct that the node IDs configured as sourceNodeIds where not mapped into the internal id space. I have created a fix that should be merged within the coming days. The fix should be available in the 2.0.1 patch release.

Best, Max

Things-Z commented 2 years ago

I made a simple improvement to ListNodeSupplier, using graph.tomappedNodeId to convert to an internally mapped ID when returning nextNode, and it now works fine

This may be a feasible and simple temporary solution

image
DarthMax commented 2 years ago

Yes that is basically the fix that I developed. :)

andrew-delph commented 1 year ago

Getting the same issue when running this: MATCH (source:Person {userId: 'k6_auth_4'}), (target:Person{userId: 'k6_auth_5'} ) CALL gds.shortestPath.dijkstra.stream('myGraph', { sourceNode: id(source), targetNode: id(target), relationshipTypes: ['FRIENDS'], nodeLabels: ['Person'], logProgress: true }) YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path RETURN totalCost

On some selected nodes it is working though

Neo4jError: Failed to invoke procedure gds.shortestPath.dijkstra.stream: Caused by: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 4

vnickolov commented 1 year ago

@andrew-delph this is unrelated to randomWalk can you please create a separate issue and share more details about the case when this is failing, like dataset, graph project queries. Thank you in advance.