neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/
Other
628 stars 161 forks source link

ArrayIndexOutOfBoundsException during gds.graph.create.cypher #158

Closed frank-zsy closed 2 years ago

frank-zsy commented 2 years ago

Describe the bug

To Reproduce

GDS version: 1.7.3 Neo4j version: 4.3.10 Operating system: MacOS Catalina

Steps to reproduce the behavior:

It is very strange to have this error, here is the entire stacktraces log of the server:

2022-01-13 02:52:02.513+0000 INFO  [neo4j.BoltWorker-5 [bolt] [/127.0.0.1:50305] ] Loading :: Start
2022-01-13 02:52:02.515+0000 INFO  [neo4j.BoltWorker-5 [bolt] [/127.0.0.1:50305] ] Loading :: Nodes :: Start
2022-01-13 02:52:02.580+0000 INFO  [neo4j.BoltWorker-5 [bolt] [/127.0.0.1:50305] ] Loading :: Nodes 0%
2022-01-13 02:53:09.589+0000 WARN  Graph creation failed
java.lang.ArrayIndexOutOfBoundsException: Index 611067 out of bounds for length 4096
    at org.neo4j.gds.core.loading.NodesBatchBuffer.add(NodesBatchBuffer.java:88) ~[neo4j-graph-data-science-1.7.3.jar:?]
    at org.neo4j.gds.core.loading.construction.NodesBuilder$ThreadLocalBuilder.addNode(NodesBuilder.java:310) ~[neo4j-graph-data-science-1.7.3.jar:?]
    at org.neo4j.gds.core.loading.construction.NodesBuilder.addNode(NodesBuilder.java:152) ~[neo4j-graph-data-science-1.7.3.jar:?]
    at org.neo4j.gds.core.loading.NodeRowVisitor.visit(NodeRowVisitor.java:78) ~[neo4j-graph-data-science-1.7.3.jar:?]
    at org.neo4j.cypher.internal.javacompat.ResultSubscriber.onRecordCompleted(ResultSubscriber.java:122) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.pipes.ProduceResultsPipe.produce(ProduceResultsPipe.scala:73) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.pipes.ProduceResultsPipe.$anonfun$internalCreateResults$2(ProduceResultsPipe.scala:45) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.ClosingIterator$$anon$3.next(ClosingIterator.scala:153) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.PipeExecutionResult.serveResults(PipeExecutionResult.scala:79) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.PipeExecutionResult.request(PipeExecutionResult.scala:66) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.result.StandardInternalExecutionResult.request(StandardInternalExecutionResult.scala:90) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.result.ClosingExecutionResult.request(ClosingExecutionResult.scala:144) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.javacompat.ResultSubscriber.fetchResults(ResultSubscriber.java:355) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.javacompat.ResultSubscriber.acceptFromSubscriber(ResultSubscriber.java:433) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.javacompat.ResultSubscriber.accept(ResultSubscriber.java:280) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.gds.core.loading.CypherNodeLoader.loadSingleBatch(CypherNodeLoader.java:86) ~[neo4j-graph-data-science-1.7.3.jar:?]
    at org.neo4j.gds.core.loading.CypherRecordLoader.load(CypherRecordLoader.java:73) ~[neo4j-graph-data-science-1.7.3.jar:?]
    at org.neo4j.gds.core.loading.CypherFactory.lambda$build$1(CypherFactory.java:152) ~[neo4j-graph-data-science-1.7.3.jar:?]
    at org.neo4j.gds.core.TransactionContext.apply(TransactionContext.java:108) ~[neo4j-graph-data-science-1.7.3.jar:?]
    at org.neo4j.gds.core.loading.CypherFactory.build(CypherFactory.java:135) ~[neo4j-graph-data-science-1.7.3.jar:?]
    at org.neo4j.gds.core.GraphLoader.graphStore(GraphLoader.java:47) ~[neo4j-graph-data-science-1.7.3.jar:?]
    at org.neo4j.gds.catalog.GraphCreateProc.createGraph(GraphCreateProc.java:262) ~[neo4j-graph-data-science-1.7.3.jar:?]
    at org.neo4j.gds.catalog.GraphCreateProc.lambda$createCypher$1(GraphCreateProc.java:139) ~[neo4j-graph-data-science-1.7.3.jar:?]
    at org.neo4j.gds.BaseProc.runWithExceptionLogging(BaseProc.java:195) ~[neo4j-graph-data-science-1.7.3.jar:?]
    at org.neo4j.gds.catalog.GraphCreateProc.createCypher(GraphCreateProc.java:137) ~[neo4j-graph-data-science-1.7.3.jar:?]
    at org.neo4j.kernel.impl.proc.GeneratedProcedure_cypher994161291232946.apply(Unknown Source) ~[?:?]
    at org.neo4j.procedure.impl.ProcedureRegistry.callProcedure(ProcedureRegistry.java:240) ~[neo4j-procedure-4.3.10.jar:4.3.10]
    at org.neo4j.procedure.impl.GlobalProceduresRegistry.callProcedure(GlobalProceduresRegistry.java:364) ~[neo4j-procedure-4.3.10.jar:4.3.10]
    at org.neo4j.kernel.impl.newapi.AllStoreHolder.callProcedure(AllStoreHolder.java:981) ~[neo4j-kernel-4.3.10.jar:4.3.10]
    at org.neo4j.kernel.impl.newapi.AllStoreHolder.procedureCallRead(AllStoreHolder.java:905) ~[neo4j-kernel-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.CallSupport$.$anonfun$callReadOnlyProcedure$1(CallSupport.scala:44) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.CallSupport$.callProcedure(CallSupport.scala:63) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.CallSupport$.callReadOnlyProcedure(CallSupport.scala:44) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.TransactionBoundQueryContext.callReadOnlyProcedure(TransactionBoundQueryContext.scala:1122) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.planning.ExceptionTranslatingQueryContext.callReadOnlyProcedure(ExceptionTranslatingQueryContext.scala:284) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.LazyReadOnlyCallMode.callProcedure(ProcedureCallMode.scala:52) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.pipes.ProcedureCallPipe.call(ProcedureCallPipe.scala:88) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.pipes.ProcedureCallPipe.$anonfun$internalCreateResultsByAppending$1(ProcedureCallPipe.scala:74) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.ClosingIterator$$anon$1.nextCur(ClosingIterator.scala:106) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.ClosingIterator$$anon$1.innerHasNext(ClosingIterator.scala:112) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.ClosingIterator.hasNext(ClosingIterator.scala:92) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.ClosingIterator$$anon$3.innerHasNext(ClosingIterator.scala:151) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.ClosingIterator.hasNext(ClosingIterator.scala:92) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.PipeExecutionResult.serveResults(PipeExecutionResult.scala:78) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.PipeExecutionResult.request(PipeExecutionResult.scala:66) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.result.StandardInternalExecutionResult.request(StandardInternalExecutionResult.scala:90) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.result.ClosingExecutionResult.request(ClosingExecutionResult.scala:144) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.fabric.stream.QuerySubject$BasicQuerySubject$1.doRequest(QuerySubject.java:184) ~[neo4j-fabric-4.3.10.jar:4.3.10]
    at org.neo4j.fabric.stream.QuerySubject$BasicQuerySubject$1.request(QuerySubject.java:167) ~[neo4j-fabric-4.3.10.jar:4.3.10]
    at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
    at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
    at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
    at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
    at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.request(Operators.java:2158) ~[reactor-core-3.4.11.jar:3.4.11]
    at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
    at reactor.core.publisher.StrictSubscriber.request(StrictSubscriber.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
    at org.neo4j.fabric.stream.Rx2SyncStream$RecordSubscriber.request(Rx2SyncStream.java:129) ~[neo4j-fabric-4.3.10.jar:4.3.10]
    at org.neo4j.fabric.stream.Rx2SyncStream.maybeRequest(Rx2SyncStream.java:91) ~[neo4j-fabric-4.3.10.jar:4.3.10]
    at org.neo4j.fabric.stream.Rx2SyncStream.readRecord(Rx2SyncStream.java:50) ~[neo4j-fabric-4.3.10.jar:4.3.10]
    at org.neo4j.fabric.bolt.BoltQueryExecutionImpl$QueryExecutionImpl.request(BoltQueryExecutionImpl.java:179) ~[neo4j-fabric-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.AbstractCypherAdapterStream.handleRecords(AbstractCypherAdapterStream.java:104) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.v3.messaging.ResultHandler.onPullRecords(ResultHandler.java:41) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.v4.messaging.PullResultConsumer.consume(PullResultConsumer.java:42) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.statemachine.impl.TransactionStateMachine$State.consumeResult(TransactionStateMachine.java:514) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.statemachine.impl.TransactionStateMachine$State$1.streamResult(TransactionStateMachine.java:267) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.statemachine.impl.TransactionStateMachine.streamResult(TransactionStateMachine.java:97) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.v4.runtime.AutoCommitState.processStreamResultMessage(AutoCommitState.java:44) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.v4.runtime.AbstractStreamingState.processUnsafe(AbstractStreamingState.java:49) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.v3.runtime.FailSafeBoltStateMachineState.process(FailSafeBoltStateMachineState.java:48) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.statemachine.impl.AbstractBoltStateMachine.nextState(AbstractBoltStateMachine.java:151) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.statemachine.impl.AbstractBoltStateMachine.process(AbstractBoltStateMachine.java:99) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.messaging.BoltRequestMessageReader.lambda$doRead$1(BoltRequestMessageReader.java:93) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.DefaultBoltConnection.lambda$enqueue$0(DefaultBoltConnection.java:156) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.DefaultBoltConnection.processNextBatchInternal(DefaultBoltConnection.java:252) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.DefaultBoltConnection.processNextBatch(DefaultBoltConnection.java:187) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.DefaultBoltConnection.processNextBatch(DefaultBoltConnection.java:177) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.scheduling.ExecutorBoltScheduler.executeBatch(ExecutorBoltScheduler.java:257) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.scheduling.ExecutorBoltScheduler.lambda$scheduleBatchOrHandleError$3(ExecutorBoltScheduler.java:240) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.69.Final.jar:4.1.69.Final]
    at java.lang.Thread.run(Thread.java:834) [?:?]
CALL gds.graph.create.cypher(
  'activityGraph201511',
  'MATCH (n) WHERE EXISTS(n.activity_201511) RETURN id(n) AS id, CASE WHEN n:Repo THEN 0.3 ELSE 0.5 END AS retentionFactor, COALESCE(n.open_rank_201510*0.5, 1) AS initValue;',
  'MATCH (u:User)-[a:ACTION]->(r:Repo) WHERE EXISTS(a.activity_201511) WITH [{s:id(u),t:id(r), w:a.activity_201511/u.activity_201511},{s:id(r),t:id(u), w:a.activity_201511/r.activity_201511}] AS results UNWIND results AS r RETURN r.s AS source, r.t AS target, r.w AS weight;'
)
YIELD graphName AS graph, nodeCount AS nodes, relationshipCount AS rels;

And actually all works fine for 20151 to 201510 on creating in memory graph and further calculation, the error occurs on 201511 so it should be something wrong with certain dataset.

UPDATE: I skipped 201511 and all works fine for further data like 201512, 20161 and etc.

The whole graph node count is 38,592,541 and the relationship count is 69,963,915.

And also I inspect the create selection query, the project node count should be 611,068 and the project relationship count should be 1,398,383. And all query property on nodes and relationships have their own indexes.

CALL xlab.pregel.openrank.write('${graphName}',{initValueProperty:'initValue',retentionFactorProperty:'retentionFactor',relationshipWeightProperty:'weight',tolerance:0.0001,maxIterations:40,writeProperty:'',suffix:'_201511'});

The further query is a customized Pregel procedure by ourselves.

Expected behavior

Successfully create a in memory graph for further calculation.

Additional context

frank-zsy commented 2 years ago

UPDATE: The error happened several times after that, and the error message is always Index xxxxx out of bounds for length 4096 and the xxxxxx is always the last index of all the nodes projected.

Like if the selection node count is 933694, then the error message is Index 933693 out of bounds for length 4096 which maybe caused by incorrect ArrayBuffer initialization or expansion?

s1ck commented 2 years ago

@frank-zsy thanks for reporting the issue and your findings. You're probably right about the off-by-one error. This bug should be only present in 1.7.x as we size the internal buffers differently in 1.8. Would you mind trying out the latest GDS release 1.8.2 ?

Do you require staying on 1.7.x?

frank-zsy commented 2 years ago

@s1ck Thanks for your reply, I changed GDS to latest 1.8.2 release and the error still occurs with a different buffer size:

2022-01-18 13:45:49.419+0000 INFO  [neo4j.BoltWorker-3 [bolt] [/127.0.0.1:53283] ] Loading :: Start
2022-01-18 13:45:49.420+0000 INFO  [neo4j.BoltWorker-3 [bolt] [/127.0.0.1:53283] ] Loading :: Nodes :: Start
2022-01-18 13:45:49.497+0000 INFO  [neo4j.BoltWorker-3 [bolt] [/127.0.0.1:53283] ] Loading :: Nodes 0%
2022-01-18 13:46:28.695+0000 WARN  Graph creation failed
java.lang.ArrayIndexOutOfBoundsException: Index 933693 out of bounds for length 10000
    at org.neo4j.gds.core.loading.NodesBatchBuffer.add(NodesBatchBuffer.java:102) ~[neo4j-graph-data-science-1.8.2.jar:?]
    at org.neo4j.gds.core.loading.construction.NodesBuilder$ThreadLocalBuilder.addNode(NodesBuilder.java:314) ~[neo4j-graph-data-science-1.8.2.jar:?]
    at org.neo4j.gds.core.loading.construction.NodesBuilder.addNode(NodesBuilder.java:153) ~[neo4j-graph-data-science-1.8.2.jar:?]
    at org.neo4j.gds.core.loading.NodeRowVisitor.visit(NodeRowVisitor.java:84) ~[neo4j-graph-data-science-1.8.2.jar:?]
    at org.neo4j.cypher.internal.javacompat.ResultSubscriber.onRecordCompleted(ResultSubscriber.java:122) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.pipes.ProduceResultsPipe.produce(ProduceResultsPipe.scala:73) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.pipes.ProduceResultsPipe.$anonfun$internalCreateResults$2(ProduceResultsPipe.scala:45) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.ClosingIterator$$anon$3.next(ClosingIterator.scala:153) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.PipeExecutionResult.serveResults(PipeExecutionResult.scala:79) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.PipeExecutionResult.request(PipeExecutionResult.scala:66) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.result.StandardInternalExecutionResult.request(StandardInternalExecutionResult.scala:90) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.result.ClosingExecutionResult.request(ClosingExecutionResult.scala:144) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.javacompat.ResultSubscriber.fetchResults(ResultSubscriber.java:355) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.javacompat.ResultSubscriber.acceptFromSubscriber(ResultSubscriber.java:433) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.javacompat.ResultSubscriber.accept(ResultSubscriber.java:280) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.gds.core.loading.CypherNodeLoader.loadSingleBatch(CypherNodeLoader.java:86) ~[neo4j-graph-data-science-1.8.2.jar:?]
    at org.neo4j.gds.core.loading.CypherRecordLoader.load(CypherRecordLoader.java:73) ~[neo4j-graph-data-science-1.8.2.jar:?]
    at org.neo4j.gds.core.loading.CypherFactory.lambda$build$1(CypherFactory.java:152) ~[neo4j-graph-data-science-1.8.2.jar:?]
    at org.neo4j.gds.transaction.TransactionContext.apply(TransactionContext.java:108) ~[neo4j-graph-data-science-1.8.2.jar:?]
    at org.neo4j.gds.core.loading.CypherFactory.build(CypherFactory.java:135) ~[neo4j-graph-data-science-1.8.2.jar:?]
    at org.neo4j.gds.core.GraphLoader.graphStore(GraphLoader.java:47) ~[neo4j-graph-data-science-1.8.2.jar:?]
    at org.neo4j.gds.catalog.GraphCreateProc.createGraph(GraphCreateProc.java:277) ~[neo4j-graph-data-science-1.8.2.jar:?]
    at org.neo4j.gds.catalog.GraphCreateProc.lambda$createCypher$1(GraphCreateProc.java:145) ~[neo4j-graph-data-science-1.8.2.jar:?]
    at org.neo4j.gds.BaseProc.runWithExceptionLogging(BaseProc.java:186) ~[neo4j-graph-data-science-1.8.2.jar:?]
    at org.neo4j.gds.catalog.GraphCreateProc.createCypher(GraphCreateProc.java:143) ~[neo4j-graph-data-science-1.8.2.jar:?]
    at org.neo4j.kernel.impl.proc.GeneratedProcedure_cypher1849534192360.apply(Unknown Source) ~[?:?]
    at org.neo4j.procedure.impl.ProcedureRegistry.callProcedure(ProcedureRegistry.java:240) ~[neo4j-procedure-4.3.10.jar:4.3.10]
    at org.neo4j.procedure.impl.GlobalProceduresRegistry.callProcedure(GlobalProceduresRegistry.java:364) ~[neo4j-procedure-4.3.10.jar:4.3.10]
    at org.neo4j.kernel.impl.newapi.AllStoreHolder.callProcedure(AllStoreHolder.java:981) ~[neo4j-kernel-4.3.10.jar:4.3.10]
    at org.neo4j.kernel.impl.newapi.AllStoreHolder.procedureCallRead(AllStoreHolder.java:905) ~[neo4j-kernel-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.CallSupport$.$anonfun$callReadOnlyProcedure$1(CallSupport.scala:44) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.CallSupport$.callProcedure(CallSupport.scala:63) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.CallSupport$.callReadOnlyProcedure(CallSupport.scala:44) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.TransactionBoundQueryContext.callReadOnlyProcedure(TransactionBoundQueryContext.scala:1122) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.planning.ExceptionTranslatingQueryContext.callReadOnlyProcedure(ExceptionTranslatingQueryContext.scala:284) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.LazyReadOnlyCallMode.callProcedure(ProcedureCallMode.scala:52) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.pipes.ProcedureCallPipe.call(ProcedureCallPipe.scala:88) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.pipes.ProcedureCallPipe.$anonfun$internalCreateResultsByAppending$1(ProcedureCallPipe.scala:74) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.ClosingIterator$$anon$1.nextCur(ClosingIterator.scala:106) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.ClosingIterator$$anon$1.innerHasNext(ClosingIterator.scala:112) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.ClosingIterator.hasNext(ClosingIterator.scala:92) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.ClosingIterator$$anon$3.innerHasNext(ClosingIterator.scala:151) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.ClosingIterator.hasNext(ClosingIterator.scala:92) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.PipeExecutionResult.serveResults(PipeExecutionResult.scala:78) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.runtime.interpreted.PipeExecutionResult.request(PipeExecutionResult.scala:66) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.result.StandardInternalExecutionResult.request(StandardInternalExecutionResult.scala:90) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.cypher.internal.result.ClosingExecutionResult.request(ClosingExecutionResult.scala:144) ~[neo4j-cypher-4.3.10.jar:4.3.10]
    at org.neo4j.fabric.stream.QuerySubject$BasicQuerySubject$1.doRequest(QuerySubject.java:184) ~[neo4j-fabric-4.3.10.jar:4.3.10]
    at org.neo4j.fabric.stream.QuerySubject$BasicQuerySubject$1.request(QuerySubject.java:167) ~[neo4j-fabric-4.3.10.jar:4.3.10]
    at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
    at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
    at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
    at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
    at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.request(Operators.java:2158) ~[reactor-core-3.4.11.jar:3.4.11]
    at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
    at reactor.core.publisher.StrictSubscriber.request(StrictSubscriber.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
    at org.neo4j.fabric.stream.Rx2SyncStream$RecordSubscriber.request(Rx2SyncStream.java:129) ~[neo4j-fabric-4.3.10.jar:4.3.10]
    at org.neo4j.fabric.stream.Rx2SyncStream.maybeRequest(Rx2SyncStream.java:91) ~[neo4j-fabric-4.3.10.jar:4.3.10]
    at org.neo4j.fabric.stream.Rx2SyncStream.readRecord(Rx2SyncStream.java:50) ~[neo4j-fabric-4.3.10.jar:4.3.10]
    at org.neo4j.fabric.bolt.BoltQueryExecutionImpl$QueryExecutionImpl.request(BoltQueryExecutionImpl.java:179) ~[neo4j-fabric-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.AbstractCypherAdapterStream.handleRecords(AbstractCypherAdapterStream.java:104) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.v3.messaging.ResultHandler.onPullRecords(ResultHandler.java:41) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.v4.messaging.PullResultConsumer.consume(PullResultConsumer.java:42) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.statemachine.impl.TransactionStateMachine$State.consumeResult(TransactionStateMachine.java:514) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.statemachine.impl.TransactionStateMachine$State$2.streamResult(TransactionStateMachine.java:358) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.statemachine.impl.TransactionStateMachine.streamResult(TransactionStateMachine.java:97) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.v4.runtime.InTransactionState.processStreamResultMessage(InTransactionState.java:76) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.v4.runtime.AbstractStreamingState.processUnsafe(AbstractStreamingState.java:49) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.v4.runtime.InTransactionState.processUnsafe(InTransactionState.java:63) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.v3.runtime.FailSafeBoltStateMachineState.process(FailSafeBoltStateMachineState.java:48) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.statemachine.impl.AbstractBoltStateMachine.nextState(AbstractBoltStateMachine.java:151) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.statemachine.impl.AbstractBoltStateMachine.process(AbstractBoltStateMachine.java:99) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.messaging.BoltRequestMessageReader.lambda$doRead$1(BoltRequestMessageReader.java:93) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.DefaultBoltConnection.lambda$enqueue$0(DefaultBoltConnection.java:156) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.DefaultBoltConnection.processNextBatchInternal(DefaultBoltConnection.java:252) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.DefaultBoltConnection.processNextBatch(DefaultBoltConnection.java:187) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.DefaultBoltConnection.processNextBatch(DefaultBoltConnection.java:177) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.scheduling.ExecutorBoltScheduler.executeBatch(ExecutorBoltScheduler.java:257) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at org.neo4j.bolt.runtime.scheduling.ExecutorBoltScheduler.lambda$scheduleBatchOrHandleError$3(ExecutorBoltScheduler.java:240) ~[neo4j-bolt-4.3.10.jar:4.3.10]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.69.Final.jar:4.1.69.Final]
    at java.lang.Thread.run(Thread.java:834) [?:?]
frank-zsy commented 2 years ago

And the stack trace is almost the same.

s1ck commented 2 years ago

Thanks for trying. Is there any chance you can provide us with some test data that reproduces the error?

frank-zsy commented 2 years ago

I will try to export the data but the data size is quite large about 1 million nodes.

frank-zsy commented 2 years ago

Since the original data set is really large, I export the nodes and relationships which cause the error and import to a new database, and everything runs correct on the new database to create a gds in memory graph. I have no idea why is this and I think I can not reproduce the error soon.

I will try to reproduce it and come back if there is any progress.

frank-zsy commented 2 years ago

I am totally confused, since export and import data can fix the problem, I try to dump and load data in offline mode. But the error still occurs with the restored database. So I assume this maybe relate to some errors with database corruption.

s1ck commented 2 years ago

We are also a bit puzzled about the behavior. Would it be possible to upload the dump somewhere, so we can investigate this?

s1ck commented 2 years ago

Did you try running the consistency checker? See https://neo4j.com/docs/operations-manual/current/tools/neo4j-admin/consistency-checker/ for how to do this.

s1ck commented 2 years ago

@frank-zsy any updates on your side?

frank-zsy commented 2 years ago

Here I come ~ I tried consistency checker and no error reported. The output is:

Selecting JVM - Version:11.0.1, Name:OpenJDK 64-Bit Server VM, Vendor:Oracle Corporation
2022-01-20 02:47:27.070+0000 INFO  [o.n.k.i.s.f.RecordFormatSelector] Selected RecordFormat:PageAlignedV4_3[AF4.3.0] record format from store /Users/frankzhao/workspace/Neo4j-Server-4.3.10/data/databases/neo4j
2022-01-20 02:47:27.071+0000 INFO  [o.n.k.i.s.f.RecordFormatSelector] Format not configured for store /Users/frankzhao/workspace/Neo4j-Server-4.3.10/data/databases/neo4j. Selected format from the store files: RecordFormat:PageAlignedV4_3[AF4.3.0]
Index structure consistency check
....................  10%
....................  20%
....................  30%
....................  40%
....................  50%
....................  60%
....................  70%
....................  80%
....................  90%
.................... 100%
Consistency check
....................  10%
....................  20%
....................  30%
....................  40%
....................  50%
....................  60%
....................  70%
....................  80%
....................  90%
.................... 100%
frank-zsy commented 2 years ago

And I upload the dump file to my online storage but since the original file is too large, I split the file into 5 files, you can download them here:

https://blog.frankzhao.cn/neo4j/neo4j.dump.1 https://blog.frankzhao.cn/neo4j/neo4j.dump.2 https://blog.frankzhao.cn/neo4j/neo4j.dump.3 https://blog.frankzhao.cn/neo4j/neo4j.dump.4 https://blog.frankzhao.cn/neo4j/neo4j.dump.5

And you can use cat neo4j.dump.* > neo4j.dump and the load the dump to a database.

Then the following query should cause the error:

CALL gds.graph.create.cypher(
  'activityGraph',
  'MATCH (n:Repo) WHERE n.activity_20174 IS NOT NULL RETURN id(n) AS id, 0.3 AS retentionFactor, COALESCE(n.open_rank_20173, 1) AS initValue UNION ALL MATCH (n:User) WHERE n.activity_20174 IS NOT NULL RETURN id(n) AS id, 0.5 AS retentionFactor, COALESCE(n.open_rank_20173, 1) AS initValue;',
  'MATCH (u:User)-[a:ACTION]->(r:Repo) WHERE a.activity_20174 IS NOT NULL RETURN id(u) AS source,id(r) AS target, a.activity/u.activity AS weight UNION ALL MATCH (u:User)-[a:ACTION]->(r:Repo) WHERE a.activity_20174 IS NOT NULL RETURN id(r) AS source,id(u) AS target,a.activity/r.activity AS weight;'
)
YIELD graphName AS graph, nodeCount AS nodes, relationshipCount AS rels;

The error message in web browser is:

Failed to invoke procedure `gds.graph.create.cypher`: Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 933693 out of bounds for length 4096

And please tell my if you downloaded the files and I will remove them from the public site.

s1ck commented 2 years ago

@frank-zsy thanks for sharing, I downloaded the files, you can remove them.

s1ck commented 2 years ago

Hey @frank-zsy Thanks again for sharing the input data. Using the dump, we could track down the problem to property conversion which lead to swallowed exceptions which in turn lead to not correctly flushing the internal buffers. The workaround is a fix in your node query: just turn COALESCE(n.open_rank_20173, 1) into COALESCE(n.open_rank_20173, 1.0) to indicate to the loading logic that initValue is in fact a double.

We use the property type of the first result row to initialize internal property array builders. And if that value is a long, we expect all other values to be longs, too. In some cases, those are doubles which lead to an internal exception. That exception has been swallowed and we only saw the aioob as a consequence when the query completed. We'll work on a fix, but the workaround should do the trick for now :)

frank-zsy commented 2 years ago

Thanks a lot for your help! You save my day.

s1ck commented 2 years ago

@frank-zsy we fixed this in https://github.com/neo4j/graph-data-science/commit/406cefd672f48ad6f6a23ede3b745d7db99ca93c

"Fixed" means, you see the actual root cause now, which is the incompatible type between the default in COALESCE and the actual values in the rows.

It'll be contained in the upcoming 2.0 release.

frank-zsy commented 2 years ago

Great, thanks~