Closed frank-zsy closed 2 years ago
UPDATE: The error happened several times after that, and the error message is always Index xxxxx out of bounds for length 4096
and the xxxxxx
is always the last index of all the nodes projected.
Like if the selection node count is 933694, then the error message is Index 933693 out of bounds for length 4096
which maybe caused by incorrect ArrayBuffer initialization or expansion?
@frank-zsy thanks for reporting the issue and your findings. You're probably right about the off-by-one error. This bug should be only present in 1.7.x as we size the internal buffers differently in 1.8. Would you mind trying out the latest GDS release 1.8.2 ?
Do you require staying on 1.7.x?
@s1ck Thanks for your reply, I changed GDS to latest 1.8.2 release and the error still occurs with a different buffer size:
2022-01-18 13:45:49.419+0000 INFO [neo4j.BoltWorker-3 [bolt] [/127.0.0.1:53283] ] Loading :: Start
2022-01-18 13:45:49.420+0000 INFO [neo4j.BoltWorker-3 [bolt] [/127.0.0.1:53283] ] Loading :: Nodes :: Start
2022-01-18 13:45:49.497+0000 INFO [neo4j.BoltWorker-3 [bolt] [/127.0.0.1:53283] ] Loading :: Nodes 0%
2022-01-18 13:46:28.695+0000 WARN Graph creation failed
java.lang.ArrayIndexOutOfBoundsException: Index 933693 out of bounds for length 10000
at org.neo4j.gds.core.loading.NodesBatchBuffer.add(NodesBatchBuffer.java:102) ~[neo4j-graph-data-science-1.8.2.jar:?]
at org.neo4j.gds.core.loading.construction.NodesBuilder$ThreadLocalBuilder.addNode(NodesBuilder.java:314) ~[neo4j-graph-data-science-1.8.2.jar:?]
at org.neo4j.gds.core.loading.construction.NodesBuilder.addNode(NodesBuilder.java:153) ~[neo4j-graph-data-science-1.8.2.jar:?]
at org.neo4j.gds.core.loading.NodeRowVisitor.visit(NodeRowVisitor.java:84) ~[neo4j-graph-data-science-1.8.2.jar:?]
at org.neo4j.cypher.internal.javacompat.ResultSubscriber.onRecordCompleted(ResultSubscriber.java:122) ~[neo4j-cypher-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.interpreted.pipes.ProduceResultsPipe.produce(ProduceResultsPipe.scala:73) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.interpreted.pipes.ProduceResultsPipe.$anonfun$internalCreateResults$2(ProduceResultsPipe.scala:45) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.ClosingIterator$$anon$3.next(ClosingIterator.scala:153) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.interpreted.PipeExecutionResult.serveResults(PipeExecutionResult.scala:79) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.interpreted.PipeExecutionResult.request(PipeExecutionResult.scala:66) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.result.StandardInternalExecutionResult.request(StandardInternalExecutionResult.scala:90) ~[neo4j-cypher-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.result.ClosingExecutionResult.request(ClosingExecutionResult.scala:144) ~[neo4j-cypher-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.javacompat.ResultSubscriber.fetchResults(ResultSubscriber.java:355) ~[neo4j-cypher-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.javacompat.ResultSubscriber.acceptFromSubscriber(ResultSubscriber.java:433) ~[neo4j-cypher-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.javacompat.ResultSubscriber.accept(ResultSubscriber.java:280) ~[neo4j-cypher-4.3.10.jar:4.3.10]
at org.neo4j.gds.core.loading.CypherNodeLoader.loadSingleBatch(CypherNodeLoader.java:86) ~[neo4j-graph-data-science-1.8.2.jar:?]
at org.neo4j.gds.core.loading.CypherRecordLoader.load(CypherRecordLoader.java:73) ~[neo4j-graph-data-science-1.8.2.jar:?]
at org.neo4j.gds.core.loading.CypherFactory.lambda$build$1(CypherFactory.java:152) ~[neo4j-graph-data-science-1.8.2.jar:?]
at org.neo4j.gds.transaction.TransactionContext.apply(TransactionContext.java:108) ~[neo4j-graph-data-science-1.8.2.jar:?]
at org.neo4j.gds.core.loading.CypherFactory.build(CypherFactory.java:135) ~[neo4j-graph-data-science-1.8.2.jar:?]
at org.neo4j.gds.core.GraphLoader.graphStore(GraphLoader.java:47) ~[neo4j-graph-data-science-1.8.2.jar:?]
at org.neo4j.gds.catalog.GraphCreateProc.createGraph(GraphCreateProc.java:277) ~[neo4j-graph-data-science-1.8.2.jar:?]
at org.neo4j.gds.catalog.GraphCreateProc.lambda$createCypher$1(GraphCreateProc.java:145) ~[neo4j-graph-data-science-1.8.2.jar:?]
at org.neo4j.gds.BaseProc.runWithExceptionLogging(BaseProc.java:186) ~[neo4j-graph-data-science-1.8.2.jar:?]
at org.neo4j.gds.catalog.GraphCreateProc.createCypher(GraphCreateProc.java:143) ~[neo4j-graph-data-science-1.8.2.jar:?]
at org.neo4j.kernel.impl.proc.GeneratedProcedure_cypher1849534192360.apply(Unknown Source) ~[?:?]
at org.neo4j.procedure.impl.ProcedureRegistry.callProcedure(ProcedureRegistry.java:240) ~[neo4j-procedure-4.3.10.jar:4.3.10]
at org.neo4j.procedure.impl.GlobalProceduresRegistry.callProcedure(GlobalProceduresRegistry.java:364) ~[neo4j-procedure-4.3.10.jar:4.3.10]
at org.neo4j.kernel.impl.newapi.AllStoreHolder.callProcedure(AllStoreHolder.java:981) ~[neo4j-kernel-4.3.10.jar:4.3.10]
at org.neo4j.kernel.impl.newapi.AllStoreHolder.procedureCallRead(AllStoreHolder.java:905) ~[neo4j-kernel-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.interpreted.CallSupport$.$anonfun$callReadOnlyProcedure$1(CallSupport.scala:44) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.interpreted.CallSupport$.callProcedure(CallSupport.scala:63) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.interpreted.CallSupport$.callReadOnlyProcedure(CallSupport.scala:44) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.interpreted.TransactionBoundQueryContext.callReadOnlyProcedure(TransactionBoundQueryContext.scala:1122) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.planning.ExceptionTranslatingQueryContext.callReadOnlyProcedure(ExceptionTranslatingQueryContext.scala:284) ~[neo4j-cypher-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.LazyReadOnlyCallMode.callProcedure(ProcedureCallMode.scala:52) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.interpreted.pipes.ProcedureCallPipe.call(ProcedureCallPipe.scala:88) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.interpreted.pipes.ProcedureCallPipe.$anonfun$internalCreateResultsByAppending$1(ProcedureCallPipe.scala:74) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.ClosingIterator$$anon$1.nextCur(ClosingIterator.scala:106) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.ClosingIterator$$anon$1.innerHasNext(ClosingIterator.scala:112) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.ClosingIterator.hasNext(ClosingIterator.scala:92) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.ClosingIterator$$anon$3.innerHasNext(ClosingIterator.scala:151) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.ClosingIterator.hasNext(ClosingIterator.scala:92) ~[neo4j-cypher-runtime-util-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.interpreted.PipeExecutionResult.serveResults(PipeExecutionResult.scala:78) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.runtime.interpreted.PipeExecutionResult.request(PipeExecutionResult.scala:66) ~[neo4j-cypher-interpreted-runtime-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.result.StandardInternalExecutionResult.request(StandardInternalExecutionResult.scala:90) ~[neo4j-cypher-4.3.10.jar:4.3.10]
at org.neo4j.cypher.internal.result.ClosingExecutionResult.request(ClosingExecutionResult.scala:144) ~[neo4j-cypher-4.3.10.jar:4.3.10]
at org.neo4j.fabric.stream.QuerySubject$BasicQuerySubject$1.doRequest(QuerySubject.java:184) ~[neo4j-fabric-4.3.10.jar:4.3.10]
at org.neo4j.fabric.stream.QuerySubject$BasicQuerySubject$1.request(QuerySubject.java:167) ~[neo4j-fabric-4.3.10.jar:4.3.10]
at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.request(Operators.java:2158) ~[reactor-core-3.4.11.jar:3.4.11]
at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
at reactor.core.publisher.StrictSubscriber.request(StrictSubscriber.java:138) ~[reactor-core-3.4.11.jar:3.4.11]
at org.neo4j.fabric.stream.Rx2SyncStream$RecordSubscriber.request(Rx2SyncStream.java:129) ~[neo4j-fabric-4.3.10.jar:4.3.10]
at org.neo4j.fabric.stream.Rx2SyncStream.maybeRequest(Rx2SyncStream.java:91) ~[neo4j-fabric-4.3.10.jar:4.3.10]
at org.neo4j.fabric.stream.Rx2SyncStream.readRecord(Rx2SyncStream.java:50) ~[neo4j-fabric-4.3.10.jar:4.3.10]
at org.neo4j.fabric.bolt.BoltQueryExecutionImpl$QueryExecutionImpl.request(BoltQueryExecutionImpl.java:179) ~[neo4j-fabric-4.3.10.jar:4.3.10]
at org.neo4j.bolt.runtime.AbstractCypherAdapterStream.handleRecords(AbstractCypherAdapterStream.java:104) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.v3.messaging.ResultHandler.onPullRecords(ResultHandler.java:41) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.v4.messaging.PullResultConsumer.consume(PullResultConsumer.java:42) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.runtime.statemachine.impl.TransactionStateMachine$State.consumeResult(TransactionStateMachine.java:514) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.runtime.statemachine.impl.TransactionStateMachine$State$2.streamResult(TransactionStateMachine.java:358) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.runtime.statemachine.impl.TransactionStateMachine.streamResult(TransactionStateMachine.java:97) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.v4.runtime.InTransactionState.processStreamResultMessage(InTransactionState.java:76) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.v4.runtime.AbstractStreamingState.processUnsafe(AbstractStreamingState.java:49) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.v4.runtime.InTransactionState.processUnsafe(InTransactionState.java:63) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.v3.runtime.FailSafeBoltStateMachineState.process(FailSafeBoltStateMachineState.java:48) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.runtime.statemachine.impl.AbstractBoltStateMachine.nextState(AbstractBoltStateMachine.java:151) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.runtime.statemachine.impl.AbstractBoltStateMachine.process(AbstractBoltStateMachine.java:99) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.messaging.BoltRequestMessageReader.lambda$doRead$1(BoltRequestMessageReader.java:93) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.runtime.DefaultBoltConnection.lambda$enqueue$0(DefaultBoltConnection.java:156) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.runtime.DefaultBoltConnection.processNextBatchInternal(DefaultBoltConnection.java:252) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.runtime.DefaultBoltConnection.processNextBatch(DefaultBoltConnection.java:187) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.runtime.DefaultBoltConnection.processNextBatch(DefaultBoltConnection.java:177) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.runtime.scheduling.ExecutorBoltScheduler.executeBatch(ExecutorBoltScheduler.java:257) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at org.neo4j.bolt.runtime.scheduling.ExecutorBoltScheduler.lambda$scheduleBatchOrHandleError$3(ExecutorBoltScheduler.java:240) ~[neo4j-bolt-4.3.10.jar:4.3.10]
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.69.Final.jar:4.1.69.Final]
at java.lang.Thread.run(Thread.java:834) [?:?]
And the stack trace is almost the same.
Thanks for trying. Is there any chance you can provide us with some test data that reproduces the error?
I will try to export the data but the data size is quite large about 1 million nodes.
Since the original data set is really large, I export the nodes and relationships which cause the error and import to a new database, and everything runs correct on the new database to create a gds in memory graph. I have no idea why is this and I think I can not reproduce the error soon.
I will try to reproduce it and come back if there is any progress.
I am totally confused, since export and import data can fix the problem, I try to dump
and load
data in offline mode. But the error still occurs with the restored database. So I assume this maybe relate to some errors with database corruption.
We are also a bit puzzled about the behavior. Would it be possible to upload the dump somewhere, so we can investigate this?
Did you try running the consistency checker? See https://neo4j.com/docs/operations-manual/current/tools/neo4j-admin/consistency-checker/ for how to do this.
@frank-zsy any updates on your side?
Here I come ~ I tried consistency checker and no error reported. The output is:
Selecting JVM - Version:11.0.1, Name:OpenJDK 64-Bit Server VM, Vendor:Oracle Corporation
2022-01-20 02:47:27.070+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Selected RecordFormat:PageAlignedV4_3[AF4.3.0] record format from store /Users/frankzhao/workspace/Neo4j-Server-4.3.10/data/databases/neo4j
2022-01-20 02:47:27.071+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Format not configured for store /Users/frankzhao/workspace/Neo4j-Server-4.3.10/data/databases/neo4j. Selected format from the store files: RecordFormat:PageAlignedV4_3[AF4.3.0]
Index structure consistency check
.................... 10%
.................... 20%
.................... 30%
.................... 40%
.................... 50%
.................... 60%
.................... 70%
.................... 80%
.................... 90%
.................... 100%
Consistency check
.................... 10%
.................... 20%
.................... 30%
.................... 40%
.................... 50%
.................... 60%
.................... 70%
.................... 80%
.................... 90%
.................... 100%
And I upload the dump file to my online storage but since the original file is too large, I split the file into 5 files, you can download them here:
https://blog.frankzhao.cn/neo4j/neo4j.dump.1 https://blog.frankzhao.cn/neo4j/neo4j.dump.2 https://blog.frankzhao.cn/neo4j/neo4j.dump.3 https://blog.frankzhao.cn/neo4j/neo4j.dump.4 https://blog.frankzhao.cn/neo4j/neo4j.dump.5
And you can use cat neo4j.dump.* > neo4j.dump
and the load
the dump to a database.
Then the following query should cause the error:
CALL gds.graph.create.cypher(
'activityGraph',
'MATCH (n:Repo) WHERE n.activity_20174 IS NOT NULL RETURN id(n) AS id, 0.3 AS retentionFactor, COALESCE(n.open_rank_20173, 1) AS initValue UNION ALL MATCH (n:User) WHERE n.activity_20174 IS NOT NULL RETURN id(n) AS id, 0.5 AS retentionFactor, COALESCE(n.open_rank_20173, 1) AS initValue;',
'MATCH (u:User)-[a:ACTION]->(r:Repo) WHERE a.activity_20174 IS NOT NULL RETURN id(u) AS source,id(r) AS target, a.activity/u.activity AS weight UNION ALL MATCH (u:User)-[a:ACTION]->(r:Repo) WHERE a.activity_20174 IS NOT NULL RETURN id(r) AS source,id(u) AS target,a.activity/r.activity AS weight;'
)
YIELD graphName AS graph, nodeCount AS nodes, relationshipCount AS rels;
The error message in web browser is:
Failed to invoke procedure `gds.graph.create.cypher`: Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 933693 out of bounds for length 4096
And please tell my if you downloaded the files and I will remove them from the public site.
@frank-zsy thanks for sharing, I downloaded the files, you can remove them.
Hey @frank-zsy Thanks again for sharing the input data. Using the dump, we could track down the problem to property conversion which lead to swallowed exceptions which in turn lead to not correctly flushing the internal buffers. The workaround is a fix in your node query: just turn COALESCE(n.open_rank_20173, 1)
into COALESCE(n.open_rank_20173, 1.0)
to indicate to the loading logic that initValue
is in fact a double
.
We use the property type of the first result row to initialize internal property array builders. And if that value is a long, we expect all other values to be longs, too. In some cases, those are doubles which lead to an internal exception. That exception has been swallowed and we only saw the aioob as a consequence when the query completed. We'll work on a fix, but the workaround should do the trick for now :)
Thanks a lot for your help! You save my day.
@frank-zsy we fixed this in https://github.com/neo4j/graph-data-science/commit/406cefd672f48ad6f6a23ede3b745d7db99ca93c
"Fixed" means, you see the actual root cause now, which is the incompatible type between the default in COALESCE
and the actual values in the rows.
It'll be contained in the upcoming 2.0 release.
Great, thanks~
Describe the bug
To Reproduce
GDS version: 1.7.3 Neo4j version: 4.3.10 Operating system: MacOS Catalina
Steps to reproduce the behavior:
It is very strange to have this error, here is the entire stacktraces log of the server:
And actually all works fine for 20151 to 201510 on creating in memory graph and further calculation, the error occurs on 201511 so it should be something wrong with certain dataset.
UPDATE: I skipped 201511 and all works fine for further data like 201512, 20161 and etc.
The whole graph node count is 38,592,541 and the relationship count is 69,963,915.
And also I inspect the create selection query, the project node count should be 611,068 and the project relationship count should be 1,398,383. And all query property on nodes and relationships have their own indexes.
The further query is a customized Pregel procedure by ourselves.
Expected behavior
Successfully create a in memory graph for further calculation.
Additional context