neo4j-contrib / neo4j-apoc-procedures

Awesome Procedures On Cypher for Neo4j - codenamed "apoc"                     If you like it, please ★ above ⇧            
https://neo4j.com/labs/apoc
Apache License 2.0
1.7k stars 494 forks source link

apoc.periodic.iterate reports waiting time > 0 when only 1 user connected #1443

Closed Dcanzano closed 3 years ago

Dcanzano commented 4 years ago

Expected Behavior (Mandatory)

Was expecting query.log for a apoc.periodic.iterate involving a MERGE would report a waiting: time of 0 when I'm the only user connected to the database

Actual Behavior (Mandatory)

waiting time is reported >0 , for example

2020-03-10 19:15:55.705+0000 INFO  292911 ms: (planning: 52, waiting: 169405) - 1466 page hits, 5970 page faults - bolt-session bolt            neo4j-java/dev          client/127.0.0.1:49478 server/127.0.0.1:7687>   -
CALL apoc.periodic.iterate('MATCH (v:Vendor),(txn:TxnDetail) WHERE txn.vendorId = v.id  RETURN txn, v', 'MERGE (txn)-[:VENDOR]->(v)', {batchSize: 10000, parallel: false, iterateList: true, retries:3}); - {} - {}

How to Reproduce the Problem

Using Neo4j 3.5.14 / APOC apoc-3.5.0.6-all.jar

  1. start with empty graph
  2. populate data and indexes
foreach ( x in range (1,300000) | create (n:TxnDetail {id: x , vendorId: toInteger(rand()*500)}));
foreach ( x in range (300001,600000) | create (n:TxnDetail {id: x , vendorId: toInteger(rand()*500)}));
foreach ( x in range (600001,900000) | create (n:TxnDetail {id: x , vendorId: toInteger(rand()*500)}));
foreach ( x in range (900001,1200000) | create (n:TxnDetail {id: x , vendorId: toInteger(rand()*500)}));

foreach ( x in range (1,500) | create (n:Vendor {id:x}));
create index on :Vendor(id);
create index on :TxnDetail(vendorId);

call db.awaitIndexes(1200);
  1. run as a single cypher txn as
    neo4j> cypher runtime=slotted MATCH (v:Vendor), (txn:TxnDetail) WHERE txn.vendorId = v.id MERGE (txn)-[:VENDOR]->(v);
    0 rows available after 259265 ms, consumed after another 0 ms
    Created 1197613 relationships

query.log reports: note waiting=0, for example from the query.log

2020-03-10 17:52:38.816+0000 INFO  259247 ms: (planning: 456, waiting: 0) - 7551086 page hits, 5964 page faults - bolt-session  bolt            neo4j-java/dev          client/127.0.0.1:49466 server/127.0.0.1:7687>   -
cypher runtime=slotted MATCH (v:Vendor), (txn:TxnDetail) WHERE txn.vendorId = v.id MERGE (txn)-[:VENDOR]->(v); - {} - {}
  1. stop Neo4j and clear out graph, restart Neo4j and rerun steps 1-2
  2. rather than run a single txn to create all relationships run
neo4j> CALL apoc.periodic.iterate('MATCH (v:Vendor),(txn:TxnDetail) WHERE txn.vendorId = v.id  RETURN txn, v', 'MERGE (txn)-[:VENDOR]->(v)', {batchSize: 10000, parallel: false, iterateList: true, retries:3});
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| batches | total   | timeTaken | committedOperations | failedOperations | failedBatches | retries | errorMessages | batch                                               | operations                                                  | wasTerminated | failedParams |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 120     | 1197556 | 292       | 1197556             | 0                | 0             | 0       | {}            | {total: 120, committed: 120, failed: 0, errors: {}} | {total: 1197556, committed: 1197556, failed: 0, errors: {}} | FALSE         | {}           |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row available after 292912 ms, consumed after another 8 ms

so 292k ms for APOC vs 259kms for a single txn but also query.log reports

2020-03-10 19:15:55.705+0000 INFO  292911 ms: (planning: 52, waiting: 169405) - 1466 page hits, 5970 page faults - bolt-session bolt            neo4j-java/dev          client/127.0.0.1:49478 server/127.0.0.1:7687>   -
CALL apoc.periodic.iterate('MATCH (v:Vendor),(txn:TxnDetail) WHERE txn.vendorId = v.id  RETURN txn, v', 'MERGE (txn)-[:VENDOR]->(v)', {batchSize: 10000, parallel: false, iterateList: true, retries:3}); - {} - {}

waiting? 169k msec? ??? but this is a local instance and no one else is connected but me? so is waiting reporting a correct number? waiting for what? as I'm the only one connected.

I also tried

CALL apoc.periodic.iterate('cypher runtime=slotted MATCH (v:Vendor) RETURN v', 'MATCH (txn:TxnDetail) WHERE txn.vendorId = v.id MERGE (txn)-[:VENDOR]->(v)', {batchSize: 10000, parallel: false, iterateList: true});
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| batches | total | timeTaken | committedOperations | failedOperations | failedBatches | retries | errorMessages | batch                                           | operations                                          | wasTerminated | failedParams |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1       | 500   | 310       | 500                 | 0                | 0             | 0       | {}            | {total: 1, committed: 1, failed: 0, errors: {}} | {total: 500, committed: 500, failed: 0, errors: {}} | FALSE         | {}           |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row available after 311148 ms, consumed after another 5 ms

which is even longer. and its query.log entry appears as

2020-03-10 20:09:12.907+0000 INFO  311148 ms: (planning: 56, waiting: 123) - 2 page hits, 0 page faults - bolt-session  bolt            neo4j-java/dev          client/127.0.0.1:49486server/127.0.0.1:7687>    -
CALL apoc.periodic.iterate('cypher runtime=slotted MATCH (v:Vendor) RETURN v', 'MATCH (txn:TxnDetail) WHERE txn.vendorId = v.id MERGE (txn)-[:VENDOR]->(v)', {batchSize: 10000, parallel: false, iterateList: true}); - {} - {}

which is really odd for although 'waiting:' is much smaller then the apoc.periodic.iterate first mentioned above, the 2 page hits ???

in the end I'm trying to determine the fastest way to create/merge a relationship from TxnDetail to Vendor where the vendorId is recorded in the TxnDetail

Specifications (Mandatory)

7G total RAM dbms.memory.heap.initial_size=3g dbms.memory.heap.max_size=3g dbms.memory.pagecache.size=2g

Currently used versions

Versions

conker84 commented 3 years ago

fixed by #1794