Improve generated queries for cyclic domains

AlexiosP commented 3 years ago

When calling the findAllById method of a repository, the final Cypher query, which retrieves all the nodes and their related entities, is the following:

MATCH (rootNodeIds) 
WHERE id(rootNodeIds) IN $rootNodeIds 
OPTIONAL MATCH ()-[relationshipIds]-() 
WHERE id(relationshipIds) IN $relationshipIds 
OPTIONAL MATCH (relatedNodeIds) 
WHERE id(relatedNodeIds) IN $relatedNodeIds 
WITH rootNodeIds AS n, collect(DISTINCT relationshipIds) AS __sr__, collect(DISTINCT relatedNodeIds) AS __srn__ 
RETURN n AS __sn__, __sr__, __srn__

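This can use a lot of extra memory and lead to long execution times. The relationship and related-node patterns are not connected to the root nodes, so for every matched root node the query matches all relationships, and for every one of those rows it matches all related nodes. The intermediate rows multiply before collect(DISTINCT ...) reduces them again.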

Compare this with a query like the following, which collects each stage into a single row before moving on to the next:

MATCH (rootNodeIds) 
WHERE id(rootNodeIds) IN $rootNodeIds 
WITH collect(rootNodeIds) AS n
OPTIONAL MATCH ()-[relationshipIds]-() 
WHERE id(relationshipIds) IN $relationshipIds 
WITH n, collect(DISTINCT relationshipIds) AS __sr__
OPTIONAL MATCH (relatedNodeIds) 
WHERE id(relatedNodeIds) IN $relatedNodeIds 
WITH n, __sr__, collect(DISTINCT relatedNodeIds) AS __srn__ 
UNWIND n AS rootNodeIds
WITH rootNodeIds AS n, __sr__, __srn__
RETURN n AS __sn__, __sr__, __srn__
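To make the row multiplication concrete, here is a simplified row-count model in Java. It is a sketch, not how Neo4j's planner actually costs a query, and the class and method names are hypothetical. It assumes each unconnected OPTIONAL MATCH multiplies the current rows by its number of matches (keeping one null row when nothing matches), and that the undirected pattern ()-[relationshipIds]-() returns each relationship once per direction (self-loops ignored).

```java
// Hypothetical, simplified row-count model of the two query shapes.
// Assumptions: an unconnected OPTIONAL MATCH multiplies the current rows by its
// match count (one null row if nothing matches), and ()-[r]-() returns every
// relationship twice, once per direction.
final class RowCountModel {

    // Rows reaching the final aggregation of the generated query:
    // roots x relationship matches x related-node matches.
    static long naiveRows(long roots, long relationships, long relatedNodes) {
        long relMatches = Math.max(1, 2 * relationships);
        long nodeMatches = Math.max(1, relatedNodes);
        return roots * relMatches * nodeMatches;
    }

    // Largest intermediate row count of the collect-first variant: each stage
    // is collected back into a single row before the next MATCH runs.
    static long stagedPeakRows(long roots, long relationships, long relatedNodes) {
        long relMatches = Math.max(1, 2 * relationships);
        long nodeMatches = Math.max(1, relatedNodes);
        return Math.max(roots, Math.max(relMatches, nodeMatches));
    }

    public static void main(String[] args) {
        System.out.println(naiveRows(100, 500, 400));      // 40000000
        System.out.println(stagedPeakRows(100, 500, 400)); // 1000
    }
}
```

Under these assumptions, 100 root nodes, 500 relationships and 400 related nodes mean the generated query aggregates 40,000,000 rows, while the collect-first variant never holds more than 1,000 rows at once.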

Alternatively, because of how DISTINCT behaves in Neo4j, the deduplication can be moved into a separate UNWIND and collect(DISTINCT ...) step:

MATCH (rootNodeIds) 
WHERE id(rootNodeIds) IN $rootNodeIds 
WITH collect(rootNodeIds) AS n
OPTIONAL MATCH ()-[relationshipIds]-() 
WHERE id(relationshipIds) IN $relationshipIds 
WITH n, collect(relationshipIds) AS __sr__
UNWIND __sr__ AS rel
WITH n, collect(DISTINCT rel) AS __sr__
OPTIONAL MATCH (relatedNodeIds) 
WHERE id(relatedNodeIds) IN $relatedNodeIds 
WITH n, __sr__, collect(relatedNodeIds) AS __srn__ 
UNWIND __srn__ AS relatedNodeIds
WITH n,  __sr__, collect(DISTINCT relatedNodeIds) AS __srn__ 
UNWIND n AS rootNodeIds
WITH rootNodeIds AS n, __sr__, __srn__
RETURN n AS __sn__, __sr__, __srn__
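The rewrite only helps if it returns the same data. The Java sketch below simulates both shapes in memory; it is an illustration under stated assumptions, not SDN code. Long ids stand in for nodes and relationships, each list is assumed to be what the corresponding OPTIONAL MATCH returns, collect() is modelled as skipping nulls, and the class, record and method names are hypothetical. Results are compared as sets, since the order of collected elements is not guaranteed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory model: Long ids stand in for nodes and relationships.
final class QueryShapeSimulation {

    // One result row: root id plus its deduplicated relationships and related nodes.
    record Row(long root, Set<Long> rels, Set<Long> related) {}

    // OPTIONAL MATCH with no match still produces one row, bound to null.
    static List<Long> optional(List<Long> matches) {
        return matches.isEmpty() ? Collections.<Long>singletonList(null) : matches;
    }

    // Generated-query shape: every root is combined with every relationship
    // match and every related-node match, then aggregated per root.
    static List<Row> perRootAggregation(List<Long> roots, List<Long> relMatches, List<Long> related) {
        List<Row> out = new ArrayList<>();
        for (Long root : roots) {
            Set<Long> rs = new LinkedHashSet<>();
            Set<Long> ns = new LinkedHashSet<>();
            for (Long r : optional(relMatches)) {
                for (Long n : optional(related)) {
                    if (r != null) rs.add(r); // collect(DISTINCT ...) ignores nulls
                    if (n != null) ns.add(n);
                }
            }
            out.add(new Row(root, rs, ns));
        }
        return out;
    }

    // Collect-first shape: deduplicate each stage once, then unwind the roots.
    static List<Row> collectFirst(List<Long> roots, List<Long> relMatches, List<Long> related) {
        Set<Long> rs = distinctNonNull(relMatches);
        Set<Long> ns = distinctNonNull(related);
        return roots.stream().map(root -> new Row(root, rs, ns)).toList();
    }

    static Set<Long> distinctNonNull(List<Long> values) {
        return values.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<Long> roots = List.of(1L, 2L);
        List<Long> relMatches = List.of(10L, 10L, 11L, 11L); // each relationship matched once per direction
        List<Long> related = List.of(3L, 4L);
        System.out.println(perRootAggregation(roots, relMatches, related)
                .equals(collectFirst(roots, relMatches, related))); // true
    }
}
```

The two shapes agree because the relationship and related-node patterns do not reference the root node: in the generated query every root row ends up with the same __sr__ and __srn__ lists, so computing them once and then unwinding the roots is safe.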

There is a significant improvement, as the query plans below show:

[Image: plan (1)]

SDN versions tested: 6.1.4, 6.1.5, 6.2.0-M3. Neo4j version: 4.2.3 Enterprise.

michael-simons commented 3 years ago

Thanks a ton for the input, we are gonna discuss this in the upcoming week. 👍

asevich commented 3 years ago

@michael-simons is there a quick and dirty solution we can apply now? We invested a lot of time in updating to SDN 6.x, and this problem really degrades our production performance. We could of course try to revert the update, but we would like to avoid that. I would appreciate any hints.

michael-simons commented 3 years ago

You can always run it as a custom query via @Query.

meistermeier commented 3 years ago

Thanks for taking the time to report the comparison. I adapted your suggestion, with a few modifications, into our query-generation mechanism and merged it into the 6.1.x and main branches.

spring-projects / spring-data-neo4j

Improve generated queries for cyclic domains #2387