Closed YuanchengJiang closed 1 year ago
Thank you for the report! We will investigate and come back to you.
@YuanchengJiang
Thank you for reporting. I've investigated and concluded that while the slowdown is unfortunate, this is not a bug.
What makes the difference in these queries is that, for a regular MATCH, the planner is allowed to pull predicates from later WITH clauses into the MATCH's WHERE part, as that doesn't affect the results. For OPTIONAL MATCH that is not allowed.
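As a minimal illustration of that rewrite (using a hypothetical property `prop`, not anything from the reported dataset), the two forms below return the same rows for a regular MATCH, which is why the planner is free to treat the first as the second:

```cypher
// Form 1: predicate written in a WITH ... WHERE after a regular MATCH.
MATCH (n)
WITH n WHERE n.prop > 123
RETURN n;

// Form 2: the same predicate inside the MATCH's own WHERE.
// For a regular MATCH this yields the same rows as Form 1.
MATCH (n)
WHERE n.prop > 123
RETURN n;
```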
In your example the planner finds the s3.year>1940 predicate and produces a NodeIndexSeekByRange. Unfortunately the estimated rows are very inaccurate and the final plan ends up being suboptimal.
For OPTIONAL MATCH, however, we can only consider toInteger(s2.userId)>54 while matching the pattern; the other predicates are applied later via a Filter. With fewer predicates to solve, the estimated rows are a bit more accurate and the final plan ends up being better.
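One way to see this for yourself is to prefix each query with EXPLAIN (shows the plan without running the query) or PROFILE (runs the query and reports actual rows per operator next to the estimates). The sketch below reuses the OPTIONAL MATCH query from the report; the operators and row counts you see may differ depending on your indexes and statistics:

```cypher
// PROFILE executes the query and shows estimated vs. actual rows per operator.
// Swap PROFILE for EXPLAIN to inspect the plan without executing it.
PROFILE
OPTIONAL MATCH (s2:User)--(s3:Movie)<--(s0:Person)--(s1:Movie)--(s2:User)
WHERE toInteger(s2.userId)>54
WITH * WHERE s3.year>1940
WITH * WHERE s1.year>1906
WITH * WHERE toInteger(s2.userId)>87
RETURN count(s1);
```

Running the same thing without OPTIONAL should let you compare where the s3.year predicate ends up in each plan.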
To understand why it makes a difference whether you put predicates into the WHERE of an OPTIONAL MATCH or in a WITH after the match, consider the following example:
Given an empty graph, this query will produce a single row containing null:
OPTIONAL MATCH (n)
WHERE n.prop > 123
RETURN n
This query however will not produce any rows at all:
OPTIONAL MATCH (n)
WITH n WHERE n.prop > 123
RETURN n
The first query tries to match a node (n) such that n.prop > 123. Not finding any, it outputs a row with n -> null.
The second query tries to match a node (n). Not finding any, it outputs a row with n -> null. Then it applies the n.prop > 123 filter to this row. Since null > 123 evaluates to null rather than true, the row is filtered out and we end up with no rows at all.
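The same difference shows up on a non-empty graph. A small sketch, assuming a scratch database and a hypothetical label `Demo` that exists only for this illustration:

```cypher
// Hypothetical test data: a single node that does NOT satisfy the predicate.
CREATE (:Demo {prop: 1});

// Predicate is part of the optional pattern: nothing satisfies it,
// so OPTIONAL MATCH falls back to one row with n = null.
OPTIONAL MATCH (n:Demo)
WHERE n.prop > 123
RETURN n;

// Predicate is applied after the match: the node is found first,
// then the WITH ... WHERE filter removes its row, leaving no rows.
OPTIONAL MATCH (n:Demo)
WITH n WHERE n.prop > 123
RETURN n;
```

Because the two forms can return different results, the planner cannot move the later predicate into the OPTIONAL MATCH.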
Hope this clears things up.
@inmost-light Really appreciate your detailed explanation.
Neo4j version: 5.4.0
Operating system: Ubuntu 20.04
API/Driver: Cypher
Dataset: Recommendations, https://github.com/neo4j-graph-examples/recommendations
Query 1:
OPTIONAL MATCH (s2:User)--(s3:Movie)<--(s0:Person)--(s1:Movie)--(s2:User) WHERE toInteger(s2.userId)>54 WITH * WHERE s3.year>1940 WITH * WHERE s1.year>1906 WITH * WHERE toInteger(s2.userId)>87 RETURN count(s1);
Response Time: ready to start consuming query after 34 ms, results consumed after another 1368 ms
Query 2:
MATCH (s2:User)--(s3:Movie)<--(s0:Person)--(s1:Movie)--(s2:User) WHERE toInteger(s2.userId)>54 WITH * WHERE s3.year>1940 WITH * WHERE s1.year>1906 WITH * WHERE toInteger(s2.userId)>87 RETURN count(s1);
Response Time: ready to start consuming query after 35 ms, results consumed after another 16550 ms
Profile (with OPTIONAL):
Profile (without OPTIONAL):
It is also a bit confusing that, with some WITH * clauses, Query 1 is much faster than the query below:
OPTIONAL MATCH (s2:User)--(s3:Movie)<--(s0:Person)--(s1:Movie)--(s2:User) WHERE toInteger(s2.userId)>54 AND s3.year>1940 AND s1.year>1906 AND toInteger(s2.userId)>87 RETURN count(s1);
Response Time: ready to start consuming query after 31 ms, results consumed after another 16447 ms