`OPTIONAL MATCH` is much faster than `MATCH` in some cases

YuanchengJiang commented 1 year ago

Neo4j version: 5.4.0 Operating system: Ubuntu 20.04 API/Driver: Cypher Dataset: Recommendations, https://github.com/neo4j-graph-examples/recommendations

Query 1: OPTIONAL MATCH (s2:User)--(s3:Movie)<--(s0:Person)--(s1:Movie)--(s2:User) WHERE toInteger(s2.userId)>54 WITH * WHERE s3.year>1940 WITH * WHERE s1.year>1906 WITH * WHERE toInteger(s2.userId)>87 RETURN count(s1); Response Time: ready to start consuming query after 34 ms, results consumed after another 1368 ms

Query 2: MATCH (s2:User)--(s3:Movie)<--(s0:Person)--(s1:Movie)--(s2:User) WHERE toInteger(s2.userId)>54 WITH * WHERE s3.year>1940 WITH * WHERE s1.year>1906 WITH * WHERE toInteger(s2.userId)>87 RETURN count(s1); Response Time: ready to start consuming query after 35 ms, results consumed after another 16550 ms

Profile (with OPTIONAL):

neo4j@neo4j> PROFILE OPTIONAL MATCH (s2:User)--(s3:Movie)<--(s0:Person)--(s1:Movie)--(s2:User) WHERE toInteger(s2.userId)>54 WITH * WHERE s3.year>1940 WITH * WHERE s1.year>1906 WITH * WHERE toInteger(s2.userId)>87 RETURN count(s1);
+-----------+
| count(s1) |
+-----------+
| 560429    |
+-----------+

+--------------------------------------------------------------------------------------------------+
| Plan      | Statement   | Version | Planner | Runtime   | Time | DbHits  | Rows | Memory (Bytes) |
+--------------------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | ""      | "COST"  | "SLOTTED" | 1402 | 1677906 | 1    | 150239424      |
+--------------------------------------------------------------------------------------------------+

Planner COST

Runtime SLOTTED

Runtime version 5.4

+----------------------+----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| Operator             | Id | Details                                                                                     | Estimated Rows | Rows    | DB Hits | Memory (Bytes) | Page Cache Hits/Misses |
+----------------------+----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| +ProduceResults      |  0 | `count(s1)`                                                                                 |              1 |       1 |       0 |                |                    0/0 |
| |                    +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| +EagerAggregation    |  1 | count(s1) AS `count(s1)`                                                                    |              1 |       1 |       0 |             24 |                    0/0 |
| |                    +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| +Filter              |  2 | toInteger(cache[s2.userId]) > $autoint_3 AND s1.year > $autoint_2 AND s3.year > $autoint_1  |            151 |  560429 | 1138040 |                |                    0/0 |
| |                    +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| +Optional            |  3 |                                                                                             |           5597 |  606198 |       0 |                |                    0/0 |
| |                    +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| +Filter              |  4 | not anon_3 = anon_1 AND not anon_3 = anon_0 AND not anon_2 = anon_1 AND not anon_2 = anon_0 |           5597 |  606198 |       0 |                |                    0/0 |
| |                    +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| +NodeHashJoin        |  5 | s2, s0                                                                                      |           5827 | 1080687 |       0 |      121121736 |                    0/0 |
| |\                   +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| | +Filter            |  6 | not anon_3 = anon_2                                                                         |         150980 |  466507 |       0 |                |                    0/0 |
| | |                  +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| | +NodeHashJoin      |  7 | s1                                                                                          |         150980 |  466507 |       0 |       14558928 |                    0/0 |
| | |\                 +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| | | +Expand(All)     |  8 | (s0)-[anon_2]-(s1)                                                                          |          45920 |   45917 |   64970 |                |                    0/0 |
| | | |                +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| | | +NodeByLabelScan |  9 | s0:Person                                                                                   |          19047 |   19047 |   19048 |                |                    0/0 |
| | |                  +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| | +Filter            | 10 | s1:Movie                                                                                    |          30002 |   91782 |   91782 |                |                    0/0 |
| | |                  +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| | +Expand(All)       | 11 | (s2)-[anon_3]-(s1)                                                                          |          30002 |   91782 |   92790 |                |                    0/0 |
| | |                  +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| | +Filter            | 12 | toInteger(cache[s2.userId]) > $autoint_0                                                    |            201 |     617 |     671 |                |                    0/0 |
| | |                  +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| | +NodeByLabelScan   | 13 | s2:User                                                                                     |            671 |     671 |     672 |                |                    0/0 |
| |                    +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| +Filter              | 14 | not anon_1 = anon_0                                                                         |         149461 |  466507 |       0 |                |                    0/0 |
| |                    +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| +NodeHashJoin        | 15 | s3                                                                                          |         150971 |  466507 |       0 |       14558928 |                    0/0 |
| |\                   +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| | +Expand(All)       | 16 | (s0)-[anon_1]->(s3)                                                                         |          45917 |   45917 |   64970 |                |                    0/0 |
| | |                  +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| | +NodeByLabelScan   | 17 | s0:Person                                                                                   |          19047 |   19047 |   19048 |                |                    0/0 |
| |                    +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| +Filter              | 18 | s3:Movie                                                                                    |          30002 |   91782 |   91782 |                |                    0/0 |
| |                    +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| +Expand(All)         | 19 | (s2)-[anon_0]-(s3)                                                                          |          30002 |   91782 |   92790 |                |                    0/0 |
| |                    +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| +Filter              | 20 | toInteger(cache[s2.userId]) > $autoint_0                                                    |            201 |     617 |     671 |                |                    0/0 |
| |                    +----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+
| +NodeByLabelScan     | 21 | s2:User                                                                                     |            671 |     671 |     672 |                |                    0/0 |
+----------------------+----+---------------------------------------------------------------------------------------------+----------------+---------+---------+----------------+------------------------+

Total database accesses: 1677906, total allocated memory: 150239424

1 row
ready to start consuming query after 34 ms, results consumed after another 1368 ms

Profile (without OPTIONAL):

neo4j@neo4j> PROFILE MATCH (s2:User)--(s3:Movie)<--(s0:Person)--(s1:Movie)--(s2:User) WHERE toInteger(s2.userId)>54 WITH * WHERE s3.year>1940 WITH * WHERE s1.year>1906 WITH * WHERE toInteger(s2.userId)>87 RETURN count(s1);
+-----------+
| count(s1) |
+-----------+
| 560429    |
+-----------+

+-----------------------------------------------------------------------------------------------------+
| Plan      | Statement   | Version | Planner | Runtime   | Time  | DbHits    | Rows | Memory (Bytes) |
+-----------------------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | ""      | "COST"  | "SLOTTED" | 16585 | 166870401 | 1    | 18291544       |
+-----------------------------------------------------------------------------------------------------+

Planner COST

Runtime SLOTTED

Runtime version 5.4

+-----------------------+----+------------------------------------------------------------------------------------------------------+----------------+---------+-----------+----------------+------------------------+
| Operator              | Id | Details                                                                                              | Estimated Rows | Rows    | DB Hits   | Memory (Bytes) | Page Cache Hits/Misses |
+-----------------------+----+------------------------------------------------------------------------------------------------------+----------------+---------+-----------+----------------+------------------------+
| +ProduceResults       |  0 | `count(s1)`                                                                                          |              1 |       1 |         0 |                |                    0/0 |
| |                     +----+------------------------------------------------------------------------------------------------------+----------------+---------+-----------+----------------+------------------------+
| +EagerAggregation     |  1 | count(s1) AS `count(s1)`                                                                             |              1 |       1 |         0 |             24 |                    0/0 |
| |                     +----+------------------------------------------------------------------------------------------------------+----------------+---------+-----------+----------------+------------------------+
| +Filter               |  2 | not anon_3 = anon_0 AND not anon_2 = anon_0 AND not anon_1 = anon_0                                  |              1 |  560429 |         0 |                |                    0/0 |
| |                     +----+------------------------------------------------------------------------------------------------------+----------------+---------+-----------+----------------+------------------------+
| +Expand(Into)         |  3 | (s2)-[anon_0]-(s3)                                                                                   |              2 |  567703 | 143889446 |       18291456 |                    0/0 |
| |                     +----+------------------------------------------------------------------------------------------------------+----------------+---------+-----------+----------------+------------------------+
| +Filter               |  4 | s2:User AND toInteger(cache[s2.userId]) > $autoint_0 AND toInteger(cache[s2.userId]) > $autoint_3 AN |             93 | 4739583 |  13283762 |                |                    0/0 |
| |                     |    | D not anon_3 = anon_1 AND not anon_3 = anon_2                                                        |                |         |           |                |                        |
| |                     +----+------------------------------------------------------------------------------------------------------+----------------+---------+-----------+----------------+------------------------+
| +Expand(All)          |  5 | (s1)-[anon_3]-(s2)                                                                                   |           1751 | 7822540 |   8286288 |                |                    0/0 |
| |                     +----+------------------------------------------------------------------------------------------------------+----------------+---------+-----------+----------------+------------------------+
| +Filter               |  6 | s1.year > $autoint_2 AND not anon_2 = anon_1 AND s1:Movie                                            |             96 |  317249 |    679620 |                |                    0/0 |
| |                     +----+------------------------------------------------------------------------------------------------------+----------------+---------+-----------+----------------+------------------------+
| +Expand(All)          |  7 | (s0)-[anon_2]-(s1)                                                                                   |           3278 |  362109 |    407093 |                |                    0/0 |
| |                     +----+------------------------------------------------------------------------------------------------------+----------------+---------+-----------+----------------+------------------------+
| +Filter               |  8 | s0:Person                                                                                            |           1360 |   44598 |    143028 |                |                    0/0 |
| |                     +----+------------------------------------------------------------------------------------------------------+----------------+---------+-----------+----------------+------------------------+
| +Expand(All)          |  9 | (s3)<-[anon_1]-(s0)                                                                                  |           4321 |  143028 |    172302 |                |                    0/0 |
| |                     +----+------------------------------------------------------------------------------------------------------+----------------+---------+-----------+----------------+------------------------+
| +NodeIndexSeekByRange | 10 | RANGE INDEX s3:Movie(year) WHERE year > $autoint_1                                                   |            270 |    8861 |      8862 |                |                    0/0 |
+-----------------------+----+------------------------------------------------------------------------------------------------------+----------------+---------+-----------+----------------+------------------------+

Total database accesses: 166870401, total allocated memory: 18291544

1 row
ready to start consuming query after 35 ms, results consumed after another 16550 ms

It is also a bit confusing that with some WITH * Query 1 is much faster than the query below:

OPTIONAL MATCH (s2:User)--(s3:Movie)<--(s0:Person)--(s1:Movie)--(s2:User) WHERE toInteger(s2.userId)>54 AND s3.year>1940 AND s1.year>1906 AND toInteger(s2.userId)>87 RETURN count(s1);. Response Time: ready to start consuming query after 31 ms, results consumed after another 16447 ms

HannesSandberg commented 1 year ago

Thank you for the report! We will investigate and come back to you.

inmost-light commented 1 year ago

@YuanchengJiang

Thank you for reporting. I've investigated and concluded that while the slowdown is unfortunate, this is not a bug.

What makes the difference in these queries is that for the regular MATCH the planner is allowed to pull predicates from later WITH clauses into its WHERE part, as that doesn't affect the results. For OPTIONAL MATCH that is not allowed.

In your example the planner finds s3.year>1940 predicate and produces NodeIndexSeekByRange. Unfortunately estimated rows are very inaccurate and the final plan ends up being suboptimal.

For OPTIONAL MATCH however we can only consider toInteger(s2.userId)>54 while matching the pattern, other predicates will be applied later via a Filter. With fewer predicates to solve, estimated rows are a bit more accurate and the final plan ends up being better.

To understand why it makes a difference whether you put predicates into WHERE of an OPTIONAL MATCH or in a WITH after the match, consider the following example:

Given an empty graph, this query will produce a single row containing null:

OPTIONAL MATCH (n) 
WHERE n.prop > 123
RETURN n

This query however will not produce any rows at all:

OPTIONAL MATCH (n)
WITH n WHERE n.prop > 123
RETURN n

The first query tries to match a node (n) such that n.prop > 123. Not finding any, it outputs a row with n -> null. The second query tries to match a node (n). Not finding any, it outputs a row with n -> null. Then it applies n.prop > 123 filter to this row, which filters it out and we end up with no rows produced in the end.

Hope this clears things up.

YuanchengJiang commented 1 year ago

@inmost-light Really appreciate your detailed explanation.

neo4j / neo4j

`OPTIONAL MATCH` is much faster than `MATCH` in some cases #13033