cbizon closed this issue 1 year ago
Furthermore, this seems to interact badly with the predicate handling that is generated in the cypher. The cypher has a big block of types in the edge, and then WHERE clauses to make sure that the directionality is correct. But if you take all of that out, then the query runs quickly again and the plan is good.
And you don't actually have to take all of that out - just the WHERE clause on the n00-n01 edge is enough. Taking out the predicates from the [] and leaving the WHERE doesn't actually help.
I can see some possible simplifications:
This is all implemented in the last release
Sending this TRAPI query to ROBOKOP works very quickly:
But if the n01 category is changed to
"categories": ["biolink:BiologicalEntity"]
then this query takes forever. The difference in the cypher is that when there is a single category, the transpiler writes n01 as (n01:`biolink:BiologicalEntity`), but when there is more than one, it instead makes n01 a NamedThing and puts the labels in a WHERE clause.
For some reason, this makes a big difference in performance. When the label is in a WHERE clause, the query plan is what you would expect: n00-n01 and n02-n01, and then intersect on n01. When the label is on the node, neo4j instead goes n00-n01 and then n01-n02, and then intersects with n02.
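To make the two shapes concrete, here is a rough Python sketch that emits both forms of the n01 node. This is not the transpiler's actual code: `node_pattern`, `quote`, and the `use_where` flag are made-up names for illustration, and `biolink:NamedThing` as the root label is an assumption based on the description above.

```python
# Hypothetical sketch, not the real transpiler: builds the two Cypher shapes
# discussed above for a single query-graph node.

NAMED_THING = "biolink:NamedThing"  # assumed root label for every node


def quote(label):
    """Backtick-quote a label, needed in Cypher because of the colon."""
    return "`" + label + "`"


def node_pattern(var, categories, use_where=None):
    """Return (node_pattern, where_clause) for one node.

    use_where=None mimics the behaviour described in this thread:
    an inline label for one category, a WHERE clause for several.
    use_where=True forces the WHERE form even for a single category.
    """
    if not categories:
        raise ValueError("at least one category is required")
    if use_where is None:
        use_where = len(categories) > 1
    if not use_where:
        if len(categories) != 1:
            raise ValueError("an inline label needs exactly one category")
        return f"({var}:{quote(categories[0])})", ""
    # Label predicates joined with OR; the node itself is only a NamedThing.
    condition = " OR ".join(f"{var}:{quote(c)}" for c in categories)
    return f"({var}:{quote(NAMED_THING)})", f"WHERE {condition}"


if __name__ == "__main__":
    print(node_pattern("n01", ["biolink:BiologicalEntity"]))
    print(node_pattern("n01", ["biolink:BiologicalEntity"], use_where=True))
```

Forcing `use_where=True` for a single category gives the WHERE form that got the better plan in the test above.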
If you change the slow query to use the WHERE version, then neo4j uses the better query plan and performance is fine.
Is this a generally true thing though? Not sure how to evaluate....
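One possible way to evaluate it: run both variants with PROFILE over a set of representative queries and compare the plans and total db hits. A minimal sketch, assuming the official neo4j Python driver and assuming the profile comes back as a nested dict with `operatorType`, `dbHits`, and `children` keys (worth checking against the driver version in use). The helper names are made up for illustration.

```python
# Sketch for comparing two Cypher variants by their PROFILE output.
# Assumption: the plan is a nested dict with "operatorType", "dbHits",
# and "children" keys, as returned in ResultSummary.profile.


def profiled(query):
    """Prefix a Cypher query so that neo4j returns the executed plan."""
    return "PROFILE " + query


def total_db_hits(plan):
    """Sum dbHits over a plan node and all of its descendants."""
    children = plan.get("children", [])
    return plan.get("dbHits", 0) + sum(total_db_hits(c) for c in children)


def operator_order(plan):
    """List operator names depth-first, to eyeball the traversal order."""
    names = [plan.get("operatorType", "?")]
    for child in plan.get("children", []):
        names.extend(operator_order(child))
    return names


def profile_query(driver, query):
    """Run a query under PROFILE and return the plan dict.

    `driver` is expected to be a neo4j.Driver created by the caller.
    """
    with driver.session() as session:
        summary = session.run(profiled(query)).consume()
    return summary.profile
```

Comparing `total_db_hits` for the inline-label and WHERE versions across many queries would show whether the WHERE form is consistently cheaper or only wins for this particular shape.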