Open gordom6 opened 4 years ago
I noticed this is slow when the full CSKG is loaded. It takes on the order of seconds to get "all nodes in ConceptNet" (one of the example queries), which suggests it's doing a full scan of the database.
@gordom6 Did some digging and I think the culprit is the getMatchingNodesCount query.
For the "all nodes in ConceptNet" query, I did a quick measurement of elapsed execution time:
getMatchingNodes elapsed <1s
getMatchingNodesFacets elapsed <1s
getMatchingNodesCount elapsed 5.8s
The Cypher query run by getMatchingNodesCount is:
MATCH (node: Node), (source0:Source { id: "CN" })
WHERE (node)-[:SOURCE]-(source0)
RETURN COUNT(node)
I also profiled the query, which confirms your suspicion that "it's doing a full scan of the database".
I think the slowdown comes from using MATCH followed by WHERE, as opposed to putting the relationship pattern in the MATCH itself (as shown below):
MATCH (node: Node), (source0:Source { id: "CN"}), (source0)-[:SOURCE]-(node)
RETURN COUNT(node)
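For reference, here is a sketch of how the two forms could be compared with PROFILE, plus a variant that starts the pattern at the single Source node. It assumes the same labels and relationship type as above. I haven't confirmed that the planner picks a different plan for the variant on the full CSKG.

```cypher
// PROFILE runs the query and reports the executed plan with db hits per operator.
PROFILE
MATCH (node:Node), (source0:Source { id: "CN" })
WHERE (node)-[:SOURCE]-(source0)
RETURN COUNT(node);

// Variant: one path pattern anchored on the Source node, expanding over :SOURCE.
PROFILE
MATCH (source0:Source { id: "CN" })-[:SOURCE]-(node:Node)
RETURN COUNT(node);
```

One caveat: if a node can have more than one :SOURCE relationship to the same source, the pattern form returns one row per relationship, so COUNT(DISTINCT node) would be needed to match the result of the WHERE form.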
Sounds reasonable. Can you fix it or would you like me to?
I will give it a shot
This is still taking ~15 seconds to load the example "nodes from WordNet" query.
Sources are currently (20200813) modeled as separate nodes in Neo4j, with KgNodes connected to them via :SOURCE. Neo4j doesn't allow you to index relationships like that: https://community.neo4j.com/t/how-can-i-use-index-in-relationship/1627
Apparently Lucene (full-text) indexes are the way to go: https://neo4j.com/docs/cypher-manual/current/administration/indexes-for-full-text-search/ Maybe relationship indexes?
I'm also open to remodeling the way we handle sources. We moved to the current model because nodes can't have multi-valued properties.
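A rough sketch of what the full-text route might look like, assuming (hypothetically) that each node carried a string property `sources` holding its source ids separated by spaces, e.g. "CN WN". The index name `nodeSources` and the `sources` property are made up for illustration and are not part of the current model. The procedures are the full-text index procedures available in Neo4j 3.5 and 4.x. Whether this is actually faster for counting is untested.

```cypher
// Hypothetical: build a full-text index over a space-separated `sources` property on :Node.
CALL db.index.fulltext.createNodeIndex("nodeSources", ["Node"], ["sources"]);

// Look up nodes whose `sources` text contains the token CN, then count them.
CALL db.index.fulltext.queryNodes("nodeSources", "CN") YIELD node
RETURN COUNT(node);
```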