Closed ppatino closed 2 years ago
@ppatino Thanks for reporting this. Could you please share the output of gds.debug.sysInfo()? It is unexpected that the native projection takes that long, as it is supposed to use the label index to read the nodes from the core db.
@ppatino Closing this for now; please re-open if it's still a problem.
This is not an actual bug in the code, more of a "documentation bug" :). The documentation about graph creation gives a fairly cut-and-dried description of native vs. Cypher projection:
We have been using native projection on a large graph (on the order of 30 million nodes and hundreds of millions of relationships) and traced major performance issues with our Neo4j server to calls to gds.graph.create. For example, the following projection by label and relationship type ends up projecting just 132 nodes and 1441 relationships, but takes 167 seconds to run in isolation, with nothing else running on this particular server. During this time, disk read IO is ~50 to 100Mbps.
On the flip side, Cypher projection is nearly instant (a few hundred milliseconds), as my gut would expect for a query this simple:
This is a request to update the documentation to provide more nuance about when native projection can be expected to perform significantly better. I would also, of course, love to hear whether it is completely unexpected for native projection to be slower in this case. Thanks!
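For anyone wanting to reproduce the comparison described above, here is a minimal Python sketch that builds both projection calls and times them. It is not the code from this report: the label `Person`, the relationship type `KNOWS`, and the helper names are hypothetical placeholders, and the `run` callable is assumed to be whatever executes Cypher against your server. The calls use the GDS 1.x procedures `gds.graph.create` and `gds.graph.create.cypher`.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Hypothetical label and relationship type; substitute your own.
LABEL = "Person"
REL_TYPE = "KNOWS"

YIELD_CLAUSE = "YIELD nodeCount, relationshipCount, createMillis"


def native_projection(graph_name: str) -> Tuple[str, Dict[str, Any]]:
    """Build a native projection call by label and relationship type."""
    query = f"CALL gds.graph.create($name, $label, $relType) {YIELD_CLAUSE}"
    return query, {"name": graph_name, "label": LABEL, "relType": REL_TYPE}


def cypher_projection(graph_name: str) -> Tuple[str, Dict[str, Any]]:
    """Build the equivalent Cypher projection call."""
    # The node query must return `id`; the relationship query must
    # return `source` and `target`.
    node_query = f"MATCH (n:{LABEL}) RETURN id(n) AS id"
    rel_query = (
        f"MATCH (a:{LABEL})-[:{REL_TYPE}]->(b:{LABEL}) "
        "RETURN id(a) AS source, id(b) AS target"
    )
    query = f"CALL gds.graph.create.cypher($name, $nodeQuery, $relQuery) {YIELD_CLAUSE}"
    return query, {"name": graph_name, "nodeQuery": node_query, "relQuery": rel_query}


def timed(
    run: Callable[[str, Dict[str, Any]], Any],
    query: str,
    params: Dict[str, Any],
) -> Tuple[Any, float]:
    """Execute `query` through `run` and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = run(query, params)
    return result, time.perf_counter() - start
```

With the official Neo4j Python driver, `run` could be something like `lambda q, p: session.run(q, p).single()`, assuming an open `session`. Comparing the wall-clock time from `timed` with the `createMillis` value each procedure yields may help show whether the time is spent inside the projection itself or elsewhere.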