neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

Misleading documentation about the performance of native vs cypher projection #154

Closed ppatino closed 2 years ago

ppatino commented 2 years ago

This is not an actual bug in the code, more of a "documentation bug" :). The documentation on graph creation gives a fairly cut-and-dried description of native vs. Cypher projection:

Native projections provide the best performance by reading from the Neo4j store files. Recommended for both the development, and the production phase

We have been using native projection on a large graph (on the order of 30 million nodes and hundreds of millions of relationships) and traced major performance issues with our Neo4j server to calls to gds.graph.create. For example, the following projection by label and relationship type ends up projecting just 132 nodes and 1441 relationships, but takes 167 seconds to run in isolation, with nothing else running on this particular server. During this time, disk read IO is ~50 to 100Mbps.

CALL gds.graph.create(
      'graph80446',
      'Network80446',
      'Linked_to',
      {relationshipProperties: 'weight'})

On the flip side, Cypher projection is nearly instant (a few hundred milliseconds), which is what my gut would expect for a query this simple:

CALL gds.graph.create.cypher(
  'graph80446',
  'MATCH (n:Network80446) RETURN id(n) AS id',
  'MATCH (n:Network80446)-[r:Linked_to]->(m:Network80446) RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.weight AS weight')

This is a request to update documentation to provide more nuance around when to expect performance to be significantly better with native projection. I would also of course love to hear if it's completely unexpected for native projection to be slower in this case. Thanks!
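For anyone who wants to reproduce the comparison, here is a minimal sketch that lets each procedure report its own projection time through the createMillis result column. It assumes GDS 1.x, where these procedures are named gds.graph.create and gds.graph.create.cypher. The `_native` and `_cypher` graph-name suffixes are hypothetical, added only so the two projections do not collide in the graph catalog.

```cypher
// Native projection by label and relationship type.
// The graph name 'graph80446_native' is a hypothetical name for this sketch.
CALL gds.graph.create(
  'graph80446_native',
  'Network80446',
  'Linked_to',
  {relationshipProperties: 'weight'})
YIELD graphName, nodeCount, relationshipCount, createMillis
RETURN graphName, nodeCount, relationshipCount, createMillis;

// Cypher projection of the same subgraph.
// The graph name 'graph80446_cypher' is likewise hypothetical.
CALL gds.graph.create.cypher(
  'graph80446_cypher',
  'MATCH (n:Network80446) RETURN id(n) AS id',
  'MATCH (n:Network80446)-[r:Linked_to]->(m:Network80446) RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.weight AS weight')
YIELD graphName, nodeCount, relationshipCount, createMillis
RETURN graphName, nodeCount, relationshipCount, createMillis;

// Remove both in-memory graphs once the timings are recorded.
CALL gds.graph.drop('graph80446_native');
CALL gds.graph.drop('graph80446_cypher');
```

Comparing nodeCount and relationshipCount across the two results also confirms that both projections load the same subgraph before their timings are compared.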

s1ck commented 2 years ago

@ppatino Thanks for reporting this. Could you please share the output of gds.debug.sysInfo()? It is unexpected that the native projection takes that long as it's supposed to use the label index to read the nodes from the core db.
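The diagnostic call requested above can be run as a standalone query. The second statement is an optional extra check, not something asked for in this thread: it counts the nodes carrying the label, which should match the 132 nodes reported for the projection.

```cypher
// Print GDS and system diagnostics as key/value rows.
CALL gds.debug.sysInfo();

// Optional sanity check (an addition for illustration): count the nodes
// that carry the projected label.
MATCH (n:Network80446) RETURN count(n) AS labelledNodes;
```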

s1ck commented 2 years ago

@ppatino Closing this for now; please re-open if it's still a problem.