neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

Native projections and its adjacency matrix #282

Closed gb09cl closed 8 months ago

gb09cl commented 10 months ago

Hello everybody,

We would like to find out more about native projections and their application, especially how they can be applied to problems with the execution times of recurring queries.

We asked ourselves how native projections work in general and how well they perform compared with a smarter data modeling approach. Neo4j support told us that native projections use a highly optimised adjacency matrix. Unfortunately, there was no further information on that topic, but the employee gave us three links to follow up on.

The first was this tutorial on GDS (https://www.youtube.com/watch?v=VJRasjO17B4&list=PL9Hl4pk2FsvVmKhfc1Lqo2n2qsX_Si4WY&index=56). We looked through the slides but could not find the details we were looking for. The slides describe the data structure as arrays rather than as an adjacency matrix. (Slides: https://www.slideshare.net/neo4j/scaling-into-billions-of-nodes-and-relationships-with-neo4j-graph-data-science)
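To make the difference between "arrays" and an "adjacency matrix" concrete, here is a minimal Python sketch of the two generic textbook representations. It is not taken from the GDS source code and does not claim to match its actual memory layout; the function names and the tiny example graph are assumptions made up for illustration.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def to_adjacency_matrix(node_count: int, edges: List[Edge]) -> List[List[int]]:
    """Dense n x n matrix: cell [s][t] is 1 if there is an edge s -> t."""
    matrix = [[0] * node_count for _ in range(node_count)]
    for source, target in edges:
        matrix[source][target] = 1
    return matrix


def to_compressed_arrays(node_count: int, edges: List[Edge]) -> Tuple[List[int], List[int]]:
    """Array-based adjacency list (CSR style), an illustrative assumption.

    targets[offsets[i]:offsets[i + 1]] holds the neighbors of node i, so
    memory grows with the number of edges rather than with n * n.
    """
    degrees = [0] * node_count
    for source, _ in edges:
        degrees[source] += 1

    # Prefix sum of the degrees gives the start of each node's slice.
    offsets = [0] * (node_count + 1)
    for node in range(node_count):
        offsets[node + 1] = offsets[node] + degrees[node]

    targets = [0] * len(edges)
    cursor = offsets[:-1]  # next free slot per node (the slice is a copy)
    for source, target in sorted(edges):
        targets[cursor[source]] = target
        cursor[source] += 1
    return offsets, targets


def neighbors(offsets: List[int], targets: List[int], node: int) -> List[int]:
    """Return the out-neighbors of `node` from the compressed arrays."""
    return targets[offsets[node]:offsets[node + 1]]


edges = [(0, 1), (0, 2), (1, 2), (3, 0)]
matrix = to_adjacency_matrix(4, edges)
offsets, targets = to_compressed_arrays(4, edges)
```

For this four-node example the matrix stores 16 cells, while the two arrays store 5 offsets and 4 targets. The gap widens quickly for large, sparse graphs, which is why array-based layouts are a common choice for them.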

The GDS documentation (second link: https://neo4j.com/docs/graph-data-science/current/) did not help either, but maybe the third link will help, which was to post our inquiry here =)

We are looking forward to hearing from you. Best regards

DarthMax commented 10 months ago

Hej there! Thank you for your question.

In GDS we distinguish between the method of projection and the in-memory graph projection. The projection method defines how the graph is transformed from the source data into the in-memory projection. Currently this is mostly coupled with loading a graph from a Neo4j database, where we support native projections and Cypher projections.

The native projection is the fastest method to project a (sub)graph of the Neo4j database into a GDS in-memory graph. At the same time, it is also the one that is the most restricted in its capabilities. It mainly relies on scanning the Neo4j stores (using low-level APIs) to quickly retrieve most of the data stored in the database.

Cypher projections use data provided by the Cypher query engine to construct the in-memory graph. Cypher projections can be less performant than native projections, but they can make use of many of Cypher's data querying and data manipulation capabilities.
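As a rough illustration of that trade-off, here is a toy Python model. It is only a sketch: `ToyStore`, `InMemoryGraph`, `native_style_projection` and `query_style_projection` are hypothetical names invented for this example, not GDS or Neo4j APIs. The first function can only filter by label and relationship type during a scan; the second builds the graph from arbitrary rows, so it can also load relationships that are computed rather than stored.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class ToyStore:
    """Hypothetical stand-in for a database with labels and relationship types."""
    nodes: Dict[int, str]                      # node id -> label
    relationships: List[Tuple[int, str, int]]  # (source, type, target)


@dataclass
class InMemoryGraph:
    """Hypothetical in-memory graph: node id -> list of target node ids."""
    adjacency: Dict[int, List[int]] = field(default_factory=dict)

    def add_node(self, node_id: int) -> None:
        self.adjacency.setdefault(node_id, [])

    def add_relationship(self, source: int, target: int) -> None:
        # Keep only relationships whose endpoints were both projected.
        if source in self.adjacency and target in self.adjacency:
            self.adjacency[source].append(target)


def native_style_projection(store: ToyStore, label: str, rel_type: str) -> InMemoryGraph:
    """Scan the store once and filter by label and type: fast but restricted."""
    graph = InMemoryGraph()
    for node_id, node_label in store.nodes.items():
        if node_label == label:
            graph.add_node(node_id)
    for source, current_type, target in store.relationships:
        if current_type == rel_type:
            graph.add_relationship(source, target)
    return graph


def query_style_projection(node_rows: Iterable[int],
                           relationship_rows: Iterable[Tuple[int, int]]) -> InMemoryGraph:
    """Build the graph from arbitrary result rows: flexible, but the rows
    must first be produced by some query logic."""
    graph = InMemoryGraph()
    for node_id in node_rows:
        graph.add_node(node_id)
    for source, target in relationship_rows:
        graph.add_relationship(source, target)
    return graph


store = ToyStore(
    nodes={0: "Person", 1: "Person", 2: "Person", 3: "City"},
    relationships=[(0, "KNOWS", 1), (0, "LIVES_IN", 3),
                   (1, "LIVES_IN", 3), (2, "LIVES_IN", 3)],
)

# Restricted: only what is stored under one label and one type.
native_graph = native_style_projection(store, "Person", "KNOWS")

# Flexible: derive "lives in the same city" pairs that are not stored anywhere.
persons = [n for n, label in store.nodes.items() if label == "Person"]
lives_in = {s: t for s, rel_type, t in store.relationships if rel_type == "LIVES_IN"}
same_city = [(a, b) for a in persons for b in persons
             if a < b and a in lives_in and lives_in[a] == lives_in.get(b)]
derived_graph = query_style_projection(persons, same_city)
```

In this toy model the query-style path does extra work before the graph is built (computing `same_city`), which mirrors why a query-based projection can be slower while offering more expressive power.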

From your question I assume that you are more interested in the details of the in-memory graph representation. Is that the case?

Best, Max