neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

GDS on filtered graph data? #283

aiden-huffman closed 10 months ago

aiden-huffman commented 10 months ago

Describe the bug

Filtering before gds.graph.project doesn't work as expected.

To Reproduce

CREATE (a:Location {name: 'A'}),
       (b:Location {name: 'B'}),
       (c:Location {name: 'C'}),
       (d:Location {name: 'D'}),
       (e:Location {name: 'E'}),
       (f:Location {name: 'F'}),
       (a)-[:ROAD {cost: 50, construction: True}]->(b),
       (a)-[:ROAD {cost: 50, construction: False}]->(c),
       (a)-[:ROAD {cost: 100, construction: False}]->(d),
       (b)-[:ROAD {cost: 40, construction: False}]->(d),
       (c)-[:ROAD {cost: 40, construction: False}]->(d),
       (c)-[:ROAD {cost: 80, construction: False}]->(e),
       (d)-[:ROAD {cost: 30, construction: False}]->(e),
       (d)-[:ROAD {cost: 80, construction: False}]->(f),
       (e)-[:ROAD {cost: 40, construction: False}]->(f);

We would like to avoid roads with construction when calculating shortest paths.

MATCH (a)-[e]->(b)
WHERE e.cost IS NOT NULL AND NOT e.construction
RETURN a, e, b

Although the table view does not contain the relationship marked as under construction, the graph visualisation still shows it.

MATCH (a)-[e]->(b)
WHERE e.cost IS NOT NULL AND NOT e.construction
WITH gds.graph.project(
    'filteredGraph',
    a, b,
    {relationshipProperties: {cost: e.cost}}) AS g
RETURN g.graphName

The GDS projection also keeps the edge and will return the same result as the example here
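For context, a shortest-path call against the projection above might look like the following. This is only a sketch, not the reporter's actual query (the issue does not show it): it assumes the graph name 'filteredGraph' from the projection, picks the Location nodes A and F as illustrative endpoints, and uses GDS 2.x procedure names.

```cypher
// Assumed endpoints: A and F are chosen for illustration only.
MATCH (source:Location {name: 'A'}), (target:Location {name: 'F'})
CALL gds.shortestPath.dijkstra.stream('filteredGraph', {
    sourceNode: source,
    targetNode: target,
    relationshipWeightProperty: 'cost'
})
YIELD totalCost, nodeIds
// Map internal node ids back to the Location names for readability.
RETURN totalCost,
       [nodeId IN nodeIds | gds.util.asNode(nodeId).name] AS nodeNames
```

With the construction road from A to B excluded, the reproduction data leaves A, C, D, E, F (50 + 40 + 30 + 40 = 160) as the cheapest route from A to F. The route through B in the unfiltered data has the same total cost, so comparing totalCost alone cannot tell the two projections apart; the node names can.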

GDS version: X.Y.Z
Neo4j version: 5.11.0
Operating system: Ubuntu 22.04

Expected behavior

Expect a projection of the filtered graph data, and the resulting calculation to be performed there.

FlorentinD commented 10 months ago

Hello, thank you for the bug report! However, I could not fully reproduce your problem. The input graph consists of 6 nodes and 9 relationships.

First, when visualizing the filtered query in Neo4j Browser, I could reproduce the issue. This is caused by the Connect result nodes setting, which you will want to uncheck in your case (https://neo4j.com/docs/browser-manual/current/operations/browser-settings/).
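Independently of that Browser setting, one way to see exactly what the filter matches is to return scalar values instead of nodes and relationships, so that there is nothing for the visualisation to connect. The column aliases below are illustrative and not part of the original report.

```cypher
// Same filter as in the report, but returning plain values only.
MATCH (a)-[e]->(b)
WHERE e.cost IS NOT NULL AND NOT e.construction
RETURN a.name AS source, b.name AS target, e.cost AS cost
ORDER BY source, target
```

On the reproduction data this should list 8 rows, with no row for the road from A to B.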

However, with your project query, the projected graph inside GDS correctly filters out the one relationship and results in 6 nodes and 8 relationships. You can also check with gds.graph.degree.stream('filteredGraph'). For the example you linked, could you clarify what you executed to verify your result?
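A possible way to confirm those counts directly is sketched below. It assumes the graph name 'filteredGraph' from the projection query and uses gds.graph.list for the catalog counts and gds.degree.stream for per-node degrees; these may not be the exact calls the commenter had in mind.

```cypher
// Catalog entry for the projection: node and relationship counts.
CALL gds.graph.list('filteredGraph')
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;

// Out-degree per node in the projection (default orientation).
CALL gds.degree.stream('filteredGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY name;
```

If the filter took effect, the first query should report 6 nodes and 8 relationships, and node A should show an out-degree of 2 rather than 3.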

aiden-huffman commented 10 months ago

Seems there was something wrong in the database on my end. We recently reset the database to reinitialize everything and the issue appears to be resolved. Thank you for taking a look.