neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/
Other
621 stars 160 forks source link

Shortest paths using multiple relationship types #184

Closed tomasonjo closed 2 years ago

tomasonjo commented 2 years ago

When a user has multiple relationship types in their shortest path, the engine doesn't recognize that all relationship types need to have the particular relationship weight.

Using GDS 2.0.1

Create graph:

CREATE (:Stop)-[:TRAM {duration:5}]->(:Stop)-[:WALK]->(:Stop)

Project a graph

CALL gds.graph.project('test', 'Stop', ['TRAM', 'WALK'], {relationshipProperties:['duration']})

Run Dijkstra shortest path:

MATCH (s:Stop),(t:Stop)
WHERE NOT EXISTS {()-->(s)} AND NOT EXISTS {(t)-->()}
CALL gds.shortestPath.dijkstra.stream('test',{sourceNode:s, targetNode: t, relationshipWeightProperty:'duration'})
YIELD nodeIds, costs
RETURN nodeIds, costs

Results

nodeIds costs
[0, 1, 2] [0.0, 5.0, NaN]

You can observer the NaN values. I think it would be preferred to omit users to be able to return shortest paths with NaN values for cost.

Mats-SX commented 2 years ago

If you add a defaultValue to your relationship projection, does that fix the issue?

CALL gds.graph.project(
  'test', 
  'Stop', 
  { 
    TRAM: { properties: { duration: { defaultValue: 0 }}}, 
    WALK: { properties: { duration: { defaultValue: 0 }}} 
  } 
)

In general, we like to not validate every property value during algorithm execution, since this inhibits performance. Of course, in case GDS is the source of the NaN we would have to check for it.

tomasonjo commented 2 years ago

Ok, I get it. Just a behavior I noticed, how you handle it is up to you :)

Closing for now

Mats-SX commented 2 years ago

@tomasonjo I appreciate very much that you created the issue! Maybe a default default value of 0 would be more useful than default default of NaN.

Mats-SX commented 2 years ago

@tomasonjo It seems as though this may be a bug anyway. We will investigate more thoroughly.

s1ck commented 2 years ago

I investigated it and came to the conclusion that Dijkstra behaves exactly as expected. The problem is - as Mats already pointed out - the default value for the WALK relationship. Using relationshipProperties in the project procedure, is shorthand for saying "all projected relationship types have that property and the default value is NaN`.

If the relationship projection would be written as

{ 
    TRAM: { properties: { duration: { defaultValue: 0 }}}, 
    WALK: { } 
} 

the in-memory graph would have no property assigned to the WALK relationship, which - when executing Dijkstra - would trigger the algorithm specific fallback value of 1.0.

Hope this clarifies things.

tomasonjo commented 2 years ago

the in-memory graph would have no property assigned to the WALK relationship, which - when executing Dijkstra - would trigger the algorithm specific fallback value of 1.0

Perhaps adding this line to the docs would make sense

s1ck commented 2 years ago

I agree. We're discussing to either make it more prominent in the docs or to change the behaviour of relationshipProperties to only make it available for relationship types that actually have that property. Wdyt?