neo4j-contrib / neo4j-graph-algorithms

Efficient Graph Algorithms for Neo4j
https://github.com/neo4j/graph-data-science/
GNU General Public License v3.0
770 stars 195 forks source link

shortestPath for nodes that have multiple relationships in between #812

Open Joshua-Yu opened 5 years ago

Joshua-Yu commented 5 years ago

I have a transportation graph of city nodes linked by railways and roads as relationships. There are multiple links between 2 city nodes, e.g.: .

(city1) -[:ROAD{distance:100, cost:50}]-> (city2) (city1) -[:ROAD{distance:110, cost:60}]-> (city2) (city1) -[:RAILWAY{distance:95, cost:40}]-> (city2)

What I found with shortestPath in ALGO package was it seems to only look up one relationship type and didn't always pick up the shortest path:

MATCH (start:城市{name:'北京'}),(end:城市{name:'徐州'}) CALL algo.shortestPath.stream(start, end, 'cost' ,{nodeQuery: 'MATCH (n:城市) RETURN id(n) AS id' , relationshipQuery:'MATCH (n:城市) -[r:公路连接]- (m) WHERE id(m) <> id (n) AND r.line IN ["G1","G2"] RETURN id(n) AS source, id(m) AS target, r.cost AS weight' , direction:'BOTH' , graph:'cypher' }) YIELD nodeId, cost RETURN algo.getNodeById(nodeId).name as station,cost

Until I sorted r.cost (interestingly by descendent order):

MATCH (start:城市{name:'北京'}),(end:城市{name:'徐州'}) CALL algo.shortestPath.stream(start, end ,'cost' ,{nodeQuery: 'MATCH (n:城市) RETURN id(n) AS id' , relationshipQuery:'MATCH (n:城市) -[r:公路连接]- (m) WHERE id(m) <> id (n) AND r.line IN ["G1","G2"] RETURN id(n) AS source, id(m) AS target, r.cost AS weight ORDER BY id(n) ASC, id(m) ASC, r.cost DESC' , direction:'BOTH' , graph:'cypher' }) YIELD nodeId, cost RETURN algo.getNodeById(nodeId).name as station,cost

This happens to other shortest path related algorithms e.g. DeltaStepping too, which doesn't seem to be right. Did I miss anything here?

Cheers

Joshua

Smooshpie commented 5 years ago

Hey, I think the problem lies within the definition of those algorithms. Most of them are not designed to work on multi-graphs, for e.g. take Dijkstra's algorithm: If you look at the part were the algorithm iterates through all neighbors of the current node. It calculates the total weight per neighbor only once. Hence the implementation will only consider one relation per node pair, which is (as you already mentioned) not always the shortest path. I'm not quite sure in which order the GraphFactories load the relations between node pairs.

We stumbled upon the same issue using yen's algorithm (which uses dijkstra underneath) on a multi modal multi-graph similar to the one you mentioned. We solved the issue by translating our multi-graph into an equivalent simple graph by splitting a node into multiple ones per relation and reconnect the derived nodes with no costs.

Example: Let's take your example (city1) -[:ROAD{distance:100, cost:50}]-> (city2) (city1) -[:ROAD{distance:110, cost:60}]-> (city2) (city1) -[:RAILWAY{distance:95, cost:40}]-> (city2)

  1. split (city1Road1) -[:ROAD{distance:100, cost:50}]-> (city2Road1) (city1Road2) -[:ROAD{distance:110, cost:60}]-> (city2Road2) (city1Rail) -[:RAILWAY{distance:95, cost:40}]-> (city2Rail)

  2. reconnect with zero costs to allow change between modes of transportation at a certain city (city1Road1)-[{distance:0, cost:0}]-> (city1Road2) (city1Road1)-[{distance:0, cost:0}]-> (city1Rail)

(city1Road2)-[{distance:0, cost:0}]-> (city1Road1) (city1Road2)-[{distance:0, cost:0}]-> (city1Rail)

(city1Rail)-[{distance:0, cost:0}]-> (city1Road1) (city1Rail)-[{distance:0, cost:0}]-> (city1Road2)

Furthermore it is advised to split the nodes per city into arrival and departure nodes which are two disjoint, independent sets of A(rrivals) and D(departures) where only relations from A -> D are allowed. This prevents that algorithms to allow something like ...->(city1Road1)->(city1Road2)->(city1Rail)->... if the costs between changes of transportation are non existent.

Maybe this helped to solve your problem.

neo4j-apac commented 5 years ago

Yes, you are right. It looks your way is the only viable workaround of this so far.

omnishore1 commented 4 years ago

hi thanks for this help, I want to return also the distance ex : RETURN algo.getNodeById(nodeId).name as station,cost , ??distance??