nismod / snail

spatial networks impact assessment library 🐌
https://nismod.github.io/snail/
MIT License
9 stars 1 forks source link

Sources-sinks routing #21

Closed tomalrussell closed 3 years ago

tomalrussell commented 3 years ago

Network definition

Define a spatial network graph G = {V,E} as a set of nodes V and edges E where each edge connects two nodes.

A node can be represented by a Point geometry. It can be identified by index in the list of nodes or a node_id. It may have an associated Polygon that represents a site, building, or service area.

An edge can be represented by a LineString geometry. It can be identified by index in the list of edges or an edge_id. It has source and target attributes referring to its start and end nodes. It may be directed (True/False), if flows on the graph should only be allowed to traverse the edge in one direction from source to target.

Routing problem

Given two sets of points S and T, find the least-cost route from each s to each t.

Reasonable implementation idea: use single-source shortest paths (Dijkstra) for each s in S, possibly bail early once each t has been reached (no need to visit all nodes).

See also multi-level Dijkstra or contraction-hierarchy approaches as in OSRM.

An aside on routing costs

The cost of moving over an edge can be provided, whether from input data or computed from other edge attributes (e.g. for roads: length, condition, speed limit).

The cost of moving over a node from one edge to another may not be necessary for our purposes (e.g. turn restrictions or turn penalties, waiting times at stations). For reference, OSRM uses an edge-expanded representation to handle this.

Another approach would be to introduce a complete graph (from each in-edge to each out-edge) at the location of nodes where waiting times are applicable, e.g. for a 2-degree node b with some waiting time 3:

    10    3    5
a - - - - b - - - - c

turns into

a - - - - b' - b'' - - - - c
     10      3        5

Where the "waiting cost" of 3 at node b gets represented by an edge of weight 3 between b' and b'' (even though b' and b'' are physically represented as in the same location as b)

tomalrussell commented 3 years ago

Algorithm notes

Library notes

tomalrussell commented 3 years ago

Also pandana which uses a contraction hierarchy - can do multiple a-b routes, but not yet single-source shortest path trees, as far as I understand - see notes in https://github.com/UDST/pandana/issues/120

tlestang commented 3 years ago

Script scripts/sources_sinks_routing/benchmark_shortest_path.py benchmarks igraph vs pandana for computing many-to-many shortest paths from ensemble S to T.

Igraph computes vertex paths directly. Pandana computes vertex paths which must be transformed into edge path. This can be done through igraph.Graph.get_eids. This adds significant overhead - but faster than looking up edges in the original dataframe based on start and end nodes. Below are the timings for 3 different ensemble size (S and T having the same size). All average timings are in seconds and statistic taken over 10 repetitions (random S and T ensembles).

python benchmark_shortest_path --nreps=10 --reconstr --sizes 5 10 20
Size Igraph (avg) Igraph (std) pandana (avg) pandana (std) reconstr (avg) reconstr (std)
5 0.74 0.06 0.41 0.03 1.24 0.09
10 0.83 0.01 0.44 0.00 5.28 0.04
20 0.88 0.02 0.46 0.02 21.24 0.50

Transforming vertex paths into edge paths is too slow (column reconstr).

Forgetting about shortest path format, here are some timings for larger ensemble sizes (all average times in seconds). Pandana compute vertex paths, igraph computes edge paths.

python benchmark_shortest_path --nreps=10 --sizes 50 300 600
Size Igraph (avg) Igraph (std) pandana (avg) pandana (std)
50 0.83 0.01 0.44 0.01
300 2.69 0.51 2.11 0.23
600 6.49 0.40 7.46 0.37

igraph seems to scale better than pandana with ensemble size.