Sources-sinks routing - Githubissues

tomalrussell commented 3 years ago

Network definition

Define a spatial network graph G = {V,E} as a set of nodes V and edges E where each edge connects two nodes.

A node can be represented by a Point geometry. It can be identified by index in the list of nodes or a node_id. It may have an associated Polygon that represents a site, building, or service area.

An edge can be represented by a LineString geometry. It can be identified by index in the list of edges or an edge_id. It has source and target attributes referring to its start and end nodes. It may be directed (True/False), if flows on the graph should only be allowed to traverse the edge in one direction from source to target.

Routing problem

Given two sets of points S and T, find the least-cost route from each s to each t.

Reasonable implementation idea: use single-source shortest paths (Dijkstra) for each s in S, possibly bail early once each t has been reached (no need to visit all nodes).

See also multi-level Dijkstra or contraction-hierarchy approaches as in OSRM.

An aside on routing costs

The cost of moving over an edge can be provided, whether from input data or computed from other edge attributes (e.g. for roads: length, condition, speed limit).

The cost of moving over a node from one edge to another may not be necessary for our purposes (e.g. turn restrictions or turn penalties, waiting times at stations). For reference, OSRM uses an edge-expanded representation to handle this.

Another approach would be to introduce a complete graph (from each in-edge to each out-edge) at the location of nodes where waiting times are applicable, e.g. for a 2-degree node b with some waiting time 3:

    10    3    5
a - - - - b - - - - c

turns into

a - - - - b' - b'' - - - - c
     10      3        5

Where the "waiting cost" of 3 at node b gets represented by an edge of weight 3 between b' and b'' (even though b' and b'' are physically represented as in the same location as b)

tomalrussell commented 3 years ago

Algorithm notes

Contraction hierarchy is explained in H Bast (2012) lectures 6 and 7, key reference is Geisberger et al (2008). Uses a bidirectional dijkstra on a prepared graph to find s->t with as small search radius as possible.
- not clear that graph preprocessing is necessarily useful for bulk routing/simulation-driven analysis. Main motivation is fast ad-hoc (user-generated) queries.
- could do n-directional dijkstra for a set of sources/targets? Any point in this?
Variable research notes, with some useful detail at Telenav/open-source-spec/osrm, brief notes and pointers on graph partitioning
TODO digest

Library notes

igraph has a helper for creating from dataframe and a method for single-source-to-targets shortest paths
graph-tool has a dijkstra_iterator which could be used to build up source-to-targets shortest paths
graph-tool wraps the Boost Graph Library - see wrapper source for possible approach to adapt if we want to push some parts down to C++
OSRM impressions: a lot of thought and time in it; maybe hard to extract or generalize (simplify?) away from OSM.
TODO test some implementations

tomalrussell commented 3 years ago

Also pandana which uses a contraction hierarchy - can do multiple a-b routes, but not yet single-source shortest path trees, as far as I understand - see notes in https://github.com/UDST/pandana/issues/120

tlestang commented 3 years ago

Script scripts/sources_sinks_routing/benchmark_shortest_path.py benchmarks igraph vs pandana for computing many-to-many shortest paths from ensemble S to T.

Igraph computes vertex paths directly. Pandana computes vertex paths which must be transformed into edge path. This can be done through igraph.Graph.get_eids. This adds significant overhead - but faster than looking up edges in the original dataframe based on start and end nodes. Below are the timings for 3 different ensemble size (S and T having the same size). All average timings are in seconds and statistic taken over 10 repetitions (random S and T ensembles).

python benchmark_shortest_path --nreps=10 --reconstr --sizes 5 10 20

Size	Igraph (avg)	Igraph (std)	pandana (avg)	pandana (std)	reconstr (avg)	reconstr (std)
5	0.74	0.06	0.41	0.03	1.24	0.09
10	0.83	0.01	0.44	0.00	5.28	0.04
20	0.88	0.02	0.46	0.02	21.24	0.50

Transforming vertex paths into edge paths is too slow (column reconstr).

Forgetting about shortest path format, here are some timings for larger ensemble sizes (all average times in seconds). Pandana compute vertex paths, igraph computes edge paths.

python benchmark_shortest_path --nreps=10 --sizes 50 300 600

Size	Igraph (avg)	Igraph (std)	pandana (avg)	pandana (std)
50	0.83	0.01	0.44	0.01
300	2.69	0.51	2.11	0.23
600	6.49	0.40	7.46	0.37

igraph seems to scale better than pandana with ensemble size.

nismod / snail

Sources-sinks routing #21

Network definition

Routing problem

An aside on routing costs