Feature: Option: Add regions to graphs

root-11 commented 1 year ago

Minimal illustration of the problem

(classic graph vs multi-graph)

Assume G is a binary tree with a root and 2 levels if bifurcation resulting in $2^{2}$ leaves with randomized weights on the edges.

Assume that all search starts at the root and ends by identifying the route to a leaf using BFS to determine the shortest path.

Problem: Due to the symmetric nature of the graph, shortest path BFS will practically visit every node every time a search is performed.

Proposition 1: If (!) G is redesigned such that the graph is holds information about what can be found below each bifurcation point, only 10 nodes need to be visited. This is ideal from a search perspective, but the memory overhead is problematic as it requires the graph to store all leaves at all bifurcation levels: ~10x more memory. A second problem with this approach is that it only works for DAGs.

Proposition 2: If a partition of G can be declared as a another graph G' and BFS and shortest-path search can query G' to whether or not it contains or has a route to the target node, then the search can be accelerated:

If the target node is in G' and BFS sees G' as a single node in G, then the destination node has been found.
If the target node is NOT in G', BFS can eliminate the search through G' all together.

For the binary tree this means that G defined as $G{1}' + G{2}' = G{1.1}' + G{1.2}' + G{2.1}' + G_{2.2}...$ a BFS or shortest-path will require only $2*10$ recursive queries akin to "is target in G'".

The reason for 2*10 is because at each recursive step the binary partition will have at least one failure.

Edges cases:

For non-trees, such as road networks, which may be partitioned using the "AA", "A", "B", ... road network classification, each branch will lead to a $G_{n}'$ where knowing the probability of reaching the target (for example using (lat, lon)-distance) will help to accelerate the search, but if such information isn't available - for example in information networks - the better method is to partition by proximity e.g. in clusters of $G/2$-nodes. The search must thereby treat G' as nodes that either have been visited or not.

04t02 commented 1 year ago

20,000 location graph 20000_locs.txt

04t02 commented 1 year ago

40,000 locations graph 40000_locs.txt

04t02 commented 1 year ago

20,000 locations as str(Coord.xyz) graph_as_coords.txt

root-11 commented 1 year ago

@04t02 for each of the graphs, what would constitute a typical region and what would a common path search look like?

root-11 commented 1 year ago

I picking two random locations, I see that the paths in 20000 are very limited:

Source20047 : 1
Conveyor20045 : 1
Intersection20042 : 1
Conveyor20043 : 1
Intersection20029 : 1
Conveyor20028 : 1
Intersection20025 : 2
RackIn20015 : 1
Shuttle20005 : 4001   <--- only point of spread.
BinLocation19986 : 1

04t02 commented 1 year ago

A typical region would be Shuttle to it's corresponding BinLocations (these are bi-directional edges most easily found by looking at the connections from the Shuttle).

Common path searches would be Source to BinLocations and BinLocations to Sink. As you can see, the graph is very much like a dandelion in that the Source to Shuttle path is very simple and edge count explodes from the Shuttle to BinLocations

root-11 / graph-theory

Feature: Option: Add regions to graphs #36