Closed: mossblaser closed this pull request 8 years ago.
Realistically, how big might this become? Is it worth investigating a more cache-like structure?
In practice: 1 entry (with a list of 1261 2-element tuples).
New entries are only ever added when the NER router is called with a new radius value. In practice the only reason someone would change this (or, for that matter, call `route` more than once) is to do routing experiments. In such cases I've been doing on the order of tens of runs, so the cache size should still be very small in practice.
I considered making things get evicted but basically I couldn't be bothered to implement it due to the above. What do you think?
That seems entirely fair to me.
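(For reference, the 1261 figure is simply the number of hexagons within the default search radius of 20: a hexagonal grid contains 3r(r + 1) + 1 cells within radius r of a point, and 3 × 20 × 21 + 1 = 1261.)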
OK, so I think this is about as far as I'm happy to take this at the moment... Headline outcomes of this PR:
- `RoutingTree`s now take up half as much memory as before.

The most controversial part of this patch set is that swapping `set`s for `list`s in the `RoutingTree` structure is strictly a breaking change. In practice, the only code which creates `RoutingTree`s outside the router is probably in test code for the routing paper. I think that, thanks to the wonders of duck typing, these tests will probably continue to work, though ideally they would be updated...
This LGTM, apart from wanting an extra newline.
I think I'm happy to let this breaking change pass as (a) I agree that duck-typing should probably, generally, mean it's not an issue, (b) it's a mostly internal change.
Routing algorithm performance improvement
While performing various large experiments I've been bitten by the routing process taking a ridiculous amount of time. This PR is an effort to reduce this runtime...
Headline outcomes:
- `RoutingTree`s now take up half as much memory as before.

Implemented:

- `concentric_hexagons`: consider non-generator, memoize concentric hexagon list (d8c05a8: very minor speedup)
- `longest_dimension_first`: just return a list (38c0662: no noticeable speedup but cleans up the code a little)
- `RoutingTree` memory usage (2c48ad6: halves memory consumption and yields a minor speedup)

Also under consideration for the future, probably not this PR:

- Restructuring `RoutingTree` to reduce memory use further: would be quite a breaking change but should mostly only impact routers
- `shortest_torus_paths`: low-hanging fruit but low impact

Benchmark
To benchmark the performance of the router, a uniformly randomly-connected network is created and routed by the script below. This benchmark reflects a 'bad' netlist in which most routes are quite long, something the router currently deals with very badly in terms of performance.
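The benchmark script itself is not reproduced in this thread. Purely as an illustration of the kind of netlist it describes, the sketch below builds a set of nets whose sources and sinks are drawn uniformly at random from a large machine; the machine size, net count and fan-out are made-up values, and the actual routing call is omitted because it depends on rig's API.

```python
import random

WIDTH, HEIGHT = 96, 60   # machine dimensions in chips (illustrative)
NUM_NETS = 3000          # number of nets to route (illustrative)
FANOUT = 4               # sinks per net (illustrative)

random.seed(42)

def random_chip():
    return (random.randrange(WIDTH), random.randrange(HEIGHT))

# Uniformly random sources and sinks mean most routes span a large part
# of the machine: the 'bad', long-route case described above.
nets = [(random_chip(), [random_chip() for _ in range(FANOUT)])
        for _ in range(NUM_NETS)]

# These nets would then be handed to the NER router and the run timed;
# the exact call is not shown here.
```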
On my laptop, using Python 3 and the existing implementation (2f0018a), the results are as follows:
Tweak 1: Optimise common-case where no bad links are encountered
Running under cProfile, the router was spending around a third of its time in the `avoid_dead_links` method, which attempts to modify initially generated routes so that they route around any dead links. This function is rather naively implemented on the assumption that it will usually find a dead link, and it generates a working copy of the routing tree as it checks it. Since most routes are not affected by dead links, a new function, `route_has_dead_links`, is used to quickly check whether dead links are actually present in a routing tree before trying to fix it.

After making this change (2f99746), the results are as follows:
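To illustrate the shape of this pre-check, here is a minimal sketch under assumed, simplified data structures (a tree node with a `chip` coordinate and `(direction, child)` pairs, plus a set of working links); rig's real `RoutingTree` and machine model differ in detail.

```python
def route_has_dead_links(root, working_links):
    """Return True if the routing tree rooted at `root` uses a dead link.

    Sketch only: `root` has a `chip` (x, y) coordinate and a `children`
    list of (direction, child) pairs, and `working_links` is a set of
    (x, y, direction) tuples, one per live link in the machine.
    """
    to_visit = [root]
    while to_visit:
        node = to_visit.pop()
        x, y = node.chip
        for direction, child in node.children:
            if (x, y, direction) not in working_links:
                return True  # Dead link found: this route needs repairing.
            if hasattr(child, "children"):
                to_visit.append(child)  # Recurse into tree nodes only.
    return False  # Common case: the expensive repair pass can be skipped.
```

Only when such a check reports a dead link does the router fall back to the original, more expensive `avoid_dead_links` repair.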
Tweak 2: Cache concentric hexagons sequence
The `concentric_hexagons` function is called once per destination. Since it is a very simple generator, the overhead of the generator machinery itself is measurable; by memoizing the generator's output this overhead can be saved. A simple change resulting in a minor but measurable performance improvement.

After making this change (d8c05a8), the results are as follows:
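A minimal sketch of the memoization described, assuming a `concentric_hexagons(radius)` generator of (x, y) offsets is in scope (simplified from rig's actual API):

```python
_concentric_hexagons_cache = {}

def memoized_concentric_hexagons(radius):
    """Return a cached tuple of the hexagon offsets within `radius`.

    The underlying generator is exhausted once per distinct radius; later
    calls reuse the stored tuple, avoiding per-destination generator
    overhead. In practice only one radius (the default of 20) is ever
    used, so the cache stays tiny.
    """
    hexagons = _concentric_hexagons_cache.get(radius)
    if hexagons is None:
        hexagons = tuple(concentric_hexagons(radius))
        _concentric_hexagons_cache[radius] = hexagons
    return hexagons
```

This is also the cache discussed in the review comments above: one entry per radius used, each holding the 1261 offsets for the default radius of 20.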
Tweak 3: Make `longest_dimension_first` just return a list

Avoid the need for a generator in a hot loop.
After making this change (38c0662), the results are as follows:
So... probably not significant... That said, since the generator's output was always converted to a list anyway, this change is at least beneficial from a code-clarity point of view...
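Purely to illustrate the pattern (hypothetical code, not the actual `longest_dimension_first`): when every caller immediately wraps the generator in `list()`, returning a list directly removes the generator machinery from the hot loop.

```python
# Before: a generator whose output was always materialised anyway.
def steps_gen(vector):
    for dimension, magnitude in enumerate(vector):
        if magnitude != 0:
            yield (dimension, magnitude)
# Caller: steps = list(steps_gen(vector))

# After: build and return the list directly.
def steps_list(vector):
    return [(dimension, magnitude)
            for dimension, magnitude in enumerate(vector)
            if magnitude != 0]
# Caller: steps = steps_list(vector)
```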
Tweak 4: Optimise `RoutingTree` memory usage

Before diving in and making a whole new complex data type for `RoutingTree`s that optimises out strings of singleton nodes, I've attempted to shrink the structure which already exists.
Approximate peak memory usage numbers on the benchmark (as measured using `memory_profiler`) for each of the changes made are enumerated below:

- `RoutingTree` node: 1150 MB
- `__slots__`: 880 MB

After making this change (2c48ad6), the runtime also improves slightly as follows:
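A rough sketch of the slimmed-down node described (attribute names are illustrative and rig's real class carries more documentation and helpers): `__slots__` removes the per-instance `__dict__`, which dominates memory when millions of small nodes are alive at once, and children become a plain list of `(route, child)` pairs rather than a set.

```python
class RoutingTree(object):
    """One node of a routing tree: the chip it covers and its children."""

    # Restrict instances to exactly these attributes; dropping the
    # per-instance __dict__ substantially shrinks each node.
    __slots__ = ["chip", "children"]

    def __init__(self, chip, children=None):
        self.chip = chip
        # (route, child) pairs; a list is smaller and cheaper to build
        # than a set, at the cost of set semantics for external code.
        self.children = children if children is not None else []
```

The `set` to `list` swap here is the breaking change discussed in the review comments at the top of this thread.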
Tweak 5: Optimise neighbourhood exploration
When generating routing trees, the NER algorithm searches for nearby vertices (or pieces of route) to route to rather than always routing back to the source of the net. Originally this was achieved by searching in a spiral pattern outward from the destination vertex looking for pieces of route to attach to. With the default search radius of 20 this requires 1261 checks, all of which are carried out when the destination is more than 20 hops from another part of the route.
A new, alternative implementation instead iterates over every known route segment to find out if any are within the specified search radius. In practice this often results in substantially fewer checks being required. To reap the benefits of both approaches, a simple heuristic automatically selects the approach to use on a route-by-route basis.
A substantial comment in the implementation (near the top of `ner_net()` in `ner.py`) explains this change in greater detail.

After making this change (294debd), the results are as follows:
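A rough sketch of the heuristic described (the names, the exact cut-off and the handling of torus wrap-around are assumptions; the authoritative explanation is the comment near the top of `ner_net()`): the spiral search costs roughly one lookup per hexagon within the radius, i.e. 3r(r + 1) + 1 cells (1261 for radius 20), while the alternative costs one distance check per node already in the route, so pick whichever is expected to be cheaper. This reuses the `memoized_concentric_hexagons` sketch from Tweak 2.

```python
def nearest_route_node(sink, route_nodes, radius, distance):
    """Find a node of the existing route within `radius` hops of `sink`.

    `route_nodes` maps (x, y) coordinates to routing-tree nodes already in
    this net's route and `distance(a, b)` gives the hop distance between
    two coordinates. Returns None if nothing lies within `radius`.
    Illustrative only; not rig's actual code.
    """
    spiral_cost = 3 * radius * (radius + 1) + 1  # cells within the radius

    if len(route_nodes) < spiral_cost:
        # Few route nodes: check each one's distance and keep the closest.
        best, best_distance = None, radius + 1
        for xy, node in route_nodes.items():
            d = distance(sink, xy)
            if d < best_distance:
                best, best_distance = node, d
        return best
    else:
        # Many route nodes: spiral outward from the sink so the first hit
        # is in the nearest ring of hexagons.
        for dx, dy in memoized_concentric_hexagons(radius):
            node = route_nodes.get((sink[0] + dx, sink[1] + dy))
            if node is not None:
                return node
        return None
```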
Regression check for the common case
As a regression check, the following alternative benchmark models a more well-behaved application in which most routes are very short.
Before these changes (2f0018a):
After applying tweaks 1-5 (294debd) things have sped up for the common case too!