kuanb / peartree

peartree: A library for converting transit data into a directed graph for sketch network analysis.
MIT License

[performance] WIP Parallelization of route edge and wait costing iteration #53

Status: Closed (kuanb closed this 6 years ago)

kuanb commented 6 years ago

This replaces the work in https://github.com/kuanb/peartree/pull/51/files

From the previous PR: partially (incrementally) addresses issue #12.

Parallelizes the per-route processing operation, process_route_edges_and_wait_times, via Dask Distributed. Dask offers a modular parallelization architecture that could later leverage external resources (useful for large graphs, stitching together whole regions, etc.).


Updates unique to this new PR:

Uses multiprocessing instead of Dask for now; the change is incremental.
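The per-route fan-out can be sketched with the standard library's multiprocessing.Pool. This is a minimal illustration under stated assumptions, not peartree's actual code: cost_route and cost_all_routes are hypothetical stand-ins for process_route_edges_and_wait_times, the input layout (route ID mapped to a list of trips, each an ordered list of (stop_id, arrival_seconds) pairs) is assumed, and approximating the wait cost as half the mean headway is also an assumption of this sketch.

```python
import multiprocessing as mp
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical input layout (assumption): each trip is a non-empty, ordered
# list of (stop_id, arrival_time_in_seconds) pairs.
Trip = List[Tuple[str, int]]
Edges = Dict[Tuple[str, str], float]


def cost_route(item: Tuple[str, List[Trip]]) -> Tuple[str, Edges, Optional[float]]:
    """Cost a single route (hypothetical helper).

    Defined at module level so the pool can pickle it by reference.
    """
    route_id, trips = item
    # Edge cost: mean in-vehicle time between consecutive stops across trips.
    segment_times: Dict[Tuple[str, str], List[int]] = {}
    for trip in trips:
        for (a, t_a), (b, t_b) in zip(trip, trip[1:]):
            segment_times.setdefault((a, b), []).append(t_b - t_a)
    edges = {seg: mean(times) for seg, times in segment_times.items()}
    # Wait cost (assumed model): half the mean headway between trip starts.
    starts = sorted(trip[0][1] for trip in trips)
    headways = [later - earlier for earlier, later in zip(starts, starts[1:])]
    wait = mean(headways) / 2 if headways else None
    return route_id, edges, wait


def cost_all_routes(
    routes: Dict[str, List[Trip]],
    processes: Optional[int] = None,
    start_method: Optional[str] = None,
) -> Dict[str, Tuple[Edges, Optional[float]]]:
    """Fan routes out to worker processes; routes are costed independently."""
    ctx = mp.get_context(start_method)  # None -> the platform's default method
    with ctx.Pool(processes=processes) as pool:
        results = pool.map(cost_route, list(routes.items()))
    return {route_id: (edges, wait) for route_id, edges, wait in results}
```

Because each route is independent, every task only ships one route's trips to a worker, and the results are merged back in the parent. Call cost_all_routes(routes, processes=4) from under an if __name__ == "__main__": guard; that guard is required on platforms whose default start method is spawn (Windows, macOS).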


OLD:

Going with a Dask Bag for now to keep things simple; this can be improved later.

Results from a quick test with AC Transit and the new system:

With interpolate_times set to False:

Without Dask (original method):

CPU times: user 40.4 s, sys: 920 ms, total: 41.3 s
Wall time: 41 s

With Dask:

CPU times: user 13.2 s, sys: 530 ms, total: 13.7 s
Wall time: 14.6 s

For reference, processing just the first 5 routes:

Without Dask:

CPU times: user 6.15 s, sys: 140 ms, total: 6.29 s
Wall time: 6.34 s

With Dask:

CPU times: user 13 s, sys: 440 ms, total: 13.5 s
Wall time: 14.3 s

The difference is even more dramatic with interpolate_times set to True:

Without Dask: 3min 55s
With Dask: 1min 17s

From this, Dask carries an initial cost of roughly 14 seconds (the 5-route run alone took 14.3 s of wall time) to initialize the various configurations that enable parallelization. The upside is a much lower marginal cost for each additional unique route.
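The fixed-versus-marginal reading of the timings can be checked with a back-of-the-envelope linear cost model. This sketch assumes that serial runtime grows linearly per route (estimated from the 5-route serial run, 6.34 s) and that the parallel run's per-route cost is negligible next to its roughly 14.3 s startup. Neither assumption was measured in this PR, and break_even_routes is a hypothetical helper.

```python
import math
from typing import Optional


def break_even_routes(
    serial_seconds: float,
    serial_routes: int,
    parallel_overhead: float,
    parallel_per_route: float = 0.0,
) -> Optional[int]:
    """Smallest route count at which the parallel run is expected to be faster.

    Assumed linear model:
      serial(n)   = n * serial_per_route
      parallel(n) = parallel_overhead + n * parallel_per_route
    """
    serial_per_route = serial_seconds / serial_routes
    saving_per_route = serial_per_route - parallel_per_route
    if saving_per_route <= 0:
        return None  # under this model the parallel run never catches up
    # parallel(n) < serial(n)  <=>  n > parallel_overhead / saving_per_route
    return math.floor(parallel_overhead / saving_per_route) + 1


# Wall times from the interpolate_times=False runs above.
print(break_even_routes(serial_seconds=6.34, serial_routes=5, parallel_overhead=14.3))  # -> 12
```

Under these assumptions the parallel version starts paying off at around a dozen routes. That is consistent with it losing on the 5-route run and winning on the full AC Transit feed.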

Of course, much of this depends on the machine you are running on. The next step is exposing Dask Distributed's Client, which will make it possible to use external resources.

codecov[bot] commented 6 years ago

Codecov Report

Merging #53 into master will increase coverage by 0.25%. The diff coverage is 100%.


@@            Coverage Diff            @@
##           master     #53      +/-   ##
=========================================
+ Coverage   92.54%   92.8%   +0.25%     
=========================================
  Files          10      10              
  Lines         617     639      +22     
=========================================
+ Hits          571     593      +22     
  Misses         46      46
Impacted Files            Coverage Δ
peartree/graph.py         97.36% <ø> (ø)
peartree/paths.py         95.65% <ø> (ø)
peartree/summarizer.py    97.07% <100%> (+0.35%)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 7f6862d...e48cfc4.