Closed kuanb closed 6 years ago
Merging #53 into master will increase coverage by 0.25%. The diff coverage is 100%.
@@ Coverage Diff @@
## master #53 +/- ##
=========================================
+ Coverage 92.54% 92.8% +0.25%
=========================================
Files 10 10
Lines 617 639 +22
=========================================
+ Hits 571 593 +22
Misses 46 46
Impacted Files | Coverage Δ | |
---|---|---|
peartree/graph.py | 97.36% <ø> (ø) | :arrow_up: |
peartree/paths.py | 95.65% <ø> (ø) | :arrow_up: |
peartree/summarizer.py | 97.07% <100%> (+0.35%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f6862d...e48cfc4.
This replaces the work in https://github.com/kuanb/peartree/pull/51/files
From the previous PR: Partially (incrementally) addressing issue #12
Parallelizes the target route processing operation, process_route_edges_and_wait_times, via Dask Distributed. This allows for a modular parallelization architecture that could, in the future, leverage external resources (useful for large graphs, tethering together whole regions, etc.).
Updates unique to this new PR:
Using multiprocessing, not Dask, for now. Change is incremental.
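To illustrate the general pattern (not the actual peartree code), here is a minimal sketch of fanning independent per-route work out over a multiprocessing pool. The function names summarize_route and summarize_all_routes, the (route_id, departures) input shape, and the sample data are all hypothetical stand-ins; the real process_route_edges_and_wait_times works on GTFS feed data and has a different signature.

```python
import multiprocessing as mp


# Hypothetical stand-in for the per-route work. The real
# process_route_edges_and_wait_times has a different signature.
def summarize_route(route):
    """Return (route_id, average headway in seconds) for one route."""
    route_id, departures = route
    ordered = sorted(departures)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    avg_headway = sum(gaps) / len(gaps) if gaps else None
    return route_id, avg_headway


def summarize_all_routes(routes, processes=None):
    """Summarize every route, spreading the work over a process pool.

    Passing processes=1 skips the pool entirely, which helps when debugging.
    """
    if processes == 1:
        return dict(map(summarize_route, routes))
    with mp.Pool(processes=processes) as pool:
        # Routes are independent of each other, so workers can handle them
        # in any order; Pool.map still returns results in input order.
        return dict(pool.map(summarize_route, routes))


if __name__ == "__main__":
    # Made-up departure times (seconds after midnight) for three routes.
    routes = [
        ("route_a", [0, 600, 1200, 1800]),
        ("route_b", [300, 1200, 2100]),
        ("route_c", [900]),
    ]
    print(summarize_all_routes(routes, processes=2))
```

Keeping the per-route function at module level matters here, since multiprocessing has to pickle it to send it to the worker processes. The serial fallback also makes it easy to compare parallel and non-parallel results.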
OLD:
Going with a Dask Bag for now. Keeping it simple for now, and can improve later.
Results from a quick test with AC Transit and the new system:
With interpolate_times set to False:
Without Dask (original method):
With Dask:
For reference, this is doing just the first 5 routes:
Without Dask:
With Dask:
Even more dramatic is when you set the time interpolation to True:
Without Dask: 3min 55s
With Dask: 1min 17s
From this, the initial cost of initializing the various Dask configurations that enable the parallelization can be seen to be about 14.3 seconds. The upside is the significantly reduced marginal cost of each additional unique route.
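As a rough way to reason about the fixed-cost versus marginal-cost trade-off, here is a small sketch. The measured totals (3min 55s and 1min 17s) and the 14.3 second startup figure come from the test above. The linear cost model and the two per-route costs are assumptions made up purely for illustration, not measurements.

```python
import math


def to_seconds(minutes, seconds):
    """Convert a 'Xmin Ys' timing into seconds."""
    return minutes * 60 + seconds


def break_even_routes(startup, serial_per_route, parallel_per_route):
    """Smallest route count at which the parallel run is strictly faster.

    Assumes a simple linear model (an assumption, not a measurement):
      serial total   = n * serial_per_route
      parallel total = startup + n * parallel_per_route
    Returns None when parallel work is not cheaper per route.
    """
    saving = serial_per_route - parallel_per_route
    if saving <= 0:
        return None
    return math.floor(startup / saving) + 1


# Measured totals reported above, with time interpolation set to True.
serial_total = to_seconds(3, 55)    # 235 s without Dask
parallel_total = to_seconds(1, 17)  # 77 s with Dask
speedup = serial_total / parallel_total
print(f"overall speedup: {speedup:.2f}x")

STARTUP = 14.3            # seconds, from the text above
SERIAL_PER_ROUTE = 3.0    # hypothetical per-route cost, in seconds
PARALLEL_PER_ROUTE = 1.0  # hypothetical per-route cost, in seconds
print(break_even_routes(STARTUP, SERIAL_PER_ROUTE, PARALLEL_PER_ROUTE))
```

With the hypothetical per-route costs above, parallelization would start to pay off at 8 routes. The real break-even point depends on the feed and the machine.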
Of course, a lot of this depends on the machine you are running on. Allowing access to Dask Distributed's Client is next on the to-do list, which will enable the use of external resources.