kuanb / peartree

peartree: A library for converting transit data into a directed graph for sketch network analysis.
MIT License

[performance] Trim stop_times before stop time interpolation #93

Closed yiyange closed 6 years ago

yiyange commented 6 years ago

Stop time interpolation currently runs on the entire stop_times df, the majority of which is tossed later because it falls outside the requested time range.

Trimming stop_times down before passing it to the stop time interpolation step cuts run time substantially: by roughly 17% to 58% on the feeds listed below, with identical edge and stop counts.
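The idea can be sketched roughly as follows. This is a minimal illustration, not peartree's actual code: it assumes a pandas `stop_times` frame with `trip_id`, `stop_sequence`, and `arrival_time` in seconds since midnight, and the helper names `trim_stop_times` and `interpolate_stop_times` are hypothetical. The linear interpolation is only a stand-in for the real interpolation step.

```python
import pandas as pd


def trim_stop_times(stop_times: pd.DataFrame, start: int, end: int) -> pd.DataFrame:
    """Keep whole trips whose timed stops overlap [start, end] (hypothetical helper)."""
    # Whole trips are kept (not single rows) so that interpolation still has
    # the timed stops on either side of any missing value.
    span = stop_times.groupby("trip_id")["arrival_time"].agg(["min", "max"])
    overlapping = span[(span["min"] <= end) & (span["max"] >= start)].index
    return stop_times[stop_times["trip_id"].isin(overlapping)]


def interpolate_stop_times(stop_times: pd.DataFrame) -> pd.DataFrame:
    """Linearly fill missing arrival times within each trip (stand-in step)."""
    out = stop_times.sort_values(["trip_id", "stop_sequence"]).copy()
    out["arrival_time"] = out.groupby("trip_id")["arrival_time"].transform(
        lambda s: s.interpolate()
    )
    return out


# Toy data: trip A runs around 07:00, trip B around 17:00.
stop_times = pd.DataFrame(
    {
        "trip_id": ["A", "A", "A", "B", "B", "B"],
        "stop_sequence": [1, 2, 3, 1, 2, 3],
        "arrival_time": [25200.0, float("nan"), 25800.0,
                         61200.0, float("nan"), 61800.0],
    }
)
start, end = 7 * 3600, 8 * 3600  # requested range: 07:00 to 08:00

# Trim first, then interpolate only what is left.
trimmed = trim_stop_times(stop_times, start, end)
fast = interpolate_stop_times(trimmed)
fast = fast[fast["arrival_time"].between(start, end)]

# Interpolate everything, then discard rows outside the range.
slow = interpolate_stop_times(stop_times)
slow = slow[slow["arrival_time"].between(start, end)]
```

In this toy case both orderings leave the same rows for trip A, but the trimmed path never interpolates trip B. Trimming by trip rather than by row is a design choice of the sketch: dropping individual rows could remove the timed stops that interpolation depends on.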

Here is some evidence of the performance gain on the pt.paths.generate_summary_graph_elements step:

| gtfs id | current version | with trimming |
| --- | --- | --- |
| f-9q9-bart | 9.0s, 97 edges, 50 stops | 3.8s, 97 edges, 50 stops |
| f-9q9-actransit | 132.0s, 5670 edges, 5050 stops | 59.8s, 5670 edges, 5050 stops |
| f-9q8y-sfmta | 131.3s, 3806 edges, 3409 stops | 66.7s, 3806 edges, 3409 stops |
| f-9qb-goldengatetransit | 10.6s, 519 edges, 473 stops | 7.5s, 519 edges, 473 stops |
| f-9qc-fairfield~ca~us | 5.0s, 280 edges, 245 stops | 3.1s, 280 edges, 245 stops |
| f-9qc0-soltrans~ca~us | 6.6s, 475 edges, 408 stops | 4.3s, 475 edges, 408 stops |
| f-9qc-westcat~ca~us | 6.2s, 263 edges, 220 stops | 3.6s, 263 edges, 220 stops |
| f-9-amtrak*** | 14.5s, 306 edges, 260 stops | 12.0s, 306 edges, 260 stops |

yiyange commented 6 years ago

I ran a similar process using UrbanAccess. Here is part of the comparison, if you are interested:

| gtfs id | UrbanAccess | peartree (PT) with trimming |
| --- | --- | --- |
| f-9q9-bart | 4.0s, 97 edges, 50 stops | 3.8s, 97 edges, 50 stops |
| f-9q9-actransit | Failed | 59.8s, 5670 edges, 5050 stops |
| f-9q8y-sfmta | 97.4s, 3806 edges, 3409 stops | 66.7s, 3806 edges, 3409 stops |
| f-9qb-goldengatetransit | Failed | 7.5s, 519 edges, 473 stops |
| f-9qc-fairfield~ca~us | 4.1s, 297 unique edges, 250 stops | 3.1s, 280 edges, 245 stops |
| f-9qc0-soltrans~ca~us | 3.2s, 473 unique edges, 407 stops | 4.3s, 475 edges, 408 stops |
| f-9qc-westcat~ca~us | 2.6s, 263 edges, 220 stops | 3.6s, 263 edges, 220 stops |
| f-9-amtrak*** | Failed | 12.0s, 306 edges, 260 stops |

codecov[bot] commented 6 years ago

Codecov Report

Merging #93 into master will decrease coverage by 0.28%. The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master      #93      +/-   ##
==========================================
- Coverage   91.91%   91.62%   -0.29%     
==========================================
  Files          12       12              
  Lines         866      872       +6     
==========================================
+ Hits          796      799       +3     
- Misses         70       73       +3

| Impacted Files | Coverage Δ |
| --- | --- |
| peartree/summarizer.py | 91.44% <85.71%> (+0.35%) ↑ |
| peartree/parallel.py | 96.4% <0%> (-2.16%) ↓ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 8e61096...86606d9.

kuanb commented 6 years ago

Again, thank you so much! Amazing.

yiyange commented 6 years ago

Hmm, I wonder why codecov dropped...