The GTFS stop times dataframe can take >1.2 gigs in memory if we pull all of it (varies per GTFS package). The actual `.txt` it's based off of only needs ~160 MB 🙃
Reading in only the useful columns takes our usage down from ~1.2 gigs to ~0.86 gigs, and later filtering shrinks it to roughly an eighth of that.
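A minimal sketch of that column-pruned read, assuming the standard GTFS `stop_times.txt` layout; which columns count as "useful" and the dtypes chosen here are illustrative, not necessarily what the pipeline actually uses:

```python
import pandas as pd

# Illustrative subset of the standard GTFS stop_times.txt columns.
USEFUL_COLS = ["trip_id", "arrival_time", "departure_time", "stop_id", "stop_sequence"]

stop_times = pd.read_csv(
    "stop_times.txt",
    usecols=USEFUL_COLS,
    dtype={
        "trip_id": "string",       # or "category" if trip ids repeat a lot
        "arrival_time": "string",  # HH:MM:SS can exceed 24:00:00, so keep as text
        "departure_time": "string",
        "stop_id": "string",
        "stop_sequence": "int32",  # int64 is overkill here
    },
)
print(stop_times.memory_usage(deep=True).sum() / 1e9, "GB")
```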
We might need to shrink this usage down even further later on: one approach if we keep OOM-ing would be to first stream through the file with a more lightweight reader, identify the row numbers that contain the trips we care about, then feed those row indices into the `skiprows` arg (sketched below). Is this a nightmare? Is pandas a nightmare? Are we in a nightmare?
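A rough sketch of that two-pass idea, assuming a plain `csv` pass for the cheap scan and pandas' callable form of `skiprows`; the helper name and the wanted-trips set are hypothetical, not existing code:

```python
import csv
import pandas as pd

def read_stop_times_for_trips(path, wanted_trip_ids, usecols):
    """Hypothetical helper: cheap csv pass to find the rows we want,
    then a pandas read that skips everything else."""
    keep = {0}  # always keep the header row
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        trip_id_idx = header.index("trip_id")
        for row_num, row in enumerate(reader, start=1):
            if row[trip_id_idx] in wanted_trip_ids:
                keep.add(row_num)

    # skiprows accepts a callable, so we can skip by complement without
    # materializing a giant list of row numbers to drop.
    return pd.read_csv(
        path,
        usecols=usecols,
        skiprows=lambda i: i not in keep,
    )
```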