rust-transit / gtfs-structure

Read a GTFS file
MIT License

Release RawStopTime earlier #141

Closed antoine-de closed 11 months ago

antoine-de commented 1 year ago

linked to https://github.com/etalab/transport-validator/issues/172

Memory consumption is too high because we parse in 2 phases, first into a RawGtfs and then into a Gtfs. During the conversion both structures are in memory, which nearly doubles the peak memory needed. On the FR IDF dataset (~13 000 000 stop times), the RawGtfs takes 2.3G of memory and the Gtfs 2.1G, and the peak memory needed is ~3.9G.

This PR adds a reverse loop over the stop times and shrinks the vector to fit as elements are consumed (iterating in reverse so as not to allocate a new vector). Even if it's a naive implementation (we shouldn't have to call shrink_to_fit at every element), the performance impact seems negligible, and on the IDF dataset the peak memory measured by /usr/bin/time goes from 3.9G to 3.4G.
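
As a rough illustration of the idea (not the actual code of this PR), the reverse loop could look like the sketch below. The `RawStopTime` and `StopTime` structs here are simplified stand-ins with hypothetical fields, and `convert_stop_times` is a made-up name. Popping from the end of the vector lets us release raw entries one by one without allocating a second vector, and `shrink_to_fit` asks the allocator to give the freed capacity back.

```rust
use std::collections::HashMap;

// Simplified stand-in for the raw record (hypothetical fields).
#[derive(Debug)]
struct RawStopTime {
    trip_id: String,
    stop_sequence: u16,
}

// Simplified stand-in for the converted record (hypothetical fields).
#[derive(Debug, PartialEq)]
struct StopTime {
    stop_sequence: u16,
}

// Consume the raw stop times from the back, so each raw entry is dropped
// as soon as it has been converted.
fn convert_stop_times(mut raw: Vec<RawStopTime>) -> HashMap<String, Vec<StopTime>> {
    let mut trips: HashMap<String, Vec<StopTime>> = HashMap::new();
    while let Some(r) = raw.pop() {
        trips
            .entry(r.trip_id)
            .or_insert_with(Vec::new)
            .push(StopTime {
                stop_sequence: r.stop_sequence,
            });
        // Naive: requests a shrink after every element, as described above.
        raw.shrink_to_fit();
    }
    // Reverse iteration changes insertion order, so restore stop order per trip.
    for stop_times in trips.values_mut() {
        stop_times.sort_by_key(|s| s.stop_sequence);
    }
    trips
}
```

A cheaper variant would call `shrink_to_fit` only every N elements; how much memory is actually returned to the OS depends on the allocator.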