hasadna / open-bus

🚌 Analysing Israel's public transport data

open-bus-gtfs-etl: analyze_gtfs_stat command uses too much RAM #344

Closed: OriHoch closed this issue 2 years ago

OriHoch commented 3 years ago

It looks like the analyze_gtfs_stat function loads all the data into RAM. This uses a huge amount of RAM, which can get very expensive. Please see if you can process the data row by row instead. If you need to refer back to already-processed data, I suggest using kvfile for simple key-value storage on the filesystem.
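
A minimal sketch of the row-by-row idea, assuming a GTFS stop_times.txt with a stop_id column. The stdlib shelve module stands in for kvfile here (kvfile's own API is not shown), and count_rows_per_stop is a hypothetical helper, not code from open-bus-gtfs-etl. The file is streamed with csv.DictReader, and per-stop tallies are kept in an on-disk key-value store instead of a Python dict:

```python
import csv
import os
import shelve
import tempfile
from typing import Iterator


def iter_rows(path: str) -> Iterator[dict]:
    """Yield one CSV row at a time instead of reading the whole file into memory."""
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)


def count_rows_per_stop(stop_times_path: str, store_path: str) -> None:
    """Hypothetical helper: tally stop_times.txt rows per stop_id on disk.

    shelve is a stdlib stand-in for kvfile; the counters live in a file
    rather than in an in-memory dict.
    """
    with shelve.open(store_path) as counts:
        for row in iter_rows(stop_times_path):
            stop_id = row["stop_id"]
            counts[stop_id] = counts.get(stop_id, 0) + 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Tiny made-up stop_times.txt, only to exercise the code path.
        gtfs_path = os.path.join(tmp, "stop_times.txt")
        with open(gtfs_path, "w", newline="", encoding="utf-8") as f:
            f.write("trip_id,stop_id,stop_sequence\nt1,s1,1\nt1,s2,2\nt2,s1,1\n")
        store_path = os.path.join(tmp, "counts")
        count_rows_per_stop(gtfs_path, store_path)
        with shelve.open(store_path) as counts:
            for stop_id in sorted(counts.keys()):
                print(stop_id, counts[stop_id])
```

Memory use then scales with the current row rather than the file size, at the cost of a disk lookup per row.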

OriHoch commented 3 years ago

Check whether we need the aggregations at all, whether we can load the data directly into the DB, and whether we can split the loading into separate processes, e.g. load_stops / load_routes ...
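
One possible shape for split loaders that write straight to the DB, as a sketch only: sqlite3 stands in for the project's real database, and the load_stops / load_routes signatures and table schemas are hypothetical (only the GTFS column names stop_id, stop_name, route_id and route_short_name come from the GTFS spec). Each loader streams a single GTFS file into its own table, so each one can run as a separate step:

```python
import csv
import io
import sqlite3
from typing import Iterator, TextIO


def iter_rows(f: TextIO) -> Iterator[dict]:
    """Stream rows from an already-open GTFS text file."""
    yield from csv.DictReader(f)


def load_stops(conn: sqlite3.Connection, f: TextIO) -> None:
    """Hypothetical loader: stream stops.txt straight into a stops table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stops (stop_id TEXT PRIMARY KEY, stop_name TEXT)"
    )
    # executemany pulls from the generator one row at a time.
    conn.executemany(
        "INSERT INTO stops (stop_id, stop_name) VALUES (?, ?)",
        ((row["stop_id"], row["stop_name"]) for row in iter_rows(f)),
    )
    conn.commit()


def load_routes(conn: sqlite3.Connection, f: TextIO) -> None:
    """Hypothetical loader: stream routes.txt straight into a routes table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS routes "
        "(route_id TEXT PRIMARY KEY, route_short_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO routes (route_id, route_short_name) VALUES (?, ?)",
        ((row["route_id"], row["route_short_name"]) for row in iter_rows(f)),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # In-memory text stands in for the real GTFS files.
    load_stops(conn, io.StringIO("stop_id,stop_name\n1,A\n2,B\n"))
    load_routes(conn, io.StringIO("route_id,route_short_name\nr1,10\n"))
    print(conn.execute("SELECT COUNT(*) FROM stops").fetchone()[0])
```

If any aggregation turns out to be needed, it could then be done in SQL over the loaded tables rather than in Python memory.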

OriHoch commented 2 years ago

fixed in hasadna/open-bus-gtfs-etl#15