jarondl / pygtfs

A Python (2/3) library for GTFS
MIT License

Issue or Question or ?: large gtfs without calendar (only calendar_dates) #79

Open vingerha opened 11 months ago

vingerha commented 11 months ago

Hi, I am remodelling the GTFS solution in Home Assistant and have a use case with a large feed from the Netherlands; the SQLite database grows to 7 GB. Because this dataset has no calendar entries (only calendar_dates), I need to rewrite the query to compensate for that. Since SQLite does not support full outer joins (at least before version 3.39), I have to run the query in two parts and combine them with UNION ALL. With this amount of data the query is quite slow (20 to 23 s in DB Browser), and I was wondering whether I could speed it up with indexes. I will do that myself, but I have 2 questions:
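A minimal sketch of the two-branch UNION ALL approach, assuming simplified `calendar` and `calendar_dates` tables that use the GTFS field names and store dates as `YYYYMMDD` text (pygtfs's actual schema and date storage may differ). The function name `active_service_ids` is hypothetical. The first branch takes regular services from `calendar` and drops those removed for the day (`exception_type = 2`); the second branch adds services listed in `calendar_dates` with `exception_type = 1`. For a feed with an empty `calendar`, only the second branch returns rows, so the query still works.

```python
import sqlite3
from datetime import date

# Hypothetical, simplified tables mirroring GTFS field names;
# pygtfs's real schema may differ (column types, date storage).
SCHEMA = """
CREATE TABLE calendar (
    service_id TEXT PRIMARY KEY,
    monday INTEGER, tuesday INTEGER, wednesday INTEGER, thursday INTEGER,
    friday INTEGER, saturday INTEGER, sunday INTEGER,
    start_date TEXT, end_date TEXT          -- 'YYYYMMDD'
);
CREATE TABLE calendar_dates (
    service_id TEXT, date TEXT,             -- 'YYYYMMDD'
    exception_type INTEGER                  -- 1 = added, 2 = removed
);
"""

# Index matches date.weekday(): Monday == 0 ... Sunday == 6
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")


def active_service_ids(conn, day):
    """Return the set of service_ids that run on `day`.

    Works whether `calendar` is populated or empty (calendar_dates only).
    """
    d = day.strftime("%Y%m%d")
    # Column names cannot be bound as parameters; this one comes from a
    # fixed whitelist, so interpolating it is safe.
    weekday_col = WEEKDAYS[day.weekday()]
    sql = f"""
        -- Branch 1: regular services from calendar, minus removals for d
        SELECT c.service_id FROM calendar AS c
        WHERE c.{weekday_col} = 1
          AND :d BETWEEN c.start_date AND c.end_date
          AND NOT EXISTS (
              SELECT 1 FROM calendar_dates AS cd
              WHERE cd.service_id = c.service_id
                AND cd.date = :d AND cd.exception_type = 2)
        UNION ALL
        -- Branch 2: services explicitly added for d in calendar_dates
        SELECT cd.service_id FROM calendar_dates AS cd
        WHERE cd.date = :d AND cd.exception_type = 1
    """
    # UNION ALL may yield duplicates; collecting into a set removes them.
    return {row[0] for row in conn.execute(sql, {"d": d})}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # A feed like the one in this issue: no calendar rows at all.
    conn.executemany("INSERT INTO calendar_dates VALUES (?, ?, ?)",
                     [("s1", "20240101", 1), ("s2", "20240102", 1)])
    print(active_service_ids(conn, date(2024, 1, 1)))  # {'s1'}
```

Comparing `YYYYMMDD` strings with `BETWEEN` is only correct because the format is fixed-width; a different date representation would need a different comparison.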

vingerha commented 11 months ago

Adding an index on stop_times (stop_id and trip_id) gives a massive improvement: the query now runs in under 1 s.
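The comment does not say whether this was one composite index or two separate ones, so the sketch below assumes both: a composite index on `(stop_id, trip_id)` for lookups by stop, plus one on `trip_id` for joins coming from `trips`. The stripped-down `stop_times` table and the index names are hypothetical. It uses SQLite's `EXPLAIN QUERY PLAN` to check that a lookup by `stop_id` switches from a full table scan to an index search.

```python
import sqlite3

# Hypothetical, stripped-down stop_times table using GTFS field names;
# the real pygtfs table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stop_times (
        trip_id TEXT, stop_id TEXT, stop_sequence INTEGER,
        arrival_time TEXT, departure_time TEXT)
""")

QUERY = "SELECT trip_id, departure_time FROM stop_times WHERE stop_id = ?"


def query_plan(conn, sql, params=()):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column holds the detail text


before = query_plan(conn, QUERY, ("stop_1",))  # expect a full table SCAN

# Index names are made up for this sketch.
conn.execute("CREATE INDEX IF NOT EXISTS ix_stop_times_stop_trip "
             "ON stop_times (stop_id, trip_id)")
conn.execute("CREATE INDEX IF NOT EXISTS ix_stop_times_trip "
             "ON stop_times (trip_id)")

after = query_plan(conn, QUERY, ("stop_1",))  # expect a SEARCH using the index
print(before)
print(after)
```

On a real multi-gigabyte database, running `ANALYZE` after creating the indexes gives the query planner table statistics to choose between them.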

vingerha commented 11 months ago

Sorry, closed incorrectly