r-transit / gtfsio

Read and Write General Transit Feed Specification (GTFS)
https://r-transit.github.io/gtfsio/

Assert frequencies trip_id values present in stop_times table #30

Open mpadge opened 2 years ago

mpadge commented 2 years ago

Frustratingly, this is not covered anywhere in the official feed descriptions. One of the best references I can find comes from the multigtfs docs, which state:

When trips are defined in frequencies.txt, the trip planner ignores the absolute values of the arrival_time and departure_time fields for those trips in stop_times.txt. Instead, the stop_times table defines the sequence of stops and the time difference between each stop.

This means that any feed with a frequencies.txt table should only pass assert_gtfs() if all trip_id values given in frequencies.txt are also present in stop_times.txt. I need to write this assertion for my own purposes, but would like it to live in the most general place possible, which would seem to be gtfsio::assert_gtfs(). Thoughts, @dhersz @rafapereirabr?
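A minimal sketch of such an assertion, assuming the feed is a named list of data.frames (the kind of structure assert_gtfs() checks for). The helper name `assert_frequencies_trips()` and the toy data are hypothetical and not part of gtfsio:

```r
# Hypothetical helper, not part of gtfsio: errors if any trip_id listed in
# 'frequencies' is absent from 'stop_times'.
assert_frequencies_trips <- function(gtfs) {
  # Feeds without a frequencies table have nothing to check.
  if (is.null(gtfs$frequencies)) return(invisible(gtfs))

  if (is.null(gtfs$stop_times)) {
    stop("Feed has a 'frequencies' table but no 'stop_times' table.")
  }

  missing_trips <- setdiff(
    unique(gtfs$frequencies$trip_id),
    gtfs$stop_times$trip_id
  )

  if (length(missing_trips) > 0) {
    stop(
      "trip_id values in 'frequencies' not found in 'stop_times': ",
      paste(missing_trips, collapse = ", ")
    )
  }

  invisible(gtfs)
}

# Toy feed (illustrative data only): trip "t3" has no stop_times rows.
feed <- list(
  stop_times = data.frame(
    trip_id = c("t1", "t1", "t2"),
    stop_sequence = c(1L, 2L, 1L)
  ),
  frequencies = data.frame(
    trip_id = c("t1", "t3"),
    headway_secs = c(600L, 900L)
  )
)

# Capture the error message instead of halting the script.
result <- tryCatch(
  assert_frequencies_trips(feed),
  error = function(e) conditionMessage(e)
)
```

Here `result` holds the error message naming "t3". A feed whose frequencies trips all appear in stop_times is returned invisibly and unchanged.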

dhersz commented 2 years ago

Hi @mpadge. Thanks for opening this issue.

I share the same frustration with the official reference. I remember looking this up when writing gtfstools::frequencies_to_stop_times() but couldn't find a good definition. I'm glad multigtfs includes good docs on it.

Regarding the suggestion, I really like the idea of asserting the GTFS content when reading it. However, assert_gtfs() currently only checks the structure of GTFS objects: whether they are lists, whether their elements are named, whether all elements inherit from data.frame, etc.

I was wondering what the best way of implementing this content assertion would be. Maybe we could implement an assert_gtfs_content() function, or even add an assert_content argument to assert_gtfs(). Whichever implementation we choose, we could expose this functionality through a strict argument to import_gtfs() that, when TRUE, makes the GTFS reading operation raise an error if the feed content is not valid.
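One possible shape for this, as a rough sketch only. The names `content_checks` and `check_gtfs_content()` are hypothetical and do not exist in gtfsio. The `strict` flag here only mirrors the idea proposed above and is not an existing import_gtfs() argument. Each check returns a character vector of problems, and `strict` decides whether problems raise an error or a warning:

```r
# Hypothetical registry of content checks. Each takes a GTFS-like list and
# returns a character vector describing the problems found (empty if none).
content_checks <- list(
  frequencies_trips = function(gtfs) {
    if (is.null(gtfs$frequencies)) return(character(0))
    missing_trips <- setdiff(gtfs$frequencies$trip_id, gtfs$stop_times$trip_id)
    if (length(missing_trips) == 0) return(character(0))
    paste0(
      "frequencies trip_id not in stop_times: ",
      paste(missing_trips, collapse = ", ")
    )
  }
)

# Hypothetical entry point: runs every check, then errors (strict = TRUE)
# or warns (strict = FALSE) if any problem was reported.
check_gtfs_content <- function(gtfs, strict = FALSE) {
  problems <- unlist(
    lapply(content_checks, function(check) check(gtfs)),
    use.names = FALSE
  )
  if (length(problems) > 0) {
    msg <- paste(problems, collapse = "\n")
    if (strict) stop(msg, call. = FALSE) else warning(msg, call. = FALSE)
  }
  invisible(gtfs)
}

# Toy feed (illustrative data only): trip "t9" is missing from stop_times.
bad_feed <- list(
  stop_times = data.frame(trip_id = "t1", stop_sequence = 1L),
  frequencies = data.frame(trip_id = c("t1", "t9"), headway_secs = c(600L, 600L))
)

strict_outcome <- tryCatch(
  { check_gtfs_content(bad_feed, strict = TRUE); "passed" },
  error = function(e) "error"
)
lenient_outcome <- tryCatch(
  { check_gtfs_content(bad_feed, strict = FALSE); "passed" },
  warning = function(w) "warning"
)
```

Keeping the checks in a list means new assertions can be added without touching the entry point, and a reader function could forward its own `strict` argument to it.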

This content check could cover a number of different assertions: whether any table includes duplicate rows, whether any table includes duplicate primary keys, whether the ids listed in one table are present in the tables they refer to, etc.
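As an illustration of the duplicate primary key case, here is a small base R sketch. The helper `find_duplicate_keys()` is hypothetical, and it assumes the key of stop_times is the pair (trip_id, stop_sequence):

```r
# Hypothetical helper: returns the rows of `tbl` whose key columns repeat
# an earlier row (the first occurrence of each key is not flagged).
find_duplicate_keys <- function(tbl, key_cols) {
  tbl[duplicated(tbl[, key_cols, drop = FALSE]), , drop = FALSE]
}

# Toy table (illustrative data only): the third row reuses the key ("t1", 2).
stop_times <- data.frame(
  trip_id = c("t1", "t1", "t1"),
  stop_sequence = c(1L, 2L, 2L),
  stop_id = c("a", "b", "c")
)

dupes <- find_duplicate_keys(stop_times, c("trip_id", "stop_sequence"))
```

`dupes` contains only the row with stop_id "c". Calling the same helper with all column names as `key_cols` covers the fully duplicated rows case.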