Open mpadge opened 2 years ago
Hi @mpadge. Thanks for opening this issue.
I share this exact same frustration with the official reference. I remember looking it up when writing `gtfstools::frequencies_to_stop_times()` but couldn't find a good definition. I'm glad `multigtfs` includes good docs on it.
Regarding the suggestion, I really like the idea of asserting the GTFS content when reading it. However, `assert_gtfs()` currently only checks the structure of GTFS objects: whether they are lists, whether their elements are named, whether all elements inherit from `data.frame`, etc.
I was wondering what would be the best way of implementing this content assertion. Maybe we could implement an `assert_gtfs_content()` function, or even add an `assert_content` argument to `assert_gtfs()`. Whichever implementation we choose, we could expose this functionality through a `strict` argument to `import_gtfs()` that, when `TRUE`, would make the GTFS reading operation raise an error if the feed content is not valid.
We could include a number of different checks in this content assertion: whether any table includes duplicate values, whether any table includes duplicate primary keys, whether the ids listed in one table are present in the other, etc.
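To make the idea a bit more concrete, here is a rough base R sketch of what such a content check might look like. Everything in it is hypothetical: `primary_keys`, `check_gtfs_content()` and this `assert_gtfs_content()` are illustrative names, not existing gtfsio functions, the key list is only a small assumed subset of the GTFS tables, and the choice to downgrade to a warning when `strict = FALSE` is just one possible behaviour.

```r
# Hypothetical sketch, not part of gtfsio.
# Illustrative subset of GTFS primary keys (assumed for this example).
primary_keys <- list(
  stops = "stop_id",
  trips = "trip_id",
  stop_times = c("trip_id", "stop_sequence")
)

# Returns a character vector describing the problems found (empty if none).
check_gtfs_content <- function(gtfs, keys = primary_keys) {
  problems <- character(0)

  # Only look at tables that are both known to us and present in the feed.
  for (tbl in intersect(names(keys), names(gtfs))) {
    key_cols <- keys[[tbl]]
    table_df <- as.data.frame(gtfs[[tbl]])

    # Skip tables that lack the key columns; that is a structural issue.
    if (!all(key_cols %in% names(table_df))) next

    # anyDuplicated() returns 0 when no row of key values is repeated.
    if (anyDuplicated(table_df[key_cols]) > 0) {
      problems <- c(problems, paste0("duplicate primary keys in '", tbl, "'"))
    }
  }

  problems
}

# Raises an error when strict = TRUE, otherwise only warns.
assert_gtfs_content <- function(gtfs, strict = TRUE) {
  problems <- check_gtfs_content(gtfs)

  if (length(problems) > 0) {
    msg <- paste(problems, collapse = "; ")
    if (strict) stop(msg) else warning(msg)
  }

  invisible(gtfs)
}

# Toy feed with a repeated stop_id.
feed <- list(
  stops = data.frame(stop_id = c("s1", "s1")),
  trips = data.frame(trip_id = c("t1", "t2"))
)

problems <- check_gtfs_content(feed)
```

Separating the check (which only collects problems) from the assertion (which decides whether to stop) would make it easy to add more checks later without touching how `strict` is handled.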
This is frustratingly not part of the official feed descriptions anywhere, but one of the best references I can find comes from the `multigtfs` docs, which state:
This means that any feeds which have frequencies.txt tables should only pass `assert_gtfs()` if all `trip_id` values given in "frequencies.txt" are also present in "stop_times.txt". I need to write this assertion for my own purposes, but would like it to exist in the most general place possible, which would seem to be within `gtfsio::assert_gtfs()`. Thoughts @dhersz @rafapereirabr?
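For reference, the check described above could be sketched roughly as follows in base R. This is only an illustration under stated assumptions: `assert_frequencies_trips()` is a hypothetical name, not an existing gtfsio function, and the feed is assumed to be a named list of data frames with tables called `frequencies` and `stop_times`.

```r
# Hypothetical helper, not part of gtfsio: checks that every trip_id listed
# in 'frequencies' is also present in 'stop_times'.
assert_frequencies_trips <- function(gtfs) {
  # Feeds without a frequencies table pass trivially.
  if (is.null(gtfs$frequencies)) return(invisible(gtfs))

  if (is.null(gtfs$stop_times)) {
    stop("Feed has a 'frequencies' table but no 'stop_times' table.")
  }

  # trip_ids used in frequencies that never appear in stop_times.
  missing_trips <- setdiff(
    unique(gtfs$frequencies$trip_id),
    unique(gtfs$stop_times$trip_id)
  )

  if (length(missing_trips) > 0) {
    stop(
      "trip_id values in 'frequencies' missing from 'stop_times': ",
      paste(missing_trips, collapse = ", ")
    )
  }

  invisible(gtfs)
}

# Toy feed: trip "t3" has frequencies but no stop_times.
feed <- list(
  stop_times = data.frame(
    trip_id = c("t1", "t1", "t2"),
    stop_sequence = c(1, 2, 1)
  ),
  frequencies = data.frame(
    trip_id = c("t1", "t3"),
    headway_secs = c(600, 900)
  )
)

# Capture the error message instead of aborting the script.
result <- tryCatch(
  assert_frequencies_trips(feed),
  error = function(e) conditionMessage(e)
)
```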