Open derhuerst opened 3 years ago
If this is not the case, please excuse me for the noise, and close this Issue!
same (potential) problem in other libs:
related:
Thanks @derhuerst, that's all very concerning! I don't have time right now to look further, but will definitely be keeping an eye on here to see where this takes us. It must also affect gtfsrouter
, because i don't explicitly take care of that kind of situation. A general solution would be really helpful for all.
Thanks, I have some trouble wrapping my head around it at the moment. However, as an initial reaction I'd say this is more of an issue for feed providers than consumers. tidytransit does not use UNIX datetime values (like 2021-03-28T00:30+01:00
) only hms values (00:30:00
without date information). This might be a problem in filter_feed_by_date where we don't include trips running past midnight from previous dates.
Generally, I haven't really run into issues with the feeds I work with (well, we rarely focus on saturday nights). Also, I'd really like to avoid adding timezones in tidytransit...
Thanks, I have some trouble wrapping my head around it at the moment.
I can relate. ๐ I keep forgetting the details of this weird date/time handling after a while, and have to get familiar with it again.
However, as an initial reaction I'd say this is more of an issue for feed providers than consumers.
Both, right? Providers/authors need to know about this when transferring their timetables into GTFS, and consumers need to know about it when processing GTFS.
tidytransit does not use UNIX datetime values (like
2021-03-28T00:30+01:00
) only hms values (00:30:00
without date information).
I'm not sure about frequencies.R
, but raptor.R
works with absolute times, right? How would you otherwise handle routing across date "boundaries" (a.k.a. routing across trips defined with difference service days).
This might be a problem in filter_feed_by_date where we don't include trips running past midnight from previous dates.
Sounds like another issue.
Generally, I haven't really run into issues with the feeds I work with (well, we rarely focus on saturday nights).
I made the experience, that this kind of bug is very hard to come across in practice, but at least in Germany, there are many trips running across the DST <-> standard time boundaries.
Another problem is that so many public-transport-related software has buggy date/time handling, even widely use ones, so it's hard to identify the intended behaviour.
Also, I'd really like to avoid adding timezones in tidytransit...
In this case, I'd propose to clearly document this shortcoming somewhere, so that users of tidytransit are aware of the consequences.
Totally agree on your concluding statement @derhuerst!
Both, right? Providers/authors need to know about this when transferring their timetables into GTFS, and consumers need to know about it when processing GTFS.
Yes, of course. I guess my point is that tidytransit should just handle feeds "as is" and not try to fix incorrect data. I'm still not sure if that point really applies though ๐
I'm not sure about
frequencies.R
, butraptor.R
works with absolute times, right? How would you otherwise handle routing across date "boundaries" (a.k.a. routing across trips defined with difference service days).
Well, it doesn't handle routing across date boundaries... However, that's not explicitly documented. The date used in filter_feed_by_date
isn't the actual date but the date a trip belongs to ("Betriebstag"). It filters stop_times.txt by service_ids running on that day, there's no padding with trips running on service dates before or after. A new function like filter_feed_by_timeframe
(with datetimes as limits) could cover that. But as you said, that's another issue.
I made the experience, that this kind of bug is very hard to come across in practice, but at least in Germany, there are many trips running across the DST <-> standard time boundaries.
They exist in Switzerland as well, there are actually more than 6000 trips running between 00:00 and 03:00 on October 31st this year (assuming my calculations in this gist are correct).
In this case, I'd propose to clearly document this shortcoming somewhere, so that users of tidytransit are aware of the consequences.
Definitely. A lot of this issue comes down to properly documenting assumptions and intended behavoir.
Both, right? Providers/authors need to know about this when transferring their timetables into GTFS, and consumers need to know about it when processing GTFS.
Yes, of course. I guess my point is that tidytransit should just handle feeds "as is" and not try to fix incorrect data. I'm still not sure if that point really applies though ๐
Just to prevent a misunderstanding: A GTFS provider can author perfectly valid GTFS data that follows GTFS's definition of day-relative time, and AFAICT tidytransit wouldn't parse it correctly; This is not about handling the feed "as is", as there is no other reasonable assumption for tidytransit to make than assuming that a GTFS feed follows the GTFS spec.
If you're talking about programmatically "guessing" that the provider/author of the GTFS feed violated the spec by using a more intuitive definition of day-relative time: Let's not do that. ๐
would be great if we could link to a concise and clear explanation of this bug with demonstrated impact for users.
We could add this to https://gist.github.com/derhuerst/574edc94981a21ef0ce90713f1cff7f6. You're very welcome to propose in a comment on what to add to it!
GTFS Time is not defined relative to midnight, but relative to noon - 12h. While that makes "writing" GTFS feeds easier, it makes processing a lot harder.
Expected functionality
As explained in my note about GTFS Time values, with the
Europe/Berlin
time zone (+1h standard time to +2 DST shift occurs at2021-03-28T02:00+01:00
), I expectdeparture_time
of00:30
of a trip running on2021-03-28
to happen at1616884200
/2021-03-28T00:30+02:00
, not at1616887800
/2021-03-28T00:30+01:00
;departure_time
of06:30
of a trip running on2021-03-28
to happen at1616905800
/2021-03-28T06:30+02:00
, not at1616909400
/2021-03-28T06:30+01:00
.Describe the bug
I'm very inexperienced with R and not that familiar with this code base, but it seems that tidytransit is affected by this problem on those days that the DST <-> standard time switch occurs on.
I'm not sure how that actually manifests in tidytransit's output, but I assume that wrong delays will be calculated, or that realtime data can't be matched against static data.
I tried to find some places in the code base:
related: https://github.com/google/transit/pull/15