Open npaun opened 2 years ago
related: https://github.com/google/transit/pull/15#issuecomment-156576400 I also wrote this down a while ago, much like you did: https://gist.github.com/derhuerst/574edc94981a21ef0ce90713f1cff7f6
@npaun and @derhuerst I'm not super familiar with the DST issue (as much as the timezone one) so please feel free to update my #328 to reflect a solution that meets your needs!
loosely related:
First, I'll give my perspective on your specific statements:
It is impossible to express the time "Nov 6 00:00" as part of the Nov 6 service day, unless negative hours are used. The specification never clearly defines or prohibits this practice.
Indeed, we must allow negative GTFS Time values so that people can express e.g. `2022-11-06T00:30-07:00`.
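To make the negative-hours case concrete, here is a minimal sketch of how a consumer might resolve a GTFS Time value against the spec's "noon minus 12h" reference point. The function name `gtfs_time_to_datetime` and the acceptance of a leading minus sign are assumptions for illustration; the spec does not currently define negative values.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def gtfs_time_to_datetime(service_date: date, gtfs_time: str, tz_name: str) -> datetime:
    """Resolve a GTFS Time (HH:MM:SS, optionally negative) to a local datetime.

    Hypothetical helper: negative values are an assumption, not part of the spec.
    """
    tz = ZoneInfo(tz_name)
    noon_local = datetime(service_date.year, service_date.month, service_date.day, 12, tzinfo=tz)
    # Subtract in UTC so that "12h" means 12 elapsed hours, not 12 wall-clock hours.
    reference = noon_local.astimezone(timezone.utc) - timedelta(hours=12)
    sign = -1 if gtfs_time.startswith("-") else 1
    hours, minutes, seconds = (int(part) for part in gtfs_time.lstrip("-").split(":"))
    offset = sign * timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return (reference + offset).astimezone(tz)


# On the day DST ends, 00:00:00 already falls one hour after local midnight ...
print(gtfs_time_to_datetime(date(2022, 11, 6), "00:00:00", "America/Los_Angeles").isoformat())
# ... so reaching 00:30 local time on that service day needs a negative value.
print(gtfs_time_to_datetime(date(2022, 11, 6), "-00:30:00", "America/Los_Angeles").isoformat())
```

With `America/Los_Angeles`, the first call resolves to 01:00 PDT and the second to 00:30 PDT on Nov 6.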
Consider a shuttle that runs once an hour on :15 past the hour, every day of the week. […] But this would actually create two duplicative trips on "Mar 12 23:15". When DST ends, it would skip the "Nov 6 00:15" run. Unfortunately, I've rarely encountered producers creating special services for the DST transition days.
Yes, but DST <-> standard time switches will almost always have to be handled in a special way, as AFAIK almost all public transport timetables use either headways (explicitly as part of the published timetable, or implicitly from an operations perspective) or at least recurring wall clock times.
My opinion phrased in a more general way:
- […] `stop_timezone`/`agency_timezone` (#328, #322) […] another.

I propose to:
- […] `{arrival,departure,start,end}_time` as is) to make clear the DST implications,
- […]
- add a rule that, when processing a GTFS dataset, duplicate "runs" (as in "same `trip_id`, same point in time") solely caused by the DST start should be filtered out,
- […]

cc @juliuste
I don't know how GTFS currently identifies a zone that applies daylight saving. Each agency carries a timezone attribute to identify which zone it uses in GTFS data. E.g. in Germany we have a daylight saving zone (MESZ, in English CEST). This time has a +2h offset to UTC. Therefore all timestamps for Germany during summer time use a timezone often labeled [UTC+2]. But "UTC+2" is not enough to identify that daylight saving is applied. E.g. many African countries in this timezone have no daylight saving at all. So each agency needs to clearly specify its specific timezones. In Germany this might imply two specifications: MEZ (in English CET) and MESZ (in English CEST). GTFS users need to learn from external sources when the switch day and time are for the specified timezones. Today this is not part of the GTFS specification AFAIK.
I don't know how GTFS currently identifies a zone that applies daylight saving. Each agency carries a timezone attribute to identify which zone it uses in GTFS data. [...] But "UTC+2" is not enough to identify that daylight saving is applied. E.g. many African countries in this timezone have no daylight saving at all. […] GTFS users need to learn from external sources when the switch day and time are for the specified timezones. Today this is not part of the GTFS specification AFAIK.
Unfortunately, the terms "time(zone) offset" and "time zone" are not used very precisely; the Time zone and List of time zones Wikipedia articles are good examples of this.
But from my experience, modern technical systems have settled on time zone identifiers as defined by the tz database. Its time zone definitions include all relevant information when and how shifts occur.
The GTFS Timezone field type uses tz identifiers:
Timezone – TZ timezone from the https://www.iana.org/time-zones. Timezone names never contain the space character but may contain an underscore. Refer to https://en.wikipedia.org/wiki/List_of_tz_zones for a list of valid values. https://gtfs.org/schedule/reference/
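As a small illustration of why a tz identifier carries more information than a label like "UTC+2", the sketch below compares UTC offsets in winter and summer using Python's `zoneinfo`. The helper name `utc_offset_hours` and the sample dates are assumptions chosen for this example.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def utc_offset_hours(tz_name: str, year: int, month: int, day: int) -> float:
    """Return the UTC offset (in hours) that applies at local noon on the given date."""
    local_noon = datetime(year, month, day, 12, tzinfo=ZoneInfo(tz_name))
    return local_noon.utcoffset() / timedelta(hours=1)


for tz_name in ("Europe/Berlin", "Africa/Johannesburg"):
    winter = utc_offset_hours(tz_name, 2022, 1, 15)
    summer = utc_offset_hours(tz_name, 2022, 7, 15)
    # Both zones are at UTC+2 in July, but only one of them shifts during the year.
    print(tz_name, winter, summer)
```

Because the tz database rules are looked up from the identifier, a consumer does not need a separate list of switch dates in the feed itself.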
@derhuerst
Thank you for taking the time to consider this topic in detail.
Regarding the suggestions you've proposed:
- Renaming Time to TimeOffset
Agreed -- this would slightly improve the clarity of the spec.
- Allow Time values to be negative, but only if the resulting date+time still refers to the service day.
Agreed, as this is necessary in order to express those times.
- Add a rule that, when processing a GTFS dataset, duplicate "runs" (as in "same trip_id, same point in time") solely caused by the DST start should be filtered out,
Ideally, I'd love to have a heuristic that consumers can use to filter out redundant runs. But I'm not sure what definition we could use.
For instance if we had this situation:
Trip | Start time | Route | Service |
---|---|---|---|
1 | 00:15 | 20 | Sundays |
2 | 23:15 | 20 | Saturdays |
Then 1 and 2 would be mapped onto the same time on the date DST begins. What if trip 2 instead started at 23:16? It wouldn't change the real-world conclusion but would also be different algorithmically. I feel we'd need a quite extensive definition of duplication to solve this. What do you think?
- Add a reminder to specify an additional "run" during DST end.
Agreed.
On top of these, I'd like to add a few suggestions as well.
- Add a section to the spec reminding agencies that runs during the DST transition are special cases, and that extra trips may need to be added on those days.
This section would include a small example showing cases where the reference point is not midnight, similar to the section we already have for clarifying the effect of block_id. It would probably include 'Table 1' from my initial message on this issue.
- It is recommended that trips occurring during the period of DST transition be expressed using the previous service day with times greater than 24:00:00.
This would cover midnight-3am on the date DST begins and the date DST ends, and I think could help make matters less confusing.
I propose to open a PR next week to add suggestions 1, 2, 4, 5, 6 to the spec.
I don't think we can do much more to improve the situation in a backwards compatible way. Perhaps we could consider adding additional fields relating to behaviour in DST, but I haven't come up with anything yet.
Suggestion for resolving point 3:
"On the service day where DST begins, GTFS consumers MAY remove trips starting before 01:00:00 that are assigned to the same `route_id` and that have the same combination of (`stop_id`, `stop_sequence`) in stop_times.txt as trips starting after 22:59:59 on the service day before."
What do you all think?
Suggestion for resolving point 3:
"On the service day where DST begins, GTFS consumers MAY remove trips starting before 01:00:00 that are assigned to the same `route_id` and that have the same combination of (`stop_id`, `stop_sequence`) in stop_times.txt as trips starting after 22:59:59 on the service day before."
This hard-codes the time offset by which the DST shifts; I'm not sure if it always is 1h, and I strongly prefer using a definition that doesn't rely on it being 1h. Same for the time when the DST shift occurs.
Also, as the GTFS spec may gain more complex rules over time for deciding whether two stop times are identical (I'm thinking about GTFS-Flex, GTFS-RT, etc.), we should try to find a phrasing that doesn't have to be adapted to them.
What do you think about the following definition? I have put the timezone-related part in brackets, because we might want to discuss such phrasing in #322/#328.
"GTFS consumers MAY remove each trip on the service date where the DST begins after the DST shift, if there is an equivalent (same `route_id`, `stop_id`s & `stop_sequence`s, and all other rules applying) trip before the DST shift that effectively starts at the same moment [, taking `stop_timezone` & `agency_timezone` into account].
For example, with a timezone of `America/Los_Angeles`, a trip starting at `00:44:55` on service date `20220313` MAY be removed if there is an equivalent trip starting at `23:44:55` on `20220312`."
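One way a consumer could apply a rule like this without hard-coding a 1h shift is to compare absolute start instants. The sketch below is only an illustration under assumptions: the `Trip` record, `start_instant`, and `removable_duplicates` are hypothetical names, and "equivalent" is reduced to identical `route_id` and stop pattern.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class Trip:  # hypothetical, simplified record of one trip on one service date
    trip_id: str
    route_id: str
    service_date: date
    start_time: str      # GTFS Time, e.g. "23:44:55"
    stop_pattern: tuple  # ((stop_id, stop_sequence), ...)


def start_instant(trip: Trip, tz_name: str) -> datetime:
    """Absolute (UTC) start of a trip, measured from "noon minus 12h"."""
    d = trip.service_date
    noon = datetime(d.year, d.month, d.day, 12, tzinfo=ZoneInfo(tz_name))
    reference = noon.astimezone(timezone.utc) - timedelta(hours=12)
    hours, minutes, seconds = (int(p) for p in trip.start_time.split(":"))
    return reference + timedelta(hours=hours, minutes=minutes, seconds=seconds)


def removable_duplicates(trips, tz_name):
    """Return trip_ids that repeat an equivalent trip from an earlier service date."""
    seen = {}
    removable = []
    for trip in sorted(trips, key=lambda t: t.service_date):
        key = (trip.route_id, trip.stop_pattern, start_instant(trip, tz_name))
        if key in seen and seen[key].service_date < trip.service_date:
            removable.append(trip.trip_id)
        else:
            seen.setdefault(key, trip)
    return removable


pattern = (("A", 1), ("B", 2))
trips = [
    Trip("a", "20", date(2022, 3, 12), "23:44:55", pattern),
    Trip("b", "20", date(2022, 3, 13), "00:44:55", pattern),
    Trip("c", "20", date(2022, 3, 13), "00:45:55", pattern),
]
print(removable_duplicates(trips, "America/Los_Angeles"))
```

Here only trip `b` coincides with trip `a`; trip `c` starts a minute later and is kept, which matches the "same moment" reading of the definition.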
I just noticed that we should clarify what happens to other entities referencing the removed stop times, e.g. GTFS-RT `StopTimeUpdate`s. Are consumers allowed to drop them? Should they apply both trips' updates to the remaining one?
What if trip 2 instead started at 23:16? It wouldn't change the real-world conclusion but would also be different algorithmically.
I have a different interpretation: If there were two buses (of the same `route_id` etc.) with slightly varying start times on a different date than the DST begin, I would assume them to be two separate physical "runs", so I would apply the same interpretation to the DST shift.
@derhuerst, @timMillet
I've revised the wording based on your suggestions:
GTFS consumers MAY remove duplicated trips occurring on the service day on which DST begins or ends, between midnight and the time of transition, as defined by the timezone specified by [`agency.agency_timezone` | `stop.stop_timezone`]. A trip is a duplicate of another if they have the same `route_id` and combination of `(stop_id, stop_sequence)`, and both trips [overlap in time | start at the same time]. For example, with a timezone of `America/Los_Angeles`, a trip starting at `00:44:55` on service date `20220313` MAY be removed if there is an equivalent trip starting at `23:44:55` on `20220312`.
(Alternate wordings are in brackets)
`agency_timezone` vs `stop_timezone`: It seems that discussion on this topic has stalled, but the status quo seems to be `agency_timezone`, so I'd choose this option.

Currently the static specification doesn't seem to mention GTFS-RT at all, so I'm wondering where the best place would be to add that info about StopTimeUpdates.
Often in North America, agencies will provide relatively frequent service, without using clock-face scheduling.
Montreal (route 51): When DST starts, the 23:58 and 00:55 trips would end up superimposed.
Kelowna (route 8): When DST starts, the last two trips would conflict.
In GTFS, specifying trip schedules during the transition into and out of daylight saving time is complex.
Background
Given the definition of time provided in the specification,
Applying this rule to the date DST begins (March 13, 2022 in my region) and the date it ends (Nov 6, 2022), we obtain the following chart:
Table 1
This has some unusual implications, which I am not certain are understood in the same way by all data producers and consumers.
It is impossible to express the time "Nov 6 00:00" as part of the Nov 6 service day, unless negative hours are used. The specification never clearly defines or prohibits this practice.
Consider a shuttle that runs once an hour on :15 past the hour, every day of the week. One could use this very simple schedule for the entire year:
But this would actually create two duplicative trips on "Mar 12 23:15". When DST ends, it would skip the "Nov 6 00:15" run. Unfortunately, I've rarely encountered producers creating special services for the DST transition days.
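The duplicate and the skipped run can be reproduced with a rough simulation of that hourly shuttle. This is a sketch under assumptions: the names `hourly_runs` and `gaps_between_runs` are made up for the example, the timezone is fixed to `America/Los_Angeles`, and the schedule is taken to be 24 runs per service day at :15.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

TZ = ZoneInfo("America/Los_Angeles")  # assumed timezone for this example


def hourly_runs(service_date: date, minute: int = 15):
    """Absolute (UTC) departures for GTFS times 00:15:00 .. 23:15:00 on one service day."""
    noon = datetime(service_date.year, service_date.month, service_date.day, 12, tzinfo=TZ)
    reference = noon.astimezone(timezone.utc) - timedelta(hours=12)
    return [reference + timedelta(hours=hour, minutes=minute) for hour in range(24)]


def gaps_between_runs(first_day: date, days: int):
    """Elapsed time between consecutive departures over several service days."""
    runs = []
    for i in range(days):
        runs += hourly_runs(first_day + timedelta(days=i))
    runs.sort()
    return [later - earlier for earlier, later in zip(runs, runs[1:])]


spring = gaps_between_runs(date(2022, 3, 12), 2)
fall = gaps_between_runs(date(2022, 11, 5), 2)
print(min(spring))  # a zero gap: two runs land on the same instant when DST begins
print(max(fall))    # a two-hour gap: one run is missing when DST ends
```

Every other gap in both lists is exactly one hour, so the only anomalies are the ones at the two transitions.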
Questions
Potential changes
To reduce confusion, the specification could state that all trips during the DST transition (e.g. `00:00-03:00 Mar 13` this year in my timezone) shall be ignored. Producers would be required to use times between `24:00-27:00 Mar 12` instead. However, this is just a starting point for discussion and I hope that we can collaborate to find a good solution to this problem as a community.