google / transit

https://gtfs.org/
Apache License 2.0

stop_times.txt: Interpretation of times during daylight savings time transition #325

Open npaun opened 2 years ago

npaun commented 2 years ago

In GTFS, specifying trip schedules during the transition into and out of daylight savings time is complex.

Background

Given the definition of time provided in the specification,

Time in the HH:MM:SS format (H:MM:SS is also accepted). The time is measured from "noon minus 12h" of the service day (effectively midnight except for days on which daylight savings time changes occur). For times occurring after midnight, enter the time as a value greater than 24:00:00 in HH:MM:SS local time for the day on which the trip schedule begins. Example: 14:30:00 for 2:30PM or 25:35:00 for 1:35AM on the next day.

Applying this rule to the date DST begins (March 13, 2022 in my region) and the date it ends (Nov 6, 2022), we obtain the following chart:

Table 1

| GTFS time (DST begins) | Date represented | GTFS time (DST ends) | Date represented |
|---|---|---|---|
| N/A | N/A | -01:00 | Sun Nov 6 00:00:00 PDT 2022 |
| 00:00 | Sat Mar 12 23:00:00 PST 2022 | 00:00 | Sun Nov 6 01:00:00 PDT 2022 |
| 01:00 | Sun Mar 13 00:00:00 PST 2022 | 01:00 | Sun Nov 6 01:00:00 PST 2022 |
| 02:00 | Sun Mar 13 01:00:00 PST 2022 | 02:00 | Sun Nov 6 02:00:00 PST 2022 |
| 03:00 | Sun Mar 13 03:00:00 PDT 2022 | 03:00 | Sun Nov 6 03:00:00 PST 2022 |
| 04:00 | Sun Mar 13 04:00:00 PDT 2022 | 04:00 | Sun Nov 6 04:00:00 PST 2022 |
| 05:00 | Sun Mar 13 05:00:00 PDT 2022 | 05:00 | Sun Nov 6 05:00:00 PST 2022 |
| 06:00 | Sun Mar 13 06:00:00 PDT 2022 | 06:00 | Sun Nov 6 06:00:00 PST 2022 |
| 07:00 | Sun Mar 13 07:00:00 PDT 2022 | 07:00 | Sun Nov 6 07:00:00 PST 2022 |
| 08:00 | Sun Mar 13 08:00:00 PDT 2022 | 08:00 | Sun Nov 6 08:00:00 PST 2022 |
| 09:00 | Sun Mar 13 09:00:00 PDT 2022 | 09:00 | Sun Nov 6 09:00:00 PST 2022 |
| 10:00 | Sun Mar 13 10:00:00 PDT 2022 | 10:00 | Sun Nov 6 10:00:00 PST 2022 |
| 11:00 | Sun Mar 13 11:00:00 PDT 2022 | 11:00 | Sun Nov 6 11:00:00 PST 2022 |
| 12:00 | Sun Mar 13 12:00:00 PDT 2022 | 12:00 | Sun Nov 6 12:00:00 PST 2022 |
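For reference, the "noon minus 12h" rule behind Table 1 can be reproduced in a few lines of Python. This is only a sketch: the function name `gtfs_time_to_datetime` is made up for illustration, it assumes Python 3.9+ with `zoneinfo` and an installed tz database, and it accepts a leading minus sign even though the spec does not currently define negative times.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def gtfs_time_to_datetime(service_day: date, gtfs_time: str, tz_name: str) -> datetime:
    """Resolve a GTFS time on a service day to an aware local datetime."""
    tz = ZoneInfo(tz_name)
    # Assumption: a leading "-" means "before the reference point".
    sign = -1 if gtfs_time.startswith("-") else 1
    parts = gtfs_time.lstrip("-").split(":")
    hours, minutes = int(parts[0]), int(parts[1])
    seconds = int(parts[2]) if len(parts) > 2 else 0
    offset = sign * timedelta(hours=hours, minutes=minutes, seconds=seconds)

    # Local noon is unambiguous even on DST transition days.
    noon_local = datetime.combine(service_day, time(12), tzinfo=tz)
    # Subtract 12 elapsed hours in UTC; arithmetic on a local aware
    # datetime would subtract wall-clock hours instead.
    reference = noon_local.astimezone(timezone.utc) - timedelta(hours=12)
    return (reference + offset).astimezone(tz)


fmt = "%Y-%m-%d %H:%M:%S %Z"
for t in ("00:00", "02:00", "03:00"):
    print(t, gtfs_time_to_datetime(date(2022, 3, 13), t, "America/Los_Angeles").strftime(fmt))
# 00:00 2022-03-12 23:00:00 PST
# 02:00 2022-03-13 01:00:00 PST
# 03:00 2022-03-13 03:00:00 PDT
```

Doing the subtraction in UTC is the important design choice here: it is what makes the reference point land on 23:00 the previous evening when DST begins, and on 01:00 when it ends.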

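This has some unusual implications, which I am not certain are understood in the same way by all data producers and consumers.

It is impossible to express the time "Nov 6 00:00" as part of the Nov 6 service day, unless negative hours are used. The specification never clearly defines or prohibits this practice.

Consider a shuttle that runs once an hour on :15 past the hour, every day of the week.

Time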
00:15
01:15
...
22:15
23:15

But on the date DST begins, the 00:15 trip would actually fall on "Mar 12 23:15", duplicating the previous service day's 23:15 trip. When DST ends, the schedule would skip the "Nov 6 00:15" run. Unfortunately, I've rarely encountered producers creating special services for the DST transition days.
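The duplicate and the skipped run can be checked mechanically. The sketch below assumes an hourly :15 shuttle scheduled identically on every service day in America/Los_Angeles; the helper names (`resolve`, `hourly_runs`, `local`) are invented for this example.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

TZ = ZoneInfo("America/Los_Angeles")


def resolve(service_day: date, hour: int, minute: int) -> datetime:
    """GTFS time HH:MM on a service day -> UTC instant ("noon minus 12h" rule)."""
    noon = datetime.combine(service_day, time(12), tzinfo=TZ).astimezone(timezone.utc)
    return noon - timedelta(hours=12) + timedelta(hours=hour, minutes=minute)


def hourly_runs(service_day: date) -> list[datetime]:
    """The shuttle's 24 runs (00:15 ... 23:15) for one service day, as UTC instants."""
    return [resolve(service_day, h, 15) for h in range(24)]


def local(instant: datetime) -> str:
    return instant.astimezone(TZ).strftime("%Y-%m-%d %H:%M %Z")


# DST begins: look for two runs that resolve to the same instant.
spring = hourly_runs(date(2022, 3, 12)) + hourly_runs(date(2022, 3, 13))
duplicates = sorted({t for t in spring if spring.count(t) > 1})
print([local(t) for t in duplicates])
# ['2022-03-12 23:15 PST']

# DST ends: runs are normally 1 h apart, so look for a larger gap.
autumn = sorted(hourly_runs(date(2022, 11, 5)) + hourly_runs(date(2022, 11, 6)))
gaps = [(a, b) for a, b in zip(autumn, autumn[1:]) if b - a > timedelta(hours=1)]
print([(local(a), local(b)) for a, b in gaps])
# [('2022-11-05 23:15 PDT', '2022-11-06 01:15 PDT')]
```

The two-hour gap in the second result is exactly where the "Nov 6 00:15" PDT run is missing.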

Questions

Potential changes

To reduce confusion, the specification could state that all trips during the DST transition (e.g. 00:00-03:00 Mar 13 this year in my timezone) shall be ignored. Producers would be required to use times between 24:00-27:00 Mar 12 instead. However, this is just a starting point for discussion and I hope that we can collaborate to find a good solution to this problem as a community.

derhuerst commented 2 years ago

Related: https://github.com/google/transit/pull/15#issuecomment-156576400

I also wrote this down a while ago, much like you did: https://gist.github.com/derhuerst/574edc94981a21ef0ce90713f1cff7f6

e-lo commented 2 years ago

@npaun and @derhuerst I'm not super familiar with the DST issue (as much as the timezone one) so please feel free to update my #328 to reflect a solution that meets your needs!

derhuerst commented 2 years ago

loosely related:

derhuerst commented 2 years ago

First, I'll give my perspective on your specific statements:

It is impossible to express the time "Nov 6 00:00" as part of the Nov 6 service day, unless negative hours are used. The specification never clearly defines or prohibits this practice.

Indeed, we must allow negative GTFS Time values so that people can express, e.g., 2022-11-06T00:30-07:00.
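To make this concrete, here is a rough sketch of a parser that tolerates a leading minus sign. Negative values are not part of the current spec, so the accepted syntax (`-HH:MM:SS`) and the name `parse_gtfs_time` are assumptions made for this example only.

```python
import re
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# HH:MM:SS or H:MM:SS, hours may exceed 24; the optional "-" is hypothetical.
_GTFS_TIME = re.compile(r"^(-?)(\d{1,3}):([0-5]\d):([0-5]\d)$")


def parse_gtfs_time(value: str) -> timedelta:
    """Parse a GTFS time into an offset from the service day's reference point."""
    match = _GTFS_TIME.match(value.strip())
    if match is None:
        raise ValueError(f"not a GTFS time: {value!r}")
    sign, hours, minutes, seconds = match.groups()
    offset = timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))
    return -offset if sign else offset


tz = ZoneInfo("America/Los_Angeles")
# Reference point of service day 2022-11-06: local noon minus 12 elapsed hours.
noon = datetime(2022, 11, 6, 12, tzinfo=tz).astimezone(timezone.utc)
reference = noon - timedelta(hours=12)

instant = (reference + parse_gtfs_time("-00:30:00")).astimezone(tz)
print(instant.isoformat())
# 2022-11-06T00:30:00-07:00
```

With this reading, 00:30 PDT on Nov 6 is written as -00:30:00 on the Nov 6 service day, since the reference point that day is 01:00 PDT.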

Consider a shuttle that runs once an hour on :15 past the hour, every day of the week. […] But this would actually create two duplicative trips on "Mar 12 23:15". When DST ends, it would skip the "Nov 6 00:15" run. Unfortunately, I've rarely encountered producers creating special services for the DST transition days.

Yes, but DST <-> standard time switches will almost always have to be handled in a special way, as AFAIK almost all public transport timetables use either headways (explicitly as part of the published timetable, or implicitly from an operations perspective) or at least recurring wall clock times.


My opinion phrased in a more general way:

I propose to:


cc @juliuste

MartinH-open commented 2 years ago

I don't know how GTFS today identifies a zone that applies daylight saving time. Each agency carries a timezone attribute to identify which zone it uses in GTFS data. E.g. in Germany we have a daylight saving time (MESZ, in English CEST), which has a +2h offset to UTC. Therefore all timestamps for Germany during summer time use a timezone often labeled [UTC+2]. But "UTC+2" is not enough to identify that daylight saving is applied. E.g. many African countries in this timezone have no daylight saving at all. So each agency needs to clearly specify its specific timezones. In Germany this might imply two specifications: MEZ (CET) and MESZ (CEST). GTFS users need to learn from external sources when the switch day and time are for the specified timezones. Today this is not part of the GTFS specification, AFAIK.

derhuerst commented 2 years ago

I don't know how GTFS today identifies a zone that applies daylight saving time. Each agency carries a timezone attribute to identify which zone it uses in GTFS data. […] But "UTC+2" is not enough to identify that daylight saving is applied. E.g. many African countries in this timezone have no daylight saving at all. […] GTFS users need to learn from external sources when the switch day and time are for the specified timezones. Today this is not part of the GTFS specification, AFAIK.

Unfortunately, the terms "time(zone) offset" and "time zone" are not used very precisely; the Time zone and List of time zones Wikipedia articles are good examples of this.

But in my experience, modern technical systems have settled on time zone identifiers as defined by the tz database. Its time zone definitions include all relevant information about when and how shifts occur.

The GTFS Timezone field type uses tz identifiers:

Timezone – TZ timezone from the https://www.iana.org/time-zones. Timezone names never contain the space character but may contain an underscore. Refer to https://en.wikipedia.org/wiki/List_of_tz_zones for a list of valid values. https://gtfs.org/schedule/reference/
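As a quick illustration of why a tz identifier carries more information than a bare offset, the following sketch compares two zones that share UTC+2 in July. The function name `utc_offsets` is invented for this example; Europe/Berlin and Africa/Johannesburg are simply two convenient zones from the tz database.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def utc_offsets(tz_name: str, year: int = 2022) -> tuple[timedelta, timedelta]:
    """Return the UTC offsets in effect at local noon on Jan 15 and Jul 15."""
    tz = ZoneInfo(tz_name)
    january = datetime(year, 1, 15, 12, tzinfo=tz).utcoffset()
    july = datetime(year, 7, 15, 12, tzinfo=tz).utcoffset()
    return january, july


for name in ("Europe/Berlin", "Africa/Johannesburg"):
    january, july = utc_offsets(name)
    print(f"{name}: January UTC+{january}, July UTC+{july}")
# Europe/Berlin: January UTC+1:00:00, July UTC+2:00:00
# Africa/Johannesburg: January UTC+2:00:00, July UTC+2:00:00
```

Both zones are at UTC+2 in July, but only the identifier tells a consumer that one of them shifts during the year.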

npaun commented 2 years ago

@derhuerst

Thank you for taking the time to consider this topic in detail.

Backwards compatible changes

Regarding the suggestions you've proposed:

  1. Renaming Time to TimeOffset

Agreed -- this would slightly improve the clarity of the spec.

  2. Allow Time values to be negative, but only if the resulting date+time still refers to the service day.

Agreed, as this is necessary in order to express those times.

  3. Add a rule that, when processing a GTFS dataset, duplicate "runs" (as in "same trip_id, same point in time") solely caused by the DST start should be filtered out,

Ideally, I'd love to have a heuristic that consumers can use to filter out redundant runs. But I'm not sure what definition we could use.

For instance if we had this situation:

| Trip | Start time | Route | Service |
|---|---|---|---|
| 1 | 00:15 | 20 | Sundays |
| 2 | 23:15 | 20 | Saturdays |

Then 1 and 2 would be mapped onto the same time on the date DST begins. What if trip 2 instead started at 23:16? The real-world conclusion wouldn't change, but the two trips would no longer match algorithmically. I feel we'd need quite an extensive definition of duplication to solve this. What do you think?

  4. Add a reminder to specify an additional "run" during DST end.

Agreed.


On top of these, I'd like to add a few suggestions as well.

  5. Add a section to the spec reminding agencies that runs during the DST transition are special cases, and that extra trips may need to be added on those days.

This section would include a small example showing cases where the reference point is not midnight, similar to the section we already have for clarifying the effect of block_id. It would probably include 'Table 1' from my initial message on this issue.

  6. It is recommended that trips occurring during the period of DST transition be expressed using the previous service day with times greater than 24:00:00.

This would cover midnight-3am on the date DST begins and the date DST ends, and I think it could help make matters less confusing.


I propose to open a PR next week to add suggestions 1, 2, 4, 5, 6 to the spec.

I don't think we can do much more to improve the situation in a backwards compatible way. Perhaps we could consider adding additional fields relating to behaviour in DST, but I haven't come up with anything yet.

timMillet commented 2 years ago

Suggestion for resolving point 3:

"On the service day on which DST begins, GTFS consumers MAY remove trips starting before 01:00:00 that are assigned to the same route_id and have the same combination of (stop_id, stop_sequence) in stop_times.txt as trips starting after 22:59:59 on the previous service day."

What do you all think?

derhuerst commented 2 years ago

Suggestion for resolving point 3:

"On the service day on which DST begins, GTFS consumers MAY remove trips starting before 01:00:00 that are assigned to the same route_id and have the same combination of (stop_id, stop_sequence) in stop_times.txt as trips starting after 22:59:59 on the previous service day."

This hard-codes the time offset by which the DST shifts; I'm not sure if it always is 1h, and I strongly prefer using a definition that doesn't rely on it being 1h. Same for the time when the DST shift occurs.
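For what it's worth, the tz database does contain zones whose shift is not 1h: Australia/Lord_Howe moves by 30 minutes. The sketch below lists the offset changes of a zone within one year by scanning in 15-minute steps; the function name `offset_changes` and the step size are choices made for this example, not anything defined by GTFS.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def offset_changes(tz_name: str, year: int) -> list[tuple[datetime, timedelta]]:
    """List (first local time after the change, size of the change) for one year."""
    tz = ZoneInfo(tz_name)
    step = timedelta(minutes=15)  # assumption: shifts fall on 15-minute boundaries
    t = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    previous = t.astimezone(tz).utcoffset()
    changes = []
    while t < end:
        current = t.astimezone(tz).utcoffset()
        if current != previous:
            changes.append((t.astimezone(tz), current - previous))
            previous = current
        t += step
    return changes


for name in ("America/Los_Angeles", "Australia/Lord_Howe"):
    for local_time, shift in offset_changes(name, 2022):
        minutes = int(shift.total_seconds() // 60)
        print(f"{name}: {local_time:%Y-%m-%d %H:%M %Z} shift {minutes:+d} min")
```

A definition that derives both the moment and the size of the shift from the timezone, as this scan does, avoids baking "01:00:00" and "22:59:59" into the rule.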

Also, as the GTFS spec will likely gain more complex rules over time for when two stop times are identical (I'm thinking about GTFS-Flex, GTFS-RT, etc.), we should try to find a phrasing that doesn't have to be adapted to them.

What do you think about the following definition? I have put the timezone-related part in brackets, because we might want to discuss such phrasing in #322/#328.

"GTFS consumers MAY remove each trip on the service date where the DST begins after the DST shift, if there is an equivalent (same route_id, stop_ids & stop_sequences, and all other rules applying) trip before the DST shift that effectively starts at the same moment [, taking stop_timezone & agency_timezone into account]. For example, with a timezone of America/Los_Angeles, a trip starting at 00:44:55 on service date 20220313 MAY be removed if there is an equivalent trip starting at 23:44:55 on 20220312."


I just noticed that we should clarify what happens to other entities referencing the removed stop times, e.g. GTFS-RT StopTimeUpdates. Are consumers allowed to drop them? Should they apply both trips' updates to the remaining one?

derhuerst commented 2 years ago

What if trip 2 instead started at 23:16? It wouldn't change the real-world conclusion but would also be different algorithmically.

I have a different interpretation: If there were two buses (of the same route_id etc.) with slightly varying start times on a different date than the DST begin, I would assume them to be two separate physical "runs", so I would apply the same interpretation to the DST shift.

npaun commented 2 years ago

@derhuerst, @timMillet

I've revised the wording based on your suggestions:

GTFS consumers MAY remove duplicated trips occurring on the service day on which DST begins or ends, between midnight and the time of transition, as defined by the timezone specified by [agency.agency_timezone | stop.stop_timezone]. A trip is a duplicate of another if they have the same route_id and combination of (stop_id, stop_sequence), and both trips [overlap in time | start at the same time]. For example, with a timezone of America/Los_Angeles, a trip starting at 00:44:55 on service date 20220313 MAY be removed if there is an equivalent trip starting at 23:44:55 on 20220312.

(Alternate wordings are in brackets)
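To get a feel for how a consumer might apply this wording, here is a rough Python sketch using the "start at the same time" alternative. Everything in it is an assumption for illustration: the `Run` record, the helper names, the precomputed `stop_pattern` tuple, and the use of a single timezone for the whole feed. It only flags a run when its own service day is a DST transition day and an equivalent run from an earlier service day starts at the same instant.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class Run:  # hypothetical record: one trip on one service date
    trip_id: str
    route_id: str
    service_date: date
    start_time: timedelta  # GTFS time of the first stop, as an offset
    stop_pattern: tuple    # ((stop_id, stop_sequence), ...)


def reference_point(service_date: date, tz: ZoneInfo) -> datetime:
    """"Noon minus 12h" of the service day, as a UTC instant."""
    noon = datetime.combine(service_date, time(12), tzinfo=tz).astimezone(timezone.utc)
    return noon - timedelta(hours=12)


def is_dst_transition_day(service_date: date, tz: ZoneInfo) -> bool:
    # On ordinary days the reference point is local midnight.
    return reference_point(service_date, tz).astimezone(tz).time() != time(0)


def removable_runs(runs: list[Run], tz_name: str) -> list[Run]:
    tz = ZoneInfo(tz_name)
    groups = defaultdict(list)
    for run in runs:
        instant = reference_point(run.service_date, tz) + run.start_time
        groups[(run.route_id, run.stop_pattern, instant)].append(run)
    removable = []
    for group in groups.values():
        earliest = min(r.service_date for r in group)
        for run in group:
            # Keep the run from the earliest service date; flag later ones
            # only if the collision falls on a DST transition day.
            if run.service_date > earliest and is_dst_transition_day(run.service_date, tz):
                removable.append(run)
    return removable


pattern = (("A", 1), ("B", 2))
runs = [
    Run("sat-late", "20", date(2022, 3, 12), timedelta(hours=23, minutes=44, seconds=55), pattern),
    Run("sun-early", "20", date(2022, 3, 13), timedelta(minutes=44, seconds=55), pattern),
    Run("sun-later", "20", date(2022, 3, 13), timedelta(hours=1, minutes=44, seconds=55), pattern),
]
print([r.trip_id for r in removable_runs(runs, "America/Los_Angeles")])
# ['sun-early']
```

Restricting the check to transition days keeps an ordinary 24:15 / 00:15 overlap across two service days from being treated as a DST artifact.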

Currently the static specification doesn't seem to mention GTFS-RT at all, so I'm wondering where the best place would be to add that info about StopTimeUpdates.

Examples

Often in North America, agencies will provide relatively frequent service, without using clock-face scheduling.


[Image: stm-51]

Montreal (route 51): When DST starts, the 23:58 and 00:55 trips would end up superimposed.


[Image: bctk-8]

Kelowna (route 8): When DST starts, the last two trips would conflict.