google / transit

https://gtfs.org/
Apache License 2.0

frequencies.txt primary key: add end_time, headway_secs & exact_times? #514

Open derhuerst opened 3 weeks ago

derhuerst commented 3 weeks ago

Describe the problem

Currently, frequencies.txt's primary key does not include exact_times:

Primary key (trip_id, start_time)
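To make the difference concrete, here is a minimal sketch that counts duplicate keys under the current key and under the proposed one. The sample rows, the `t1` trip ID, and the `duplicate_keys` helper are made up for illustration and are not part of any GTFS tooling.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: two rows differ only in exact_times.
SAMPLE = """trip_id,start_time,end_time,headway_secs,exact_times
t1,07:00:00,09:00:00,600,0
t1,07:00:00,09:00:00,600,1
t1,09:00:00,16:00:00,900,0
"""

CURRENT_KEY = ("trip_id", "start_time")
PROPOSED_KEY = ("trip_id", "start_time", "end_time", "headway_secs", "exact_times")


def duplicate_keys(rows, key_fields):
    """Return the key tuples that occur in more than one row."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
current = duplicate_keys(rows, CURRENT_KEY)    # first two rows collide
proposed = duplicate_keys(rows, PROPOSED_KEY)  # no collision
print(current, proposed)
```

Under the current key the first two rows collide, while under the proposed key they count as distinct entries.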

Use cases

Isn't it a perfectly valid use case to have >1 frequencies entries for the same trip and starting at the same time? Some examples:

Proposed solution

I propose to extend the primary key to the other columns (end_time, headway_secs, exact_times), too.

Additional information

No response

skinkie commented 3 weeks ago

To be honest, I don't think this is a valid solution. I even wonder whether trip_id + start_time is actually valid from a documentation perspective. My assumption is that it is currently prohibited to have multiple frequencies on the same trip_id. If that is not the case, then I wonder why you want to create ambiguity by introducing trip_id + start_time + exact_times=1 as one variant and trip_id + start_time + exact_times=0 as another. Let's have a discussion on this, also from a consumer perspective, to see whether this is correctly implemented in multiple systems.

derhuerst commented 3 weeks ago

I'd argue that a trip can run on schedules that a) appear in the real world and b) are a great fit for frequencies.txt.

skalexch commented 2 weeks ago

@derhuerst I looked into all GTFS feeds in the Mobility Database that have frequencies.txt. It looks like all of them use (trip_id, start_time) as the primary key. They also respect an implicit rule that the end_time of one window must be before the start_time of the next window for the same frequency-based trip_id. This follows from the rule for headway_secs that says "Multiple headways may be defined for the same trip, but must not overlap".
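A minimal sketch of how a validator could check that implicit rule, assuming rows are already parsed into dicts and that windows which merely touch (end_time equal to the next start_time) do not count as overlapping. The helper names and sample rows are hypothetical.

```python
from collections import defaultdict


def to_seconds(hhmmss):
    """Convert a GTFS time such as '25:10:00' to seconds (hours may exceed 24)."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 3600 + m * 60 + s


def overlapping_windows(rows):
    """Return (trip_id, earlier, later) for consecutive windows of a trip that overlap."""
    by_trip = defaultdict(list)
    for row in rows:
        window = (to_seconds(row["start_time"]), to_seconds(row["end_time"]))
        by_trip[row["trip_id"]].append(window)

    overlaps = []
    for trip_id, windows in by_trip.items():
        windows.sort()  # order by start, then end
        for earlier, later in zip(windows, windows[1:]):
            # Assumption: a window starting exactly at the previous end is allowed.
            if later[0] < earlier[1]:
                overlaps.append((trip_id, earlier, later))
    return overlaps


# Hypothetical rows: t1 overlaps by 30 minutes, t2 windows only touch.
rows = [
    {"trip_id": "t1", "start_time": "06:00:00", "end_time": "09:00:00"},
    {"trip_id": "t1", "start_time": "08:30:00", "end_time": "12:00:00"},
    {"trip_id": "t2", "start_time": "06:00:00", "end_time": "09:00:00"},
    {"trip_id": "t2", "start_time": "09:00:00", "end_time": "12:00:00"},
]
result = overlapping_windows(rows)
print(result)
```

Comparing only neighbouring windows after sorting is enough to tell whether a trip has any overlap at all, although it does not list every overlapping pair.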

Adding to @skinkie's comments, I think we should either add an explicit rule about end_time to the documentation, or cover it in the best practices section on frequencies.

As for the case of having supporting buses during rush hour, I think that should be a different trip_id. The supporting buses are part of the same route, but I don't think it's good practice for them to be part of the same "trip". They should be distinguished from the regular service.

Also, in the case of two windows that have everything in common except exact_times, it might introduce ambiguity for anyone trying to read the GTFS. In that case, the windows should have different trip_ids to indicate two trip sets that do not behave in the same way.

But there is a case to be made for having (trip_id, start_time, end_time, headway_secs) as a primary key based on an interpretation of the first sentence of the rule mentioned above: "Multiple headways may be defined for the same trip, but must not overlap".

stevenmwhite commented 2 weeks ago

> But there is a case to be made for having (trip_id, start_time, end_time, headway_secs) as a primary key based on an interpretation of the first sentence of the rule mentioned above: "Multiple headways may be defined for the same trip, but must not overlap".

This is functionally how I've understood it. The trip_id defines the running time of a generic reference trip, while the start_time, end_time, and headway_secs define a particular window in which that generic reference trip runs on a specified interval.

A trip with the same running time could run every 10 minutes throughout the middle of the day and every 5 minutes during the morning and afternoon peaks. This would be modeled as a single trip, with three separate entries in the frequencies.txt file to represent the three windows.
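As a rough sketch of that model, the three windows could look like the rows below. The `trip_a` ID and the window boundaries are invented for illustration, and the expansion assumes departures at start_time + k * headway_secs strictly before end_time, which is how I read the reference for exact_times=1.

```python
# Hypothetical windows: (trip_id, start_time, end_time, headway_secs)
WINDOWS = [
    ("trip_a", "07:00:00", "09:00:00", 300),  # morning peak, every 5 minutes
    ("trip_a", "09:00:00", "16:00:00", 600),  # midday, every 10 minutes
    ("trip_a", "16:00:00", "18:00:00", 300),  # afternoon peak, every 5 minutes
]


def to_seconds(hhmmss):
    """Convert an HH:MM:SS string to seconds since the start of the service day."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 3600 + m * 60 + s


def departures(start_time, end_time, headway_secs):
    """First-stop departure times in seconds; end_time itself is excluded."""
    return list(range(to_seconds(start_time), to_seconds(end_time), headway_secs))


counts = [len(departures(s, e, h)) for _, s, e, h in WINDOWS]
keys = {(trip_id, start) for trip_id, start, _, _ in WINDOWS}
print(counts)     # number of departures per window
print(len(keys))  # distinct (trip_id, start_time) pairs
```

Each window here has its own start_time, so the three rows stay distinct even when only (trip_id, start_time) is used as the key.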

skalexch commented 5 days ago

@stevenmwhite that is currently possible with (trip_id, start_time) as a primary key. The only catch is that the producer needs to be sure that the windows of different headways do not overlap.