google / transit

https://gtfs.org/
Apache License 2.0

frequencies.txt exact_times=1 trip_id semantics #227

Closed derhuerst closed 1 year ago

derhuerst commented 4 years ago

I have a question about the semantics of frequencies.txt with exact_times=1.


From my experience with GTFS and my observations of GTFS-based UIs, a lot of tools seem to make the following assumption:

A single trip (as defined by a unique trip_id in the GTFS dataset) is one vehicle (or group of vehicles) that I can use without significant interruptions (such as waiting for another vehicle or changing to a different line). Often, a single trip is considered to be a vehicle which I can travel with for the whole duration of the trip.

The frequencies.txt documentation seems to undermine that assumption however:

frequencies.txt represents trips that operate on regular headways (time between trips). This file can be used to represent two different types of service.

  • Frequency-based service (exact_times=0) in which service does not follow a fixed schedule throughout the day. Instead, operators attempt to strictly maintain predetermined headways for trips.
  • A compressed representation of schedule-based service (exact_times=1) that has the exact same headway for trips over specified time period(s). In schedule-based service operators try to strictly adhere to a schedule.
| Field Name | Type | Required | Description |
| --- | --- | --- | --- |
| trip_id | ID referencing trips.trip_id | Required | Identifies a trip to which the specified headway of service applies. |

I think this is especially important for routing engines: Now, they can't assume anymore that every GTFS data point referring to the same trip_id is tied to one vehicle allowing continuous travel. There are >=1 "runs" of a vehicle, all under the same trip_id, but each of them ends at the last stop specified in stop_times.txt.


Is my understanding of the semantics correct? If it is, I'd argue that this is quite unintuitive and therefore easy to implement in a wrong way. If I misunderstood how frequencies.txt works, let's improve the documentation.

skinkie commented 4 years ago

On Saturday, June 6, 2020 8:05:27 PM CEST, Jannis Redmann wrote:

Is my understanding of the semantics correct? If it is, I'd argue that this is quite unintuitive and therefore easy to implement in a wrong way. If I misunderstood how frequencies.txt works, let's improve the documentation.

Could you give an example how it could be interpreted in a wrong way?

derhuerst commented 4 years ago

Let's consider an excerpt from the example feed linked in the spec:

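stop_times.txt

| trip_id | arrival_time | departure_time | stop_id | stop_sequence | pickup_type | drop_off_type |
| --- | --- | --- | --- | --- | --- | --- |
| AWE1 | 0:06:10 | 0:06:10 | S1 | 1 | 0 | 0 |
| AWE1 | | | S2 | 2 | 1 | 3 |
| AWE1 | 0:06:20 | 0:06:30 | S3 | 3 | 0 | 0 |
| AWE1 | | | S5 | 4 | 0 | 0 |
| AWE1 | 0:06:45 | 0:06:45 | S6 | 5 | 0 | 0 |

frequencies.txt

| trip_id | start_time | end_time | headway_secs |
| --- | --- | --- | --- |
| AWE1 | 05:30:00 | 06:30:00 | 300 |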

If I assume that all stop_times.txt/frequencies.txt entries for AWE1 describe one "run" of one vehicle that I can use continuously, then I could conclude that I can stay in the vehicle from 05:30:00 (earliest start in the time frame) until 06:30:35 (latest start in the time frame + 35 s, the span of the stop_times.txt rows). I assume this is not the case?

skinkie commented 4 years ago

It is not one run (or block). It is a normalisation form of transit data, including a confidence interval for the arrival time of the next trip. Remaining in the vehicle for no particular reason is probably allowed if you have a day ticket, but that is not what this structure (or GTFS) explicitly defines.

derhuerst commented 4 years ago

It is not one run (or block).

(Not sure what exactly you mean by "run" here, but I will assume you mean what I tried to explain.)

In the GTFS ecosystem, I have often observed the assumption that one GTFS trip corresponds to exactly one "run". Or in plain English: that one GTFS trip means that one vehicle will continuously visit all stops in the trip, without any other trips in between and without additional stops before or after; and that after the vehicle has visited all stops in the trip, the "run" is "over".

Making that assumption would probably lead to routing errors (e.g. routes that I actually can't take or that are physically impossible) & unintuitive UIs (e.g. showing the first stop of the trip between other later stops, because another "run" in a time frame of compressed data has started).

If this assumption is not to be made, meaning the stop_times/frequencies feature of GTFS is purely a "normalisation form" to describe when & where any appropriate vehicle of a line will stop, IMO we should clarify this better in the documentation.

(Of course, none of this applies to other schemes of sending vehicles around, such as circular lines or lines split up by direction.)

antrim commented 4 years ago

GTFS Best Practices offer the below.

| Field Name | Recommendation |
| --- | --- |
| block_id | Can be provided for frequency-based trips. |

So, that means the following example is valid, and indicates a continuous loop where passengers can stay onboard at stop_A.

stop_times.txt

| trip_id | arrival_time | departure_time | stop_id | stop_sequence |
| --- | --- | --- | --- | --- |
| trip_1 | 06:10:00 | 06:10:00 | stop_A | 1 |
| trip_1 | 06:15:00 | 06:15:00 | stop_B | 2 |
| trip_1 | 06:20:00 | 06:20:00 | stop_C | 3 |
| trip_1 | 06:25:00 | 06:25:00 | stop_D | 4 |
| trip_1 | 06:30:00 | 06:30:00 | stop_E | 5 |
| trip_1 | 06:35:00 | 06:35:00 | stop_F | 6 |
| trip_1 | 06:40:00 | 06:40:00 | stop_A | 7 |

trips.txt

| route_id | trip_id | service_id | block_id |
| --- | --- | --- | --- |
| red_loop | trip_1 | weekday | red_loop_block |

frequencies.txt

| trip_id | start_time | end_time | exact_times | headway_secs |
| --- | --- | --- | --- | --- |
| trip_1 | 06:10:00 | 18:40:00 | 1 | 1800 |
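To see why this example forms a continuous loop, here is a minimal Python sketch that expands the frequencies.txt row into individual runs and checks that each run arrives back at stop_A exactly when the next one departs. It assumes that times are written as HH:MM:SS and that end_time is exclusive (no run starts at 18:40:00); the helper names `to_seconds` and `run_starts` are hypothetical and not part of any GTFS library.

```python
def to_seconds(gtfs_time: str) -> int:
    """Convert a GTFS HH:MM:SS time (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def run_starts(start_time: str, end_time: str, headway_secs: int) -> list[int]:
    """Start of every run in seconds; end_time is treated as exclusive (assumption)."""
    return list(range(to_seconds(start_time), to_seconds(end_time), headway_secs))


# Values taken from the trip_1 example.
first_departure = to_seconds("06:10:00")  # stop_A, stop_sequence 1
last_arrival = to_seconds("06:40:00")     # stop_A, stop_sequence 7
running_time = last_arrival - first_departure

starts = run_starts("06:10:00", "18:40:00", 1800)
ends = [start + running_time for start in starts]

# Each run ends at stop_A at the moment the next run begins there.
seamless = all(end == nxt for end, nxt in zip(ends, starts[1:]))
print(len(starts), seamless)
```

The loop only closes seamlessly because the running time (30 minutes) equals headway_secs; with a shorter headway, several vehicles would be needed and block_id alone would not tell a passenger which run continues into which.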

Notes

barbeau commented 4 years ago

exact_times=1 trips defined in frequencies.txt should be treated the same way as trips defined in a GTFS that doesn't include the frequencies.txt file - you just "unroll" the pattern defined in stop_times.txt into individual trips between the start_time and end_time defined in frequencies.txt, with the start times of consecutive trips being headway_secs apart. Note that arrival_time and departure_time in this case don't refer to absolute times; they exist only to define the travel time between the stops of the trip. I agree that the documentation could be improved, including examples, to make this clearer.
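The "unrolling" described here could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not a reference implementation: end_time is treated as exclusive, offsets are measured from the first stop's departure_time, and the synthetic `trip_id@start` identifiers, the function names, and the three-stop pattern `T` are all made up for the example.

```python
def to_seconds(gtfs_time: str) -> int:
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_gtfs_time(total: int) -> str:
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def unroll(stop_times: list[dict], frequency: dict) -> list[list[dict]]:
    """Expand one exact_times=1 frequencies.txt row into individual trips."""
    rows = sorted(stop_times, key=lambda r: int(r["stop_sequence"]))
    # The times in stop_times.txt only define offsets from the first departure.
    origin = to_seconds(rows[0]["departure_time"])
    start = to_seconds(frequency["start_time"])
    end = to_seconds(frequency["end_time"])  # treated as exclusive (assumption)
    trips = []
    for run_start in range(start, end, int(frequency["headway_secs"])):
        trips.append([
            {
                # Synthetic id for illustration; GTFS does not define one.
                "trip_id": f'{frequency["trip_id"]}@{to_gtfs_time(run_start)}',
                "stop_id": r["stop_id"],
                "stop_sequence": r["stop_sequence"],
                "arrival_time": to_gtfs_time(run_start + to_seconds(r["arrival_time"]) - origin),
                "departure_time": to_gtfs_time(run_start + to_seconds(r["departure_time"]) - origin),
            }
            for r in rows
        ])
    return trips


# Hypothetical pattern: the absolute times (08:00 ...) are irrelevant once unrolled.
pattern = [
    {"stop_id": "A", "stop_sequence": "1", "arrival_time": "08:00:00", "departure_time": "08:00:00"},
    {"stop_id": "B", "stop_sequence": "2", "arrival_time": "08:05:00", "departure_time": "08:06:00"},
    {"stop_id": "C", "stop_sequence": "3", "arrival_time": "08:15:00", "departure_time": "08:15:00"},
]
frequency = {"trip_id": "T", "start_time": "05:30:00", "end_time": "06:00:00",
             "exact_times": "1", "headway_secs": "600"}

unrolled = unroll(pattern, frequency)
for trip in unrolled:
    print(trip[0]["trip_id"], [row["departure_time"] for row in trip])
```

Each element of `unrolled` can then be handled like an ordinary trip from a feed without frequencies.txt, which is the behaviour a routing engine would want for exact_times=1.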

Note there is another open proposal to better define in-seat transfers and transfer rules at https://github.com/google/transit/pull/32.

derhuerst commented 4 years ago

exact_times=1 trips defined in frequencies.txt should be treated the same way as trips defined in a GTFS that doesn't include the frequencies.txt file - you just "unroll" the pattern defined in stop_times.txt into individual trips [...].

Okay, thanks for clarification.

In this case, I advocate stating clearly in the documentation that one trip_id does not correspond to one "run" (which I tried to define above). From my subjective experience, assuming that it does seems to be quite natural.

antrim commented 4 years ago

@derhuerst : I see what you mean. Perhaps a future modification to the spec or training materials could clarify this.

GTFS was created originally with passenger-facing applications in mind, so a "trip" refers to when a vehicle operates on a route. In passenger-facing information, that usually looks like a row on a timetable.

Operational schedules have runs, which would usually consist of multiple "trips" in the passenger-centric sense of GTFS.

Some (non-standard) GTFS datasets do include information on "runs" as you're thinking of them. Discussion in issue #195. Here is an example runcut.txt file: https://openmobilitydata.org/p/ventura-county-transportation-commission/792/latest/file/runcut.txt

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

skinkie commented 2 years ago

Keep open.

github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity. Issues can always be reopened after they have been closed.

huntrob commented 4 months ago

I'm currently looking at creating a frequencies.txt file for a rail operation. Having trip_id as the main reference point really threw me for a loop, as that is not what I had expected to see there. After looking at this for a while, I determined that it uses a trip_id to pull the required data, which I believe is the route, the stop sequence, and the running time. I think this should be clarified, as it would save a lot of trial and error for other users.
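As a rough illustration of that lookup, the Python sketch below resolves a frequencies.txt row through its trip_id into the route, the stop sequence, and the running time. The CSV contents reuse the trip_1 loop example from earlier in this thread (with times written as HH:MM:SS); the function names `read_table` and `resolve` are hypothetical.

```python
import csv
import io

TRIPS = """route_id,trip_id,service_id,block_id
red_loop,trip_1,weekday,red_loop_block
"""
STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
trip_1,06:10:00,06:10:00,stop_A,1
trip_1,06:15:00,06:15:00,stop_B,2
trip_1,06:20:00,06:20:00,stop_C,3
trip_1,06:25:00,06:25:00,stop_D,4
trip_1,06:30:00,06:30:00,stop_E,5
trip_1,06:35:00,06:35:00,stop_F,6
trip_1,06:40:00,06:40:00,stop_A,7
"""
FREQUENCIES = """trip_id,start_time,end_time,exact_times,headway_secs
trip_1,06:10:00,18:40:00,1,1800
"""


def read_table(text: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(text)))


def to_seconds(gtfs_time: str) -> int:
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def resolve(frequency: dict, trips: list[dict], stop_times: list[dict]) -> dict:
    """Follow trip_id from a frequencies.txt row to the data it stands for."""
    trip = next(t for t in trips if t["trip_id"] == frequency["trip_id"])
    rows = sorted(
        (r for r in stop_times if r["trip_id"] == frequency["trip_id"]),
        key=lambda r: int(r["stop_sequence"]),
    )
    return {
        "route_id": trip["route_id"],
        "service_id": trip["service_id"],
        "stop_sequence": [r["stop_id"] for r in rows],
        "running_time_secs": to_seconds(rows[-1]["arrival_time"])
        - to_seconds(rows[0]["departure_time"]),
    }


template = resolve(read_table(FREQUENCIES)[0], read_table(TRIPS), read_table(STOP_TIMES))
print(template)
```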

skinkie commented 4 months ago

@huntrob consider this trip_id a kind of hash of the stop sequence, the times between the stops, and the calendar. This template can then be instantiated at different start times.
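One way to read this analogy is sketched below in Python: a key is derived from the calendar (service_id), the stop sequence, and the time offsets between stops, so that two runs which differ only in their absolute start time map to the same template. This is an interpretation for illustration only; GTFS does not define such a hash, and `pattern_key`, `rows`, and the sample data are hypothetical.

```python
import hashlib
import json


def to_seconds(gtfs_time: str) -> int:
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pattern_key(service_id: str, stop_times: list[dict]) -> str:
    """Hash of calendar, stop sequence, and offsets from the first departure."""
    ordered = sorted(stop_times, key=lambda r: int(r["stop_sequence"]))
    origin = to_seconds(ordered[0]["departure_time"])
    shape = [
        (r["stop_id"],
         to_seconds(r["arrival_time"]) - origin,
         to_seconds(r["departure_time"]) - origin)
        for r in ordered
    ]
    payload = json.dumps([service_id, shape])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def rows(*entries: tuple[str, str, str]) -> list[dict]:
    """Build stop_times-like rows from (stop_id, arrival, departure) tuples."""
    return [
        {"stop_id": s, "stop_sequence": i, "arrival_time": a, "departure_time": d}
        for i, (s, a, d) in enumerate(entries, start=1)
    ]


morning = rows(("stop_A", "06:10:00", "06:10:00"),
               ("stop_B", "06:15:00", "06:15:00"),
               ("stop_C", "06:20:00", "06:20:00"))
later = rows(("stop_A", "06:40:00", "06:40:00"),
             ("stop_B", "06:45:00", "06:45:00"),
             ("stop_C", "06:50:00", "06:50:00"))
slower = rows(("stop_A", "06:40:00", "06:40:00"),
              ("stop_B", "06:47:00", "06:47:00"),
              ("stop_C", "06:55:00", "06:55:00"))

# Same stops, same offsets, same calendar: one template, two instantiations.
print(pattern_key("weekday", morning) == pattern_key("weekday", later))
# Different running times: a different template.
print(pattern_key("weekday", morning) == pattern_key("weekday", slower))
```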