google / transit

https://gtfs.org/
Apache License 2.0

Proposed Best Practice: always including trip_id in TripDescriptor for SCHEDULED trips #465

Open isabelle-dr opened 3 months ago

isabelle-dr commented 3 months ago

Context

This issue is part of an effort to bring over one of the outstanding issues we've identified in the Best Practices repos.

Issue

In the Realtime spec, producers can identify trips by either 1) having a trip_id that corresponds to the trip_id in GTFS Schedule, or 2) by including all of route_id, direction_id, start_date, and start_time instead.

However, based on conversation in #gtfs-realtime (you can join the Slack here), option 1 (using trip_id) is easiest and most commonly used by consumers. Option 2 causes headaches and sometimes isn't supported by consumers at all. As a result, it would make sense to recommend that producers use trip_id in all cases.
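To make the two options concrete, here is a minimal Python sketch of how a consumer might resolve a TripDescriptor against GTFS Schedule data. The dict-based structures, the `first_departure` field, and the function name `resolve_trip` are hypothetical and not part of any spec or library. The sketch also ignores `start_date`; a real consumer would additionally have to check it against the service calendar, which is part of why option 2 is more work.

```python
from typing import Optional

# Hypothetical in-memory view of GTFS Schedule: trip_id -> trip attributes.
# "first_departure" stands in for the first departure_time in stop_times.txt.
TRIPS = {
    "t1": {"route_id": "10", "direction_id": 0, "first_departure": "08:00:00"},
    "t2": {"route_id": "10", "direction_id": 0, "first_departure": "08:30:00"},
    "t3": {"route_id": "10", "direction_id": 1, "first_departure": "08:00:00"},
}


def resolve_trip(descriptor: dict) -> Optional[str]:
    """Return the matching trip_id, or None if there is no unique match."""
    # Option 1: a direct key lookup on trip_id.
    trip_id = descriptor.get("trip_id")
    if trip_id is not None:
        return trip_id if trip_id in TRIPS else None

    # Option 2: all four fields must be present, then scan for a unique match.
    required = ("route_id", "direction_id", "start_date", "start_time")
    if any(descriptor.get(key) is None for key in required):
        return None
    matches = [
        tid
        for tid, trip in TRIPS.items()
        if trip["route_id"] == descriptor["route_id"]
        and trip["direction_id"] == descriptor["direction_id"]
        and trip["first_departure"] == descriptor["start_time"]
    ]
    # start_date is not checked here (assumption: one service day only).
    return matches[0] if len(matches) == 1 else None
```

With option 1 the consumer does a single lookup; with option 2 it has to scan candidates and handle ambiguous or incomplete descriptors.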

Proposed solution

Add a mention in the TripDescriptor trip_id description that for SCHEDULED trips that are not frequency-based, the identification of the trip should be done via trip_id.

Tagging folks involved on Slack @leonardehrenfried @e-lo @lauriemerrell @gcamp @doconnoronca @willcanderson

doconnoronca commented 3 months ago

TransSee requires trip_id, but I haven't seen a feed without it, even for unscheduled trips. TransSee also benefits significantly from having route_id included.

leonardehrenfried commented 3 months ago

@doconnoronca Doesn't the static GTFS contain the relationship from trip to route?

doconnoronca commented 3 months ago

> @doconnoronca Doesn't the static GTFS contain the relationship from trip to route?

Yes, but it is an extra query to look it up. It is also needed for added trips.

leonardehrenfried commented 3 months ago

What if the route_id in a SCHEDULED trip update doesn't match what's in the GTFS?

doconnoronca commented 3 months ago

> What if the route_id in a SCHEDULED trip update doesn't match what's in the GTFS?

If that happens it's probably a symptom of bigger problems. The increased performance is worth the risk.
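For illustration, the mismatch case raised above could be detected with a check like the following. This is a hedged sketch, not TransSee's code: the `TRIP_TO_ROUTE` mapping and the function name `route_id_mismatch` are hypothetical, and the mapping is assumed to have been built once from trips.txt.

```python
# Hypothetical mapping built from trips.txt: trip_id -> route_id.
TRIP_TO_ROUTE = {"t1": "10", "t2": "20"}


def route_id_mismatch(descriptor: dict) -> bool:
    """True only when the feed's route_id contradicts GTFS Schedule."""
    scheduled_route = TRIP_TO_ROUTE.get(descriptor.get("trip_id"))
    reported_route = descriptor.get("route_id")
    # Nothing to compare if the trip is unknown to the schedule
    # (for example an added trip) or the feed omits route_id.
    if scheduled_route is None or reported_route is None:
        return False
    return scheduled_route != reported_route
```

A consumer that trusts the feed's route_id for speed could still run a check like this offline to flag feeds where the two sources disagree.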

willcanderson commented 3 months ago

I think this makes sense as a best practice. If most consumers only support matching on trip_id, or greatly prefer that, it is important for producers to know that.

I will describe below why this proposed best practice has been difficult for my agency, and what we are doing about it. I don't think the difficulty means we shouldn't make this a best practice; I'm just noting some implications.

Why this proposed best practice can be difficult

The proposed best practice of including trip_id in TripDescriptor for SCHEDULED trips has a tricky interaction with the GTFS Schedule best practice of including both the current and upcoming schedule in a single GTFS Schedule file.

Each trip_id value in trips.txt must be unique. At my agency it is not feasible to generate new trip_id values for minor schedule revisions. This means we must modify the trip_id values in GTFS Schedule in order to make them unique when we merge the current and upcoming schedules. Modifying the values makes them fall out of sync with the trip_id values in our realtime data sources.

Options for data producers

At my agency, we are currently writing some code that ingests the data coming out of our trackers and rewrites trip_id values to match the ones in GTFS Schedule. Other agencies have described doing something similar.
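As a rough illustration of that kind of rewriting step, the sketch below maps a tracker's internal trip_id to the value published in a merged GTFS Schedule file. Everything specific here is an assumption for the example: the suffix scheme, the date ranges, and the names `SCHEDULE_VERSIONS` and `published_trip_id` are hypothetical, and an actual agency pipeline may work quite differently.

```python
from datetime import date

# Hypothetical schedule versions merged into one GTFS Schedule file:
# (suffix appended to trip_id, first valid date, last valid date).
SCHEDULE_VERSIONS = [
    ("cur", date(2024, 1, 1), date(2024, 3, 31)),
    ("next", date(2024, 4, 1), date(2024, 6, 30)),
]


def published_trip_id(internal_trip_id: str, service_date: date) -> str:
    """Rewrite a tracker trip_id to the one used in the merged schedule."""
    for suffix, first_day, last_day in SCHEDULE_VERSIONS:
        if first_day <= service_date <= last_day:
            # Assumed convention: merged trip_id = original id + "-" + suffix.
            return f"{internal_trip_id}-{suffix}"
    raise ValueError(f"No schedule version covers {service_date.isoformat()}")
```

For example, under these assumptions `published_trip_id("1234", date(2024, 4, 2))` yields the trip_id of the upcoming schedule's copy of trip 1234.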

But if the industry is working toward a future where having matching trip_id values across GTFS Schedule and Realtime is easy for producers, even if they are merging two schedules for GTFS Static and even if their scheduling and tracking tools are from different vendors, it may be worth revisiting the conversation about whether to make the primary key for trips.txt (service_id, trip_id) instead of (trip_id).
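A small sketch of what that key change would mean in practice, using made-up rows: with trip_id alone as the key, the current and upcoming versions of a trip collide, while a (service_id, trip_id) key keeps both. The row values and variable names are hypothetical, and the composite key is not what the spec currently defines.

```python
# Hypothetical trips.txt rows after merging current and upcoming schedules
# without rewriting trip_id values.
rows = [
    {"service_id": "weekday-cur", "trip_id": "1234", "route_id": "10"},
    {"service_id": "weekday-next", "trip_id": "1234", "route_id": "10"},
]

# Keyed on trip_id only: the second row silently overwrites the first.
by_trip_id = {row["trip_id"]: row for row in rows}

# Keyed on (service_id, trip_id): both versions of the trip survive.
by_composite_key = {(row["service_id"], row["trip_id"]): row for row in rows}

print(len(by_trip_id), len(by_composite_key))  # prints "1 2"
```

The trade-off is that a realtime consumer would then need to derive service_id from start_date before it could look up a trip.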

Alternatively, I wonder if getting realtime tracker systems to use the Operational Data Standard for their schedule information would help--presumably if ODS is a superset of GTFS, trip_id values must match. But I haven't wrapped my head around the question of whether ODS would include both current and upcoming schedule info in a way that would correspond to the public GTFS Schedule file.

isabelle-dr commented 1 month ago

This issue could be of use for this discussion: https://github.com/google/transit/issues/462

skinkie commented 1 month ago

In addition to what @willcanderson wrote, I think what is fundamentally missing is the relationship between GTFS Static version and GTFS-RT. This is #434

This is an "everything is connected" situation where a broader vision is important.