transitland / transitland-datastore

Transitland v1 core components. Deprecated and only maintained occasionally. See Transitland v2.
https://transit.land/documentation/datastore/
MIT License

stable (or semi-stable) trip identifiers #713

Open drewda opened 8 years ago

drewda commented 8 years ago

We've heard from at least two different users of Transitland that it would be helpful to provide stable identifiers for trips (or at least identifiers that are more stable than those included in many GTFS feeds' trips.txt file):


@barbeau wrote in January:

Here’s the link on the need for trip hashing in OneBusAway:

https://github.com/OneBusAway/onebusaway-android/issues/333

There is also a link to Conveyal’s implementation of trip hashing for OTP (and on that issue, a link to the algorithm we implemented for our TAD mobile app: https://github.com/opentripplanner/OpenTripPlanner/issues/1573).

As I mentioned, I’d love to see Transitland provide a mechanism to determine if the “same” trip survives a GTFS data update, and what that new equivalent trip_id is. Different apps may be able to tolerate more substantial changes to trips than others, so other metadata about what is the same and what is different may also be useful, although I believe that would also complicate things.

For example, for TAD (doing real-time transit navigation) we needed to know if the following was the same in a trip:

  • Origin stop (location)
  • Destination stop (location)
  • 2nd-to-last stop (sequence relationship to destination stop)

Other portions of the trip, and even potentially the trip departure/arrival times, could change, and TAD would still function correctly. Changes to the trip upstream of the origin stop also wouldn’t matter to us.
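A minimal sketch of the TAD-style check described above. It assumes each trip is available as an ordered list of stops with coordinates, and that "same location" means within a hypothetical 25 m tolerance (haversine distance). `Stop`, `tad_equivalent` and the tolerance are illustrative choices, not part of Transitland, OneBusAway or TAD.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    stop_id: str
    lat: float
    lon: float


def distance_m(a: Stop, b: Stop) -> float:
    """Great-circle (haversine) distance between two stops, in metres."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def first_near(stops, target, tol_m, start=0):
    """Index of the first stop at or after `start` within tol_m of target, else None."""
    for i in range(start, len(stops)):
        if distance_m(stops[i], target) <= tol_m:
            return i
    return None


def tad_equivalent(old_stops, new_stops, origin_idx, dest_idx, tol_m=25.0):
    """True if the new trip keeps the rider's origin, destination and the stop
    immediately before the destination, matched by location rather than stop_id.
    Stops upstream of the origin and all stop times are ignored."""
    if not 0 <= origin_idx < dest_idx < len(old_stops):
        raise ValueError("need 0 <= origin_idx < dest_idx < len(old_stops)")
    o = first_near(new_stops, old_stops[origin_idx], tol_m)
    if o is None:
        return False
    # The destination must appear downstream of the origin.
    d = first_near(new_stops, old_stops[dest_idx], tol_m, start=o + 1)
    if d is None:
        return False
    # The 2nd-to-last stop must still immediately precede the destination.
    return distance_m(new_stops[d - 1], old_stops[dest_idx - 1]) <= tol_m
```

Matching on location rather than stop_id means the check still passes when a feed update renumbers stops but leaves them in place.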

For OneBusAway, for arrival/departure reminders based on real-time info, we’re more interested in the following for a given origin stop:

  • Departure time of the “same” trip for a given stop (times may not need to be exact; the closest arrival within a tolerance threshold like 10 min may work).
  • Rough geometry of the trip downstream of the stop (i.e., could this trip still take the rider to their same unknown destination).

In OBA we wouldn’t care about changes to the trip at stops upstream of the given origin stop.
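A rough sketch of the OneBusAway-style match. It assumes trips are given as parallel lists of stop_ids and departure times (seconds after midnight). The 10-minute tolerance comes from the comment above. Using the share of downstream stop_ids still served as a stand-in for "rough geometry" is an assumption of this sketch, as are the names `Trip` and `match_for_reminder`.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    trip_id: str
    stop_ids: list[str]  # ordered stop sequence
    departures: list[int]  # departure at each stop, seconds after midnight


def match_for_reminder(old, candidates, stop_id, tol_s=600, min_overlap=0.8):
    """Pick the candidate that best continues `old` from `stop_id`: the closest
    departure at that stop within tol_s, among candidates that still serve at
    least min_overlap of old's downstream stops. Upstream stops are ignored.
    If a trip visits stop_id twice, only the first visit is considered."""
    i = old.stop_ids.index(stop_id)
    old_dep = old.departures[i]
    old_down = set(old.stop_ids[i + 1:])
    best, best_gap = None, None
    for cand in candidates:
        if stop_id not in cand.stop_ids:
            continue
        j = cand.stop_ids.index(stop_id)
        gap = abs(cand.departures[j] - old_dep)
        if gap > tol_s:
            continue
        new_down = set(cand.stop_ids[j + 1:])
        overlap = len(old_down & new_down) / len(old_down) if old_down else 1.0
        if overlap < min_overlap:
            continue
        if best_gap is None or gap < best_gap:
            best, best_gap = cand, gap
    return best
```

A real geometry comparison would look at shapes or stop coordinates. Set overlap of stop_ids is only the cheapest proxy.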

But even if other metadata isn’t available, just knowing whether an equivalent trip as a whole existed would be very useful.
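For the simplest case, whether an equivalent trip as a whole exists after an update, one option is to fingerprint each trip's content while ignoring its trip_id. This sketch is strict: any change to a stop or a time breaks the match. It is not Conveyal's or OneBusAway's trip-hashing algorithm. The input shape, a dict of trip_id to (stop_ids, departure_times), is assumed for illustration.

```python
import hashlib
import json


def trip_fingerprint(stop_ids, departure_times):
    """SHA-256 of a trip's ordered stops and departure times; trip_id is ignored.

    stop_ids: ordered stop_ids (as in stop_times.txt, sorted by stop_sequence)
    departure_times: parallel "HH:MM:SS" strings
    """
    if len(stop_ids) != len(departure_times):
        raise ValueError("stop_ids and departure_times must be the same length")
    # JSON gives an unambiguous canonical string even if IDs contain separators.
    canonical = json.dumps([list(stop_ids), list(departure_times)])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def match_trips(old_trips, new_trips):
    """Map each old trip_id to the new trip_id with an identical fingerprint.

    Old trips with no counterpart, or with several identical counterparts
    (e.g. the same pattern under different service_ids), map to None.
    """
    by_hash = {}
    for trip_id, (stops, times) in new_trips.items():
        by_hash.setdefault(trip_fingerprint(stops, times), []).append(trip_id)
    result = {}
    for trip_id, (stops, times) in old_trips.items():
        matches = by_hash.get(trip_fingerprint(stops, times), [])
        result[trip_id] = matches[0] if len(matches) == 1 else None
    return result
```

Looser matching, such as tolerating shifted times or changes upstream of a stop as in the TAD and OBA criteria, would need a similarity measure rather than an exact hash.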


and more recently we heard from a transit agency (https://hellomapzen.zendesk.com/agent/tickets/808):

I was wondering if it is possible to use Transitland in a way that we can add a unique identifier for each trip.

The reason for this is that our vendor changes the trip_id field every time we have a service change, even if the trip remains the same. While it is possible to keep the GTFS trip_id the same across service changes, we are not able to keep the trip_id the same for our realtime GTFS.

Adding some kind of unique identifier for our trips would allow us to maintain continuity across GTFS changes for metric purposes.
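One way to give the agency continuity for metrics is a registry that assigns its own stable ID to each distinct trip content and reuses that ID whenever the same content reappears under a new trip_id. This is a hypothetical sketch. The class name, the `trip-N` ID format, and the choice of route_id plus stops plus times as the definition of "the same trip" are all assumptions. The mapping would also have to be persisted across feed versions.

```python
import hashlib
import itertools
import json


def content_key(route_id, stop_ids, departure_times):
    """Hypothetical definition of 'the same trip': route, ordered stops, times."""
    canonical = json.dumps([route_id, list(stop_ids), list(departure_times)])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class StableTripRegistry:
    """Hands out IDs that outlive vendor-assigned GTFS trip_ids (sketch only)."""

    def __init__(self, prefix="trip"):
        self._ids = {}  # content key -> stable ID
        self._counter = itertools.count(1)
        self._prefix = prefix

    def stable_id(self, route_id, stop_ids, departure_times):
        key = content_key(route_id, stop_ids, departure_times)
        if key not in self._ids:
            # First time this trip content is seen: mint a new stable ID.
            self._ids[key] = f"{self._prefix}-{next(self._counter)}"
        return self._ids[key]
```

Whatever goes into the key decides how often IDs churn: adding fields makes IDs change more readily, removing them risks merging trips that are genuinely different.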


At present, we just include trip IDs as strings on RouteStopPatterns and on ScheduleStopPairs.

Ideas to consider for the future:

One major question to answer would be how long we persist IDs and records for trips. Erase them whenever we erase an old FeedVersion's ScheduleStopPairs? Or keep the trips, along with RouteStopPatterns to summarize past configurations of the transit network (even if service is no longer scheduled for some of those combinations)?

Timeframe: Not urgent.

derhuerst commented 4 years ago

I'm trying to integrate static feeds, realtime feeds and routing APIs from several public transportation operators/providers in Europe, so I have similar goals.

I need these stable IDs to have at least two properties:

As with any "one standard to obsolete all other existing ones", this ID scheme won't be perfect. There will likely be a revised (but incompatible) second version in the future, and therefore two or more IDs per entity.

I think the only reasonable way towards a globally stable, globally agreed-upon ID scheme is to store both the current best-effort of a stable ID (in order to gain experience with edge cases) as well as multiple local IDs (in order to keep compatibility with existing systems).
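A minimal sketch of that idea. It keeps one best-effort stable ID per entity alongside any number of local IDs, each tagged with the scheme it comes from. Scheme names such as `gtfs:agency-a` and the class names are hypothetical. When the stable scheme is revised, the old stable ID is kept as just another local ID, so existing references still resolve.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    stable_id: str  # current best-effort global ID; may be superseded later
    local_ids: dict[str, str] = field(default_factory=dict)  # scheme -> ID in that scheme


class IdIndex:
    """Look up entities by any of their scheme-tagged local IDs."""

    def __init__(self):
        self._entities = {}  # stable_id -> Entity
        self._by_local = {}  # (scheme, local_id) -> stable_id

    def add(self, entity):
        self._entities[entity.stable_id] = entity
        for scheme, local_id in entity.local_ids.items():
            self._by_local[(scheme, local_id)] = entity.stable_id

    def lookup(self, scheme, local_id):
        stable_id = self._by_local.get((scheme, local_id))
        return None if stable_id is None else self._entities.get(stable_id)

    def revise_stable_id(self, old_id, new_id, old_scheme):
        """Adopt a new stable ID, keeping the old one as a local ID under old_scheme."""
        entity = self._entities.pop(old_id)
        entity.local_ids[old_scheme] = old_id
        entity.stable_id = new_id
        self.add(entity)  # re-points every local ID at new_id
```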


This is only remotely related, but the IPFS community has thought a lot about how, on a meta-level, to make addresses & address schemes future-proof (i.e. self-describing, upgradable):

We could use e.g. the multiaddr format to encode multiple IDs of an operator/station/trip into one "package".
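In its textual form, a multiaddr such as `/ip4/127.0.0.1/tcp/80` names each value's protocol right before the value, which is what makes it self-describing. A small sketch of the same idea for entity IDs follows. The scheme names are made up here and are not registered multiaddr protocols. Values are percent-encoded so they may themselves contain `/`.

```python
from urllib.parse import quote, unquote


def encode_ids(ids):
    """Encode (scheme, value) pairs, in preference order, as /scheme/value/... ."""
    parts = []
    for scheme, value in ids:
        if not scheme or "/" in scheme:
            raise ValueError(f"invalid scheme name: {scheme!r}")
        parts.append(scheme)
        parts.append(quote(value, safe=""))  # escape '/' and other reserved chars
    return "/" + "/".join(parts)


def decode_ids(text):
    """Inverse of encode_ids: return the list of (scheme, value) pairs."""
    if not text.startswith("/"):
        raise ValueError("encoded IDs must start with '/'")
    if text == "/":
        return []
    parts = text[1:].split("/")
    if len(parts) % 2:
        raise ValueError("expected scheme/value pairs")
    return [(parts[i], unquote(parts[i + 1])) for i in range(0, len(parts), 2)]
```

Because each value is prefixed by its scheme, a consumer that only understands some schemes can skip the pairs it doesn't recognise, which is roughly the upgrade path multiaddr aims for.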