Open cedarbaum opened 11 months ago
Ah this is gnarly :) I think your solution would certainly work, though it may make the API somewhat confusing because it kind of breaks the REST semantics.
I wonder, as an alternative, could we transform the trip ID for every realtime trip that comes into Transiter, and use that as the "trip ID" that we store in the database? We could then persist the regular trip ID as `original_trip_id` or something like that. The transformed trip ID could be `<trip_id>_<started_at_date>_<started_at_time>`, with defaults of `00-00-000` and `00:00` if the started at fields are not provided.
Good points! I think changing the trip ID format is definitely nicer from a REST perspective, but my concern is that it loses the 1-to-1 mapping with the source GTFS content. A couple of other solutions I was thinking about:

1. Change the `.../trips/{trip_id}` endpoint to return a list of trips with further references to individual trip URIs: `.../trips/{trip_id}/{start_time}`
2. Allow `.../trips/{trip_id}` and `.../trips/{trip_id}_{start_time}` to match the same resource in cases where there is no ambiguity. If multiple trips are running with the same (GTFS) trip ID, then only allow `.../trips/{trip_id}_{start_time}` to work and return 404 otherwise.

Curious to know your thoughts! Also, I should mention this isn't, from my view, an urgent issue, just something I was thinking about while reading the GTFS documentation. My intuition is that it's relatively rare in the wild, but I've neither run into it nor gone out of my way to look for such cases as of yet.
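To make option 2 concrete, here is a rough sketch of the lookup it implies. Everything in it is hypothetical: the `(trip_id, start_time)` tuple shape, the `resolve` name, and the use of `None` to stand in for a 404 are illustrations, not Transiter's actual code.

```python
from typing import List, Optional, Tuple

# Hypothetical shape: (GTFS trip ID, start time) for each running trip.
Trip = Tuple[str, str]


def resolve(active_trips: List[Trip], path_segment: str) -> Optional[Trip]:
    """Resolve the last path segment of .../trips/<segment> to one trip.

    Returns None where the API would respond with 404.
    """
    # The fully qualified form {trip_id}_{start_time} always works.
    for trip in active_trips:
        if f"{trip[0]}_{trip[1]}" == path_segment:
            return trip
    # The bare trip ID is only an alias when exactly one trip uses it.
    matches = [t for t in active_trips if t[0] == path_segment]
    return matches[0] if len(matches) == 1 else None


trips = [("A", "08:00"), ("A", "09:00"), ("B", "08:30")]
print(resolve(trips, "B"))        # unambiguous alias
print(resolve(trips, "A"))        # ambiguous, so no match
print(resolve(trips, "A_09:00"))  # qualified form
```

One consequence of this design is that a bare-ID URL that works today can start returning 404 as soon as a second trip with the same ID begins running.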
I believe we could maintain the 1-1 mapping. Suppose we had the following convention for the "normalized trip ID":

- `<trip_id>_YYYYMMDD_HHMMSS`
- `<trip_id>_YYYYMMDD_`
- `<trip_id>__HHMMSS`
- `<trip_id>__`

In this case, given a normalized trip ID, you can split on the last two `_` to get the original trip ID, start date and time back, irrespective of the structure of the trip ID (which may itself contain `_` characters).
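A minimal sketch of that round trip, assuming the date and time parts themselves never contain `_` (true for `YYYYMMDD` and `HHMMSS`). The `TripKey`, `normalize` and `parse` names are made up for illustration and are not part of Transiter.

```python
from typing import NamedTuple


class TripKey(NamedTuple):
    trip_id: str     # original GTFS trip ID, may contain "_"
    start_date: str  # "YYYYMMDD", or "" if not provided
    start_time: str  # "HHMMSS", or "" if not provided


def normalize(key: TripKey) -> str:
    """Build the normalized trip ID; missing parts become empty strings."""
    return f"{key.trip_id}_{key.start_date}_{key.start_time}"


def parse(normalized: str) -> TripKey:
    """Recover the parts by splitting on the last two underscores only."""
    # Raises ValueError if the input has fewer than two underscores.
    trip_id, start_date, start_time = normalized.rsplit("_", 2)
    return TripKey(trip_id, start_date, start_time)


key = TripKey("AFA23GEN_1037_A", "20240101", "")
assert parse(normalize(key)) == key
```

Because the split is anchored on the right, underscores inside the original trip ID are left alone, which is what keeps the mapping 1-1.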
As you say in option (2), we could also have options where this other trip ID is an alias for the regular trip ID, and so the API would work with either option as long as there is no ambiguity.
Our conversation so far has been very theoretical :) I think it would be interesting to find examples of systems that have this issue (maybe Amtrak?). It also seems to be related to the GTFS `frequencies.txt` file, which is a way to define many trips with the same trip ID but offset from each other.
The `started_at` field is currently not populated or used in the `trip` table. There are 2 reasons this can be useful:

(1) is a longer term concern associated with the work described in https://github.com/jamespfennell/transiter/issues/11, but (2) could prove useful for general data integrity with the existing API.
To accomplish this, `transiter.public.trip.trip_route_pk_id_key` will have to be changed to incorporate the `started_at` field as well (e.g., `transiter.public.trip.trip_started_at_route_pk_id_key`). This will break the assumption used throughout the system that, at any given point, there is a unique trip ID per route. For example, `/systems/{system_id}/routes/{route_id}/trips/{trip_id}` always returns a single trip. I believe this can be mostly solved with the below changes:

- When a `Trip` or `Trip.Reference` is returned by the API, also return the `started_at` field.
- For the `.../trips/{trip_id}` endpoint, add an optional query parameter `?started_at={date}` to disambiguate multiple trips with the same ID. If this query parameter is not provided, always return the earliest trip. I believe the default case matches what would happen today, since the later trip could not be added to the table until the earlier trip ends.

@jamespfennell please let me know if you agree with the above problem statement and if you think this sounds like a workable solution.
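For reference, the lookup behind the proposed `?started_at=` parameter could look roughly like the sketch below. The `Trip` tuple and `find_trip` function are hypothetical, and matching `started_at` by exact equality is an assumption; the real endpoint might compare only the date.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Trip(NamedTuple):
    trip_id: str
    started_at: datetime


def find_trip(
    trips: List[Trip], trip_id: str, started_at: Optional[datetime] = None
) -> Optional[Trip]:
    """Pick one trip for .../trips/{trip_id}[?started_at=...]."""
    candidates = [t for t in trips if t.trip_id == trip_id]
    if started_at is not None:
        # Assumption: the query parameter must match started_at exactly.
        candidates = [t for t in candidates if t.started_at == started_at]
    # Without the parameter, fall back to the earliest matching trip.
    # None means no trip matched (a 404 in the API).
    return min(candidates, key=lambda t: t.started_at, default=None)


trips = [
    Trip("A", datetime(2024, 1, 1, 9, 0)),
    Trip("A", datetime(2024, 1, 1, 8, 0)),
]
assert find_trip(trips, "A").started_at == datetime(2024, 1, 1, 8, 0)
```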