Closed LeoFrachet closed 1 year ago
Hi GTFS Community,
What agencies need more often is to update the schedules or the predictions of a trip and to choose how to display those times to the user: either saying the time is real-time and the vehicle is being tracked, or saying it is an updated schedule and the vehicle is not being tracked.
That information can be provided easily in the TripUpdates feed. The ServiceChanges can be used less often to convey bigger changes in the data.
Changing headsigns or other trip specific data would add too many things in the TripUpdates feed. Although it is conceivable, it can be better to split that to a different feed that aims at bigger modifications of the data.
It is true that all three options are valid, but Option 3 solves the problem of updating schedules and deciding which predictions are real-time relatively quickly, while being scalable and easier to implement for all the agencies/AVL that already have a TripUpdates feed.
I approach this proposal with caution, because it will increase the complexity of the GTFS specification, creating multiple ways to define the same objects, and therefore demand an increase in the complexity of software consuming the GTFS format. I acknowledge the point of view that this is necessary to provide up-to-date information to passengers. But time should be taken to consider the impact carefully, because once this change happens and some people adopt it there will be no going back to the simpler world where a single file contains a complete snapshot of a transit system's schedules, stops, etc.
The proposal linked to above (http://bit.ly/gtfs-service-changes) begins with the following statements:
GTFS Service Changes are then presented as a solution to these issues, but I do not see them as self-evident axioms.
Is there a fundamental reason why GTFS cannot be processed multiple times in a day? For many applications, and for many feeds, it takes at most a few minutes to completely handle a feed. I acknowledge that for very large feeds like all of the Netherlands or the New York City region, fully processing and integrating a whole new data set can be more on the hour-long time scale. You probably wouldn't want to re-publish feeds every hour of every day, but for the occasional schedule revision I don't see an inherent problem with just re-publishing a new feed.
As I posted on the service changes proposal, this second statement assumes that a polling (pull) method is used to receive the GTFS-RT messages. In bigger systems I believe only a streaming (differential push) method makes sense. This is done in the Netherlands, Norway, and Finland and consumed by OpenTripPlanner. A very large number of service patches could be pushed out using such a system without encountering any bandwidth constraints.
I don't rule out the possibility that a third time scale is needed between the static and real-time feeds, but it seems questionable. We should be absolutely sure that effective operation can't be achieved with two time scales before sacrificing simplicity.
I initially lean toward Option 1: revise GTFS-RT TripUpdates to support all cases A, B, C, and D.
You list as disadvantages the fact that handling cases C & D "forces us to redefine almost every GTFS concepts slightly differently, therefore duplicating the spec."
Can you clarify this? Why would all GTFS concepts need to be redefined, and why would the new definitions be different?
It seems quite healthy to me to continue work on GTFS-RT to make it capable of a wide range of revisions to the scheduled data.
What I like about Option 2 (as @LeoFrachet says) is that it's a clear separation of functionality between the feeds, where TripUpdates would be used only for real-time predictions, and ServiceChanges would be used for any static modifications to the network. There are backwards compatibility issues for consumers that aren't aware of new enums when you start piling new functionality on top of TripUpdates. Separating these network/schedule updates into a new channel with a clear purpose ensures that unaware consumers don't start interpreting schedule updates as real-time predictions.
Additionally, if consumers want to consume just VehiclePositions and Service Changes and generate their own predictions, they can do this without needing to parse the TripUpdates as well.
On 31 Oct 2018, at 03:18, Sean Barbeau notifications@github.com wrote: What I like about Option 2 (as @LeoFrachet https://github.com/LeoFrachet says) is that it's a clear separation of functionality between the feeds, where TripUpdates would be used only for real-time predictions, and ServiceChanges would be used for any static modifications to the network.
I like the terminology @LeoFrachet used here: “a clear distinction between schedule updates and real-time updates” (or predictions), i.e. GTFS-RT is for real-time only, GTFS is for schedules only, the hypothetical service updates are also for schedules only. This line of reasoning does have merit.
But it’s not entirely clear to me why it’s important to have a one-to-one correspondence between data formats and categories of data. Already the concept of real-time includes two different kinds of information: empirical observations of vehicle location and lateness, and predictions of the downstream effect of that empirical information.
We want to distinguish clearly between a) schedules (planned service), b) observations of actual position and delay, and c) predictions of future service. But I don’t see any reason why GTFS-RT couldn’t contain all of these.
To my mind the core distinction between GTFS and RT is the time scale over which they’re valid. GTFS provides a slowly changing baseline that is patched in real-time by a stream of RT messages. Even this division into two time scales is essentially an optimization. The number of layers of patching at different time scales is somewhat arbitrary. I’m not sure it’s a good idea to introduce additional layers (time-scales) of patching as optimization responding to current operational details (bandwidth, push vs. pull RT etc.) because we’ll then be stuck with the additional complexity forever.
There are backwards compatibility issues for consumers that aren't aware of new enums when you start piling new functionality on top of TripUpdates. Separating these network/schedule updates into a new channel with a clear purpose ensures that unaware consumers don't start interpreting schedule updates as real-time predictions.
All of these approaches have backward compatibility issues. They all require significant changes to all GTFS consumer software. But the introduction of ServiceChanges is arguably more disruptive.
GTFS-RT is currently interpreted as patches on top of a specific version of a GTFS-static feed. If it’s redefined as patches on top of a set of ServiceChanges on top of a GTFS-static feed, any software that doesn’t know about ServiceChanges can/will have an incorrect view of the system. The new challenge also emerges of ensuring that the correct versions of each layer are combined.
On the other hand, if schedule updates are sent as GTFS-RT, older software interpreting those schedule updates as predictions seems comparatively harmless. It still results in the end user seeing the service that the operator expects to provide them.
I’m not entirely opposed to the concept of ServiceChanges, but I do think the irreversible impact on the unity and complexity of the GTFS ecosystem should be considered carefully.
-Andrew
Having read both this and proposal #111 the following questions arise:
1) How often would an operator need to update a schedule?
2) How long in advance are they aware of the new schedule?
The answer to both these questions is presumably along the lines of "as long as a piece of string", but if the general feeling were that the answer for 1) is "Not too often" and for 2) is "Usually more than a day", then would the existing mechanism (as mentioned above) of just publishing a new schedule be best?
If this works in over 90% of cases, is there really a need to introduce a new layer? I've seen cases where the schedule is updated on several consecutive nights and it may well be to cater for cases like this.
As regards user experience, I think if you can show reliably where a bus or train is on a map at a given time AND the user can see the scheduled or predicted times of arrival for the preceding and following vehicle stop, they will form an opinion themselves as to when the vehicle will arrive at their stop regardless of the schedule or predictions. The point I am getting at is if extra complexity is introduced, confidence would be needed that it can improve the user experience.
Changing headsigns or other trip specific data would add too many things in the TripUpdates feed. Although it is conceivable, it can be better to split that to a different feed that aims at bigger modifications of the data.
I acknowledge that this opinion is shared by several people, but before making a spec change it will be important to justify that opinion. When you say this is "too many things" what is the threshold for "too many"? What specific technical or conceptual restrictions make it excessive to include this information in real-time updates?
Why would it be considered excessive to augment an existing spec, but not excessive to define another separate spec containing the same information, requiring significant software development and additional complexity in every GTFS consumer and data pipeline?
As someone working on producing and consuming large GTFS static and realtime feeds, @skinkie I think your opinion would be valuable here. Do you find it advantageous to add another time-scale of patches with a new format between GTFS-static and GTFS-RT? I see that option 1 above says CC @Stefan and I wonder if that is supposed to be you, because the Github user by that name seems inactive.
I discussed this with Leo and did some software development for exchanging a day's worth of GTFS data inside tripUpdates. In my opinion tripUpdates should be extended with functionality for adding all possible GTFS static fields, as opposed to defining a new format. Because no matter what the outcome is with ServiceChanges, there are cases where we must update some data of existing trips in realtime, which is currently not supported.
But in general my opinion is: this can be exchanged with SIRI-PT, so what justifies making a GTFS-RT alternative?
@skinkie my sense is: many people are bothered by the huge catch-all nature of the Transmodel/SIRI ontology and the verbosity of its fetch/subscription mechanism, and GTFS-RT is a chance to complete a more compact spec that covers the 95% of common passenger information cases.
SIRI is more of an alternative to using GTFS-RT at all. As long as people see benefit to staying in the GTFS-static + GTFS-RT world, I would say it makes sense to complete the functionality available in that pair to cover some high percentage of common use cases.
I'm thrilled to see this conversation moving forward! Thanks @harringtonp, @skinkie & @abyrd for your contributions!
A few answers here.
How often would an operator need to update a schedule ?
For years now we have had producers updating their schedules every day (e.g. WMATA in US-DC). We started to speak about ServiceChanges when some big producers started to speak about updating their (big) GTFS every hour. From what I've heard from the GTFS consumers side of the industry, nobody is ready for that. But I fully agree that "The existing pipeline cannot take it" does not imply "We need a new format". It's an open question. What I want to stress is that the industry is moving from a seasonal update of their GTFS (i.e. 4 times per year) to an every-day and even every-hour update (which is great!), but knowing how to address that is an unresolved question.
Do you find it advantageous to add another time-scale of patches with a new format between GTFS-static and GTFS-RT?
@abyrd I assume you haven't actually read the GTFS-ServiceChanges proposal, which is fair. The goal of ServiceChanges is to stick to the CSV GTFS format. It's working by declaring what type of change you want to do (deletion, addition, modification), then selecting a row (table name + id), then if needed providing the field names and values you want to change. So the whole goal is to not add another format, but to stick to CSV GTFS. The only exception is that you're allowed to specify a day for your changes, because otherwise editing the service_id by hand is a huge mess. So GTFS-ServiceChanges aims to be a kind of GTFS-delta if you want. Not another format.
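To make the mechanism described above concrete, here is a minimal Python sketch of a row-level delta applied to in-memory tables. The `Change` structure, its field names, and the `apply_changes` helper are illustrative assumptions and not the format defined in the ServiceChanges proposal; the day-specific scoping mentioned above is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical in-memory model: table name -> row id -> {field name: value}.
Tables = Dict[str, Dict[str, Dict[str, str]]]


@dataclass
class Change:
    """One row-level change (illustrative structure, not the proposal's format)."""
    kind: str    # "addition", "modification" or "deletion"
    table: str   # e.g. "trips"
    row_id: str  # e.g. a trip_id
    values: Dict[str, str] = field(default_factory=dict)


def apply_changes(tables: Tables, changes: List[Change]) -> Tables:
    """Return a patched copy of `tables`; the baseline is left untouched."""
    patched = {name: {rid: dict(row) for rid, row in rows.items()}
               for name, rows in tables.items()}
    for change in changes:
        rows = patched.setdefault(change.table, {})
        if change.kind == "deletion":
            rows.pop(change.row_id, None)
        elif change.kind == "addition":
            rows[change.row_id] = dict(change.values)
        elif change.kind == "modification":
            # Raises KeyError if the row does not exist in the baseline.
            rows[change.row_id].update(change.values)
        else:
            raise ValueError(f"unknown change kind: {change.kind}")
    return patched


# Example data is made up for illustration.
baseline = {"trips": {"t1": {"route_id": "62", "trip_headsign": "Downtown"}}}
changes = [
    Change("modification", "trips", "t1", {"trip_headsign": "Downtown via Snow Route"}),
    Change("addition", "trips", "t2", {"route_id": "62", "trip_headsign": "Depot"}),
]
patched = apply_changes(baseline, changes)
```

Because the baseline is copied rather than mutated, a consumer that cannot process the changes can still fall back to the unmodified static feed.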
The reason why I said extending TripUpdates "forces us to redefine almost every GTFS concepts slightly differently, therefore duplicating the spec" is that e.g. the GTFS-rt StopTimeUpdate object is pretty different from the CSV GTFS stop_time object. Its arrival value can be given either by a "delay" or by a "time", with time defined as an absolute POSIX time... which is completely different from the HH:MM:SS format used in CSV GTFS, which allows hours above 24 and which defines 01 as noon minus eleven hours.
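The gap between the two time representations can be illustrated with a small conversion. This is a sketch only: the function name `gtfs_time_to_posix` is made up for this example, and the timezone is passed in as a plain `tzinfo` (in practice it would be the agency's timezone, for instance a `zoneinfo.ZoneInfo`).

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def gtfs_time_to_posix(service_date: date, gtfs_time: str, tz: tzinfo) -> int:
    """Convert a CSV GTFS time of day (hours may exceed 24) to POSIX seconds."""
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    offset = hours * 3600 + minutes * 60 + seconds
    # CSV GTFS times count from "noon minus 12h" of the service day, so anchor
    # on local noon and do the arithmetic in absolute (POSIX) seconds.
    local_noon = datetime.combine(service_date, time(12, 0), tzinfo=tz)
    return int(local_noon.timestamp()) - 12 * 3600 + offset


# Fixed UTC-4 offset used here only to keep the example self-contained.
example_tz = timezone(timedelta(hours=-4))
posix_arrival = gtfs_time_to_posix(date(2018, 10, 31), "25:30:00", example_tz)
```

A stop_time of 25:30:00 on the 31 October service day therefore maps to 01:30 local time on 1 November, which is the instant a StopTimeUpdate would carry in its absolute "time" field.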
Another example: the TripUpdate object doesn't contain the trip_id. Its child, the TripDescriptor, does. So if we want to add the ability to add or alter a route, should we replicate the same pattern and define a RouteUpdate containing a RouteDescriptor containing the route_id? Or should we simplify it and define a RouteUpdate which will do both? Whichever you pick, you'll define specific objects that people will have to memorize. With ServiceChanges, you have nothing new to memorize. Routes are in the routes table and contain route_id, route_short_name, etc.
====
That being said, I agree with what you guys said: in all cases, we will have the backward compatibility issue that there is new data to be ingested. Either because GTFS will have to be updated every hour, or because TripUpdates will have changed, or because there will be a new feed.
The decision which has to be made here is a practical decision. From a theoretical point of view, there is no problem: you can just output a new CSV GTFS as often as you want. But with today's implementations, it's not practically possible to consume such updates. Regarding expanding TripUpdates, Stefan & Guillaume both think we won't have size issues. So the decision is really an industrial decision, of whether this industry:
It would be nice if everybody gave their point of view, and then we'll do whatever the industry agrees on. @slai & @dbabramov among others.
But I agree with Andrew: there is a need which requires to be addressed.
Where is the GTFS service changes proposal, Leo? It's not showing up in searches for me.
@harringtonp GTFS-ServiceChanges - http://bit.ly/gtfs-service-changes
@harringtonp My bad sorry. I added it in the original doc. @barbeau Thanks.
Wrapped up in this debate is also how we handle trip.schedule_relationship=ADDED going forward. Currently it's not well defined in the spec and it seems to have a small set of producers using it for different things. See https://github.com/google/transit/issues/106 and https://groups.google.com/forum/#!topic/gtfs-realtime/W6bm2Xj3p-Q for agencies that are producing it as well as examples and explanations.
Wrapped up in this debate is also how we handle trip.schedule_relationship=ADDED going forward.
There's also a connection with the UPDATED_SCHEDULE value that was proposed in PR #111 (which has been removed in the current version of the proposal). The ScheduleRelationship enum and field trip.schedule_relationship make more sense to me as a place to indicate that a TripUpdate is an updated schedule, rather than real-time information for a vehicle that is already operating.
@LeoFrachet I'm reading your longer response above and thinking it through. I think there are some deeper questions emerging here. There are mentions of producers updating feeds every day, every hour, or even hypothetically every 10 minutes in snowstorms. It's understandable that operations staff might be rethinking schedules on an hour-by-hour basis, and they may want to publish those changes immediately to give their customers the best information they can. But I would expect all such changes to those feeds to be at least one or two days in the future from the publication date (ideally weeks in the future).
If they are publishing updated GTFS that includes changes to the upcoming 24-48 hour period, this is problematic. The GTFS static feeds represent schedules. They communicate to riders / customers the service the operator intends (and in many cases is legally obligated) to provide in the future. The rider counts on this data for planning journeys in advance.
Anything that is changed in the near future is not a schedule or an update to a schedule. From the rider or data consumer's perspective it is an unpredictable and unexpected disruption of planned service. If I check how to make a journey to the airport tomorrow morning, but the data producer's bus breaks down overnight, the data producer should not publish new "schedules" tomorrow morning saying that they had no planned service to the airport.
The data they publish, and the data presented to the rider who re-checks the journey planning system in these circumstances, should be the same planned service that was seen the day before, with an additional layer showing that the service is severely delayed or cancelled.
I'm all for dynamically adapting service and routing in real-time, but that doesn't change the fact that the high-capacity backbone of most transit systems is composed of predictable fixed routes. It is my position that this predictable baseline can be expressed with GTFS static at least several days in advance, and everything else can be handled with streaming real-time messages.
Point 2, about "redefining GTFS concepts": this was just a misunderstanding. I interpreted "redefining concepts" to mean changing the meaning of the terms/concepts themselves in different places, e.g. Trip or Route does not mean the same thing in RT as in static. Fortunately this is not the case, and the concepts have consistent meanings. I see now what you mean: that the same entity would be described using a different syntax or format in the GTFS static vs. GTFS-RT layer, and that format would have to be designed and documented for the RT layer. This is a legitimate concern.
I did in fact read the service changes proposal early on. I just think the issue of using a different format (GTFS CSV mapped into Protobuf messages) is a distinct issue from the introduction of an entire new layer of patching.
Perhaps it was inevitable that GTFS-RT would grow to encompass changes to most entities in the static GTFS. It's unfortunate then that the representation initially chosen in GTFS-RT is so different from GTFS-static. Adding more layers can't really compensate for that past decision though. The fact remains that an additional layer obligates every GTFS consumer in the world to modify their pipeline, and any that don't will silently begin receiving an incorrect picture of the network.
Service changes seem focused on editing or replacing, on rewriting history as if scheduled services never existed. This might be a convenient perspective for data producers who are often very worried about the public perception of service delays and disruptions, and the associated regulatory penalties. But the reality is that in most places with GTFS data, service is planned months in advance. Almost everything else is a disruption and should be represented as such.
By the way the link above to the services changes document does not work anymore. I believe this is the document: https://docs.google.com/document/d/1bpNGrQTXbkyImwRO3VZeQdbxwMzeJnDgxj08WXak4i0/edit#heading=h.c2kju5nsoemr
We're still pretty early in our journey with changes other than delay or cancellation in our systems, so I don't have any strong opinions right now, but I will follow along.
It seems like we're in agreement that the GTFS/-RT model is currently failing in the ability to provide all the information riders need during times of abnormal operation, e.g. strikes, equipment failures or emergency operations, and these are the times that riders need accurate information the most.
Option 0 - while this seems simplest and producers are probably pushing for this because it's familiar and also widely consumed, it removes the ability for the consumer to identify which trips were replaced, which is useful for alerting the rider. This is a big deficiency that often makes riders think the app is broken because trips are missing, when this is actually reality.
~I don't have any strong opinions on options 1 or 2, but there isn't really an example on what option 1 would look like, especially with the inconsistencies raised here. Maybe an example of a potential proposal to the same level as in ServiceChanges would help.~
~I think the discussion about whether ServiceChanges is changing schedules and TripUpdates are disruptions is a bit of a side issue and a matter of framing. More importantly, if shoehorning all the capabilities of ServiceChanges into TripUpdate-style messages produces something that's inconsistent or difficult to use, then that's probably an argument for going with something like ServiceChanges.~
One last thing - it's worth noting that if streaming GTFS-RT is the only way to efficiently deliver large schedule changes via TripUpdates, then that's going to be a significant change for many consumers I suspect, who are now just periodically downloading blobs from a server. If operators are going to be serving up small blobs, and all of a sudden when an incident occurs, start serving up significantly larger blobs, then that's not going to work well.
EDIT: I've discussed this further internally and philosophically, I believe there is a class of changes between long-term planned schedules in GTFS, and last-minute operational changes in GTFS-RT, that cannot be communicated with the existing mechanisms.
Given the case of maintenance work overrun or an unconfirmed strike, there's no way to communicate the difference between -
a) trips the operator plans to run tomorrow, b) a new trip that's been added in the next hour due to operational changes
For a), as a rider, I'd consider these with the same reliability as regular schedules with additional consideration given to the alert attached. I would make plans based on them, but know they are subject to change and will likely need to check again closer to the expected departure time.
For b), as a rider I'd consider these to be much more reliable as it's essentially 'real-time' information that the operator has communicated based on the current situation.
Regardless of whether the current design of ServiceChanges is what we want, I believe trips in point A are what ServiceChanges is trying to solve and there's value in providing that kind of information to the user. Indeed in the UK, there are 3 levels of changes - long term plan (LTP), short term plan (STP) and very short term plan (VSTP) - that match this.
It could be argued that the time difference between the creation of the message and the trip could be used to infer the reliability of information, but I think there is a subtle difference that's worth explicitly communicating.
My background is in customer information at Metro Transit in Minneapolis-St. Paul. It is more on the UX side than the technical backend, but I like Option 3 for its clarity about how the feeds should be used and what they represent.
I disagree with this characterization – Service changes seem focused on editing or replacing, on rewriting history as if scheduled services never existed. What appeals to me about the ServiceChanges format is that it makes clear that it is a deviation from normal scheduled service, represented in the static GTFS. We can’t control how regularly consumers consume our GTFS feed, which currently is updated weekly. It would be preferable to direct people to a default base schedule that should be used if it can’t be consumed as frequently as it’s produced. Lots of detours and disruptions have uncertain or imprecise start and end dates and times. I worry that adding all detours and disruptions to the GTFS would result in some consumers reflecting detours that have expired and lead to more confusion.
We integrate some detour routing and stop closures into the static GTFS (if long-term, significant, and predictable enough), but this doesn’t distinguish permanent routing and stops from temporary detours. Ideally, we’d want to be able to flag these detour differences (stop closure, new temporary stop, routing change, etc.) so riders know that service is disrupted and they shouldn’t go to their regular stop or to be aware that they may be looking for a zip-tied temporary bus stop sign instead of permanent stop/station infrastructure.
Providing information about where service is actually operating (what stops are open, where buses will go) so riders can plan trips and get accurate service information rather than relying on just Alert messages to convey these disruptions offers accessibility benefits for riders with limited English literacy and those using assistive technologies for navigation.
Even for riders who can read alert messages, it’s less useful if they can’t get a trip plan or accurate service information and they have to parse messages in order to figure out what to do. In the case that a train isn’t operating to the airport, for instance, we would want to make clear that the train is canceled (not just disappeared from the schedule) and would want riders to see that replacement shuttle service is operating and how that option works.
My experience standing on closed train platforms in a Transit uniform is that many customers trust their phones and trip plan results and service data more than alert messages or any instruction staff can provide. We’re failing to meet riders’ expectations if we aren’t providing the information about how to complete their trip based on service as it’s actually operating.
I think Laura points out a critical distinction in how we as planners, operators and analysts think about transit service. We have a schedule that we publish. This is what we tell people they can depend on, and we do our best to operate that schedule. Deviations from the long-term schedule should be shared as deviations, not new schedules. They are unexpected or short-term changes to operations.
As someone who uses GTFS feeds in a wide range of analyses, now at Metro Transit (Minneapolis/St Paul) and formerly at JWA, I think updating the GTFS more frequently would exacerbate problems we already have with schedules updated every few weeks. There has to be some kind of baseline for analysis and planning activities. If you publish a GTFS feed every 30 minutes, I don't have anything I can use for analysis.
When the GTFS changes every 30 minutes or more, how do I:
The standard GTFS feed should be used for planned service for a reasonable period of time, like a seasonal pick.
I'm not sure I have a strong opinion on which deviations from planned services belong in TripUpdates vs ServiceChanges.
Thanks @slai, @lauramatson & @botanize for the feedback! Sorry for the silence on our side, we keep on working on this subject with more 1-to-1 discussions with both producers and consumers to better understand the needs and the scope, to see which proposal would be the best fit.
Once the smoke has cleared, we'll come back with a proposal to discuss.
Having read through this discussion again and looked at the service changes proposal document, I would be very much in line with Andrew's ( @abyrd ) thoughts. I would find extending the current TripUpdates mechanism preferable and think this really needs to be looked at properly before contemplating the introduction of a whole new layer.
What I would recommend is taking some definite examples which cover the most common cases and seeing how they could be modeled in TripUpdate extensions. And I don't believe this would have much effect on the overall size of the TripUpdate protocol buffer file.
If we take the first example in the service changes proposal document (MBTA snow routes in Boston), all trips for route 62 for the snowy day would have a ScheduleRelationship of CANCELED. Each trip added for route 62 for that day would have a ScheduleRelationship of ADDED. The trip_id in the TripDescriptor is meaningless in this case, as it does not reference the schedule, so the language in the spec would need to be relaxed. The route name can be derived from the TripDescriptor route_id using the schedule, and a trip_headsign field could be added to the TripDescriptor giving the destination.
Moving on with this example to each stop in the route and the existing StopTimeUpdate messages covering these, the arrival/departure times would be specified as absolute times rather than delays and the ScheduleRelationship could again be ADDED (so added is used for both the trip and each stop). If the stop exists in the current schedule then the stop_id is sufficient.
If it is a newly added stop then there could be additional fields such as stop_name, stop_lat and stop_lon whose names mirror those in the schedule. This requires spec changes but I would imagine it could be done in a backwards compatible and tidy way.
If a stop has moved but is still viewed as the same stop (it hasn't moved far) then the stop_id could be used in combination with new fields stop_lat_moved, stop_lon_moved which give the new temporary location.
From what I can see this largely covers the snow route case and it would also cover "Adding a new stop" in case C above. The case C "Adding a new route" could be done in a similar fashion by specifying a route_short_name and route_type in the TripDescriptor (trying to keep field names the same as in the schedule). And with regards to Case D, I'm not sure why this would be needed if Case C is fleshed out and can be activated quickly by a producer.
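The snow route walk-through above could be sketched roughly as follows. These are plain Python stand-ins, not the GTFS-RT protobuf bindings; `trip_headsign`, `stop_name`, `stop_lat`, `stop_lon` and the stop-level `ADDED` value are the extensions suggested in this comment, not fields of the current spec, and all ids, times and coordinates are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StopTimeUpdate:
    stop_id: str
    arrival_time: int                     # absolute POSIX time, not a delay
    schedule_relationship: str = "ADDED"  # stop-level ADDED is a suggested extension
    stop_name: Optional[str] = None       # hypothetical: only for newly added stops
    stop_lat: Optional[float] = None      # hypothetical
    stop_lon: Optional[float] = None      # hypothetical


@dataclass
class TripUpdate:
    trip_id: str
    route_id: str
    schedule_relationship: str            # "CANCELED" or "ADDED"
    trip_headsign: Optional[str] = None   # hypothetical field on the TripDescriptor
    stop_time_updates: List[StopTimeUpdate] = field(default_factory=list)


def snow_route_updates(scheduled_trip_ids: List[str],
                       replacement_trips: List[TripUpdate],
                       route_id: str = "62") -> List[TripUpdate]:
    """Cancel every scheduled trip on the route, then append the added trips."""
    updates = [TripUpdate(tid, route_id, "CANCELED") for tid in scheduled_trip_ids]
    updates.extend(replacement_trips)
    return updates


replacement = TripUpdate(
    trip_id="snow-1", route_id="62", schedule_relationship="ADDED",
    trip_headsign="Snow Route Terminus",
    stop_time_updates=[
        StopTimeUpdate("existing-stop", 1541050200),  # stop already in the schedule
        StopTimeUpdate("temp-stop", 1541050500, stop_name="Temporary Stop",
                       stop_lat=42.45, stop_lon=-71.23),  # newly added stop
    ],
)
updates = snow_route_updates(["62-0700", "62-0730"], [replacement])
```

An existing stop needs only its stop_id, while a newly added stop carries the extra descriptive fields, mirroring the distinction drawn in the paragraphs above.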
Finally, there are strong cases made by others for not having too regular schedule changes. I would largely agree with this and appreciate how the schedule anchors a system and provides a point of reference. I have seen daily updates to schedules on a few consecutive days but suspect this is largely due to errors. On a technical point however, I would venture a guess that most GTFS consumers should be able to update a schedule on a nightly basis without difficulty. If a consumer is checking for a new schedule once a week on a Sunday night/early Monday morning then they could just as easily check at the same time each night. After all, is there likely to be that much more system activity in the early hours of a Saturday morning than there is on a Monday morning? Moving to hourly checks however would be an altogether different story, as you could potentially end up having to do updates at very busy times.
Thanks @lauramatson and @botanize for your commentary - it's helpful to have additional input and points of view.
@lauramatson and @botanize seem to be emphasizing that GTFS-static should not be published very often, certainly not every 30 minutes or every day. I am not sure if their comments were in response to things I have written above, so I should clarify my position, especially where we are in complete agreement: I think there is an important distinction between schedule data and updates (deviations from planned service). And I am not suggesting that GTFS-static should be published more often - I only made the formal observation that there is currently no technical restriction or clear limit on how often it could be published.
I agree with every point made about keeping riders updated with very recent information, ensuring that information is distinct from baseline schedules, and the pitfalls of publishing new GTFS-static "schedules" every hour or every day.
However I don't see any direct line from these ideas to the proposal to introduce an additional layer of updates, rather than attempting to express all updates in a single layer. That proposal seems to be more driven by the inconvenience of protocol buffers and the "impedance mismatch" between Protobuf-based GTFS-RT and CSV-based GTFS-static.
If the core problem is that it's messy to express all the different kinds of schedule updates in Protobuf-based GTFS-RT, an additional option has yet to be voiced: such a new text-based update format could completely replace Protobuf based GTFS-RT instead of layering with it. I realize there are many reasons this might be a bad idea, but from a maintenance, maintainability, and approachability point of view, completely replacing GTFS-RT seems less problematic to me than adding layers. See also comments on #109 about the difficulty of extending and maintaining Protobuf specifications for an evolving spec.
I'm not necessarily advocating replacement of protobuf-based GTFS-RT, but pointing out that such a replacement may be no worse (indeed may be better) than adding layers.
@harringtonp it's certainly within reason to add and modify trips in GTFS-RT. If I'm not mistaken both kinds of updates are already produced in the Netherlands and consumed by OpenTripPlanner (org.opentripplanner.updater.stoptime.TimetableSnapshotSource#applyTripUpdates). @skinkie should be able to confirm or add nuance to this statement.
My sense is that people are nonetheless hesitant to use GTFS-RT for additional kinds of updates because:
This entails having two different representations for many things (GTFS-static CSV and Protobuf-based GTFS-RT). As you say it would be interesting to see in practical use cases how problematic this really is. This might be a serious concern.
They perceive GTFS-RT as ill suited for scaling to large amounts of updates because they think of it as a polling-based system, where many clients will be requesting increasingly large datasets at increasingly high frequencies. But the advocates of broader use of GTFS-RT are probably thinking of incremental / differential mode where only changed or new messages are sent to connected consumers. Differential GTFS-RT has already been implemented for about 5 years and there is agreement on the semantics among several organizations that produce it. I see no reason differential GTFS-RT couldn't handle just about any volume of accumulated updates since the last published GTFS-static schedule.
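To make the polling versus differential distinction concrete, here is a minimal sketch of how a consumer might apply messages to a local store. It uses plain dicts shaped like a GTFS-realtime FeedMessage (header.incrementality, entity[].id, entity[].is_deleted). The function name `apply_feed` and the exact differential semantics are assumptions for illustration, not something this thread or the spec prescribes.

```python
def apply_feed(state, message):
    """Return the entity store after applying one GTFS-realtime-like message.

    `state` maps entity id -> entity dict. `message` mirrors FeedMessage:
    header.incrementality is "FULL_DATASET" or "DIFFERENTIAL", and each
    entity carries an "id" plus an optional "is_deleted" flag.
    """
    incrementality = message["header"].get("incrementality", "FULL_DATASET")
    # A full dataset replaces everything; a differential message patches
    # the previous state (assumed semantics).
    new_state = {} if incrementality == "FULL_DATASET" else dict(state)
    for entity in message["entity"]:
        if entity.get("is_deleted", False):
            new_state.pop(entity["id"], None)
        else:
            new_state[entity["id"]] = entity
    return new_state


full = {
    "header": {"incrementality": "FULL_DATASET"},
    "entity": [
        {"id": "t1", "trip_update": {"trip": {"trip_id": "t1"}}},
        {"id": "t2", "trip_update": {"trip": {"trip_id": "t2"}}},
    ],
}
diff = {
    "header": {"incrementality": "DIFFERENTIAL"},
    "entity": [
        {"id": "t2", "is_deleted": True},
        {"id": "t3", "trip_update": {"trip": {"trip_id": "t3"}}},
    ],
}

state = apply_feed({}, full)
state = apply_feed(state, diff)
print(sorted(state))
```

With this model, only the changed entities (here t2 and t3) travel over the wire after the initial full dataset, which is the property the advocates of differential GTFS-RT are relying on.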
While differential GTFS-RT would appear to be a good thing and may well be the future (and internally I moved to a differential update algorithm), I fail to see how allowing extra information in TripUpdates will increase its size in any significant way.
By far the largest part of this file is the StopTimeUpdates. Taking the snow route example above, cancelling an existing route means just a small TripDescriptor entry with a ScheduleRelationship of CANCELED to remove the existing trip. In most cases the extra fields for the added route (such as headsign and route name) will appear just once in the TripDescriptor for the update. StopTimeUpdates will generally not contain any extra data, the exception being when a stop on a trip is not in the current schedule.
In fact I suspect that when updates are used in this mode, in many cases TripUpdates will be smaller than normal due to stops being dropped in the replaced routes (as you would expect on snowy days). If there are examples where the size significantly increases, I'd like to see them illustrated.
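As a rough illustration of the snow route case, the sketch below builds one cancellation and one added trip as plain dicts shaped like GTFS-realtime entities. The trip_headsign and route_short_name keys are the hypothetical extra TripDescriptor fields under discussion, not fields of the current spec, and all ids and times are made up. JSON length is used only as a crude stand-in for Protobuf size.

```python
import json

# Cancel the regular trip: only a small TripDescriptor is needed.
cancel_regular = {
    "id": "cancel-route10-trip1",
    "trip_update": {
        "trip": {"trip_id": "route10-trip1", "schedule_relationship": "CANCELED"},
    },
}

# Add the snow-route trip. trip_headsign and route_short_name are
# HYPOTHETICAL extensions; they appear once, in the descriptor only.
add_snow = {
    "id": "add-route10-snow-trip1",
    "trip_update": {
        "trip": {
            "trip_id": "route10-snow-trip1",
            "schedule_relationship": "ADDED",
            "trip_headsign": "Downtown (snow route)",
            "route_short_name": "10",
        },
        # Twenty made-up stops, two minutes apart.
        "stop_time_update": [
            {"stop_sequence": i, "stop_id": f"S{i}",
             "arrival": {"time": 1700000000 + 120 * i}}
            for i in range(1, 21)
        ],
    },
}


def encoded_size(obj):
    """Size in bytes of the compact JSON form (a proxy, not Protobuf)."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


descriptor_bytes = encoded_size(add_snow["trip_update"]["trip"])
stop_times_bytes = encoded_size(add_snow["trip_update"]["stop_time_update"])
print(descriptor_bytes, stop_times_bytes)
```

Even with the two extra descriptor fields, the stop time list dominates the size of the added trip in this toy example, which matches the argument above.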
@harringtonp this is my understanding of the GTFS-RT size increase issue: This appears when a bunch of trips are changed many hours or days in advance, when the vehicles are not yet out on the road. For example if a rail line is damaged, and the whole line is removed and replaced with shuttle bus trips for the next three days, in order for those updated trips to appear in advance trip planning results they will need to be included in the GTFS-RT feed, in addition to any vehicles currently running.
When using the polling method, all of this extra data would be fetched every time each client polled the server, even though unlike real-time updates for vehicles out on the road, it does not change between polling events. In the event of major disruptions, this could amount to a lot of wasted bandwidth on redundant data being sent repeatedly to many clients.
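A back-of-the-envelope calculation shows the scale of the redundancy. All numbers below are illustrative assumptions (1000 clients, 120 polls each, 200 kB of live vehicle data, 2 MB of advance changes), and the "send once" model assumes the advance changes are delivered a single time per client while the live data is still resent on every poll.

```python
def polling_bytes(clients, polls, live_bytes, advance_bytes):
    """Total bytes when every poll returns live data plus all advance changes."""
    return clients * polls * (live_bytes + advance_bytes)


def send_once_bytes(clients, polls, live_bytes, advance_bytes):
    """Total bytes when advance changes are sent once and only live data repeats."""
    return clients * (advance_bytes + polls * live_bytes)


# Illustrative assumptions only, not measurements.
clients, polls = 1000, 120
live_bytes, advance_bytes = 200_000, 2_000_000

polled = polling_bytes(clients, polls, live_bytes, advance_bytes)
sent_once = send_once_bytes(clients, polls, live_bytes, advance_bytes)
redundant = polled - sent_once
print(polled, sent_once, redundant)
```

The redundant traffic equals clients * (polls - 1) * advance_bytes, so it grows with both the number of clients and the polling frequency, while the useful information in the advance changes stays constant.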
It's a fair point that differential GTFS-RT adoption will be slow and some producers will probably stick with the polling method. Indeed, polling a static file could be much simpler for a small agency or city.
But it seems to me that any agency considering producing an entirely new layer of patch data (Service Changes) could produce differential GTFS-RT with roughly the same amount of effort - they must already have the software development capacity to track batches of changes against their baseline GTFS and output that information in a new format.
Thanks for that example Andrew, I hadn't realized Trip Updates were sometimes used in this way.
They generally aren't used this way. I'm suggesting that they could be used this way as a replacement for the proposed ServiceUpdates, to avoid having two separate layers of GTFS patching.
If, as some people have stated, GTFS-RT is just too far from static GTFS (concerning additional fields like headsigns etc.) and Protocol buffers are too problematic for an evolving spec, then the other option is that ServiceUpdates completely supersede and replace GTFS-RT. I'm not advocating that, just pointing out that it's probably still simpler than having multiple layers of GTFS patching.
Sooo...
After a few months and quite a few meetings with some stakeholders about Service Change, I'd like to share an updated version of GTFS-ServiceChanges, with a reduced scope. It is only expanding GTFS-TripUpdates, and to handle only three cases:
Some trip values (namely trip_headsign, trip_short_name, the shape) and some stop_time values (namely stop_headsign, pickup_type, drop_off_type and - if needed - shape_dist_traveled).
The stop_id of a stop time in very specific cases: change of platform within a station or stop moved to a very nearby existing stop (less than 100m).
And I think we should also work on removing the ambiguities with schedule_relationship=ADDED, to allow new trips to be created based on already existing trips. The current draft doesn't tackle this part; we're working on identifying the edge cases. We may create a distinct issue to discuss it.
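The restriction on stop_id changes (same station, or an existing stop less than 100m away) could be checked by a consumer along these lines. This is only a sketch: the function names and the 100 m default are illustrative, stops are plain dicts using stops.txt field names (stop_lat, stop_lon, parent_station), and the proposal itself does not define how the distance is measured.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a sphere of radius 6,371 km."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def is_allowed_stop_swap(old_stop, new_stop, max_distance_m=100.0):
    """True if the swap is a platform change or a move to a very nearby stop."""
    old_parent = old_stop.get("parent_station")
    # Platform change: both stops share the same (non-empty) parent station.
    if old_parent and old_parent == new_stop.get("parent_station"):
        return True
    distance = haversine_m(old_stop["stop_lat"], old_stop["stop_lon"],
                           new_stop["stop_lat"], new_stop["stop_lon"])
    return distance < max_distance_m


old = {"stop_id": "A", "stop_lat": 45.0, "stop_lon": -73.0}
near = {"stop_id": "B", "stop_lat": 45.0005, "stop_lon": -73.0}  # roughly 56 m north
far = {"stop_id": "C", "stop_lat": 45.01, "stop_lon": -73.0}     # roughly 1.1 km north
print(is_allowed_stop_swap(old, near), is_allowed_stop_swap(old, far))
```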
So there are a few things that this proposal does not intend to cover:
The proposal is still in the Google Doc. I think we should have the conversations on there until we reach some basic agreement on the spec, because reading (and updating) a pull request is IMHO harder.
And here is the link: bit.ly/gtfs-service-changes-v3.
There's been a lot of good discussion on the doc already, but I feel some of it focuses on a broader topic that we could discuss here at a higher level (easier to read and access than if it's sprinkled in comments in the doc).
Service changes can be categorised in 3 buckets of impact:
We assumed the frequency of incidents to look like (80% - 18% - 2%), and that they are sufficiently distinct problems that addressing them separately makes sense. Therefore we started with the most common, and easier-to-do, one: the small detours, which look like a good fit to be treated entirely as realtime updates (TripUpdates): They do not break users' previous assumptions (trip still goes where it used to, mostly), and provide last-minute details about it (new time-table, changed stop location, vehicle,...). This is a natural extension of the ability to cancel trips in that same feed.
(2: Large Detour) is a different story for users, as it means the service as they know it doesn't work for them anymore (that part is no different from a (partial) cancellation, though). It also raises new questions (did a new route just appear on the map that nobody knew about? How do users learn about this new service?), and is therefore more of a (temporary) GTFS change than a real-time one. One idea so far (not finalised) for modelling these might be to let a provider declare "dormant" trips in GTFS (static) and be allowed to (de-)activate them with a simple instruction in real time.
And the (3: Network Makeover) shouldn't be seen as a set of "service changes", but rather a wholesale replacement of the whole network. This could be done by allowing providers to submit an alternative dormant "snow-routes" GTFS feed, and allow them to swap between them in real-time (or with short notice).
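The "dormant trips" idea is not finalised, so the following is purely a sketch of what it might look like for a consumer. The `dormant` flag on a static trip and the ACTIVATE / DEACTIVATE instruction are hypothetical names invented for this example; neither exists in GTFS or GTFS-realtime.

```python
def active_trip_ids(static_trips, instructions):
    """Return the set of trip_ids in service after applying real-time toggles.

    static_trips: dicts with "trip_id" and an optional, hypothetical
    "dormant" flag. instructions: dicts with "trip_id" and an "action" of
    "ACTIVATE" or "DEACTIVATE" (also hypothetical).
    """
    known = {t["trip_id"] for t in static_trips}
    active = {t["trip_id"] for t in static_trips if not t.get("dormant", False)}
    for instruction in instructions:
        trip_id = instruction["trip_id"]
        if trip_id not in known:
            # Real time may only toggle trips already declared in static data.
            continue
        if instruction["action"] == "ACTIVATE":
            active.add(trip_id)
        elif instruction["action"] == "DEACTIVATE":
            active.discard(trip_id)
    return active


static_trips = [
    {"trip_id": "r10-regular"},
    {"trip_id": "r10-snow", "dormant": True},
]
instructions = [
    {"trip_id": "r10-regular", "action": "DEACTIVATE"},
    {"trip_id": "r10-snow", "action": "ACTIVATE"},
    {"trip_id": "r99-unknown", "action": "ACTIVATE"},  # ignored: not in static data
]
print(sorted(active_trip_ids(static_trips, instructions)))
```

The point of the design is that the real-time message stays tiny (a trip_id and an action), while everything users need to understand the new service already sits in the static feed.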
I think the above buckets defined by @tleboulenge divide service changes according to how consumers tend to think of changes, likely driven largely by past GTFS and GTFS-realtime implementations.
I'd like to hear from more agencies if this matches your view of service changes. My sense is that service changes are viewed by producers more like:
I think the definition of what is big or small will vary by agency.
(1: Planned Changes) can be distributed via GTFS and GTFS-rt Service Alerts.
The (2: Unplanned Changes) are where things get more difficult. I'm hearing from agencies that their current GTFS pipeline can't incorporate these changes, and this is why they are looking towards a GTFS-realtime solution. It seems that we need to consider a method to handle large, unplanned changes via GTFS-realtime in order to handle the use cases of interest to agencies.
Would any agencies like to weigh in on this?
Thanks for providing this clarification on the proposal.
At Metro Transit in Minneapolis, we don't distinguish Large from Small based on how spatially different the detour is from regular route. In creating detour routing, the goal is to resemble regular route and facilitate the same transfers as much as practical, but if a bridge is closed or there are limited options for where buses can operate safely, the detour path and temporary stop locations can differ significantly.
In addition to planned vs. unplanned, the key thing we look at is expected duration. Our process for creating and updating GTFS static isn't very nimble, so if a detour won't be in effect for at least 3 weeks, it won't be integrated in GTFS static. (We're hoping to improve that, but that's our current limitation.)
We do use "Large" and "Small" based on how many customers are impacted to determine other things (do we notify via social media, do we have on-street ambassadors, etc.), but this distinction isn't in our data.
Planned and unplanned detours are created and managed using the same tools. Adding a distinction of Large vs. Small might complicate the data flow. Even for planned detours (e.g., a construction project), the precise start and end time and pattern and stops of the detour itself change often, making it difficult to use the GTFS static to provide accurate information.
Hello GTFS Community,
Thank you for moving this conversation along! In response to the latest comment from @tleboulenge, most of our clients have expressed interest in the ability to handle last-minute schedule changes both large and small. While "dormant" trips can work in certain cases, the approach assumes that agencies can plan well in advance for any large schedule change, which in my experience is not always the case. There is also the possibility that the "dormant" trips become out of date, potentially adding another layer of complexity.
Anything that's a large change needs to be planned at some level, if only because the vehicle operators need to know what they're doing for the change. Here @mbta, we've made investments in improving our GTFS speed, and we can put out a new GTFS within a few hours. However, that's only the case when we've done the work ahead of time to model the change relative to the schedule. We then can apply those shuttles to our GTFS and deploy a new version.
For a large unplanned change (even one we've modeled), we'll only update the GTFS if the disruption is going to continue for more than a day, as that's how long it takes to get the change into GTFS and propagated to larger clients: they're only fetching the GTFS once a day, so changes faster than that may not be reflected.
@paulswartz For this case, would a GTFS-realtime-based solution be possible (now or in the future) to reflect these changes? Or is there another way you could envision sharing those changes?
@barbeau It would be possible, but I don't know whether it would cut down on the amount of work that we would have to do internally. The advantage for us to having a way to do this with GTFS-RT would be improved speed in having clients updated with the new schedule.
Anything that's a large change needs to be planned at some level, if only because the vehicle operators need to know what they're doing for the change.
That was also our assumption: even if they are triggered "at the last minute", changes have to be somewhat prepared and planned in advance. If it isn't the case, either the bus driver decides for himself on-the-spot what (non blocked) route he can follow, and where he can actually stop, or someone at the depot drafts that for him 30 minutes before the trip, but in both cases, I can't see a way that someone at the agency will technically have the time to turn that into a GTFS update (be it RT or not) with new, on-the-fly, shapes, lat-longs, defining new stops and a new schedule.
If the changes are indeed prepared in advance, then those planned detours can be entered in GTFS either as "dormant" trips or actual trips (e.g. with a time-window in the future), at a time where the technical staff at the agency has the time and peace of mind to write that outside of the urgency.
At the root of this Service Changes proposal lies the assumption that some data is inherently static and doesn't belong to a Realtime feed, and therefore what can be modelled in Realtime is limited (i.e. you can modify an existing trip, but not create an altogether new one).
Uncontroversially (or so I hope), these features are static and immutable at realtime, typically because they are hardware that takes physical work to change:
Perhaps controversially, we thought these were also static, mostly because of the fact that users rely on them to build their understanding of the transit network, and thus live in "common memory":
And finally features that can happily change in real-time:
In that regard, this spec was designed for allowing "changing" an existing service (hence the idea of mapping old to new), but not "creating" a new one.
If the changes are indeed prepared in advance, then those planned detours can be entered in GTFS either as "dormant" trips or actual trips (e.g. with a time-window in the future), at a time where the technical staff at the agency has the time and peace of mind to write that outside of the urgency.
I think this is a flawed assumption. Not all agencies can issue a GTFS update in a few hours like the MBTA. Most take days to do so, potentially with some manual processes on top. But they have CAD/AVL systems that are aware of detours and could output a real-time service change.
but in both cases, I can't see a way that someone at the agency will technically have the time to turn that into a GTFS update (be it RT or not) with new, on-the-fly, shapes, lat-longs, defining new stops and a new schedule.
My understanding is that newer AVL systems do actually support this type of functionality - for one, they need this data internally so their own arrival predictions can adjust. I'll see if I can get a CAD/AVL vendor to comment on the thread.
I agree with @gcamp - for some agencies the GTFS and GTFS-realtime pipelines are nearly independent. GTFS is generated by scheduling software, and GTFS-realtime is generated by AVL software, sometimes coming from two different vendors. Changing these pipelines to put large "real-time" changes into GTFS would require purchasing new scheduling software and/or AVL system, and even integration/communication between scheduling and operations departments at the agency, which may be largely independent today. In short, it's a major and costly change. IMHO a GTFS static-based solution would take a long time for some agencies to adopt.
Alternately, a GTFS-realtime-based solution could be accomplished simply by a new feature in the AVL system that allows dispatchers to draw new routes for detours, completely independent of the GTFS data pipeline. I predict this being adopted far more quickly.
But I'd welcome more comments from producers (agencies and software vendors) on their thoughts.
Hi @barbeau, I'm the Product Director for the Trapeze CAD/AVL product. We do have mechanisms in our system to quickly draw a detour path that can result in a new shape and detecting the cancelled stops applicable to that detour. We have some initial functionality for inputting new Alternate/Temporary Stops on a Detour, and have roadmap work defined to incorporate those stops into the schedule. We also have functionality to define entirely new routes/trips/stops on the fly to account for scenarios like a Bus Bridge when an agency needs to shuttle passengers from one rail station to another due to a track blockage.
I agree with the consensus that getting all agencies to adopt the GTFS static-based solution is going to take much longer. We tend to see more agencies willing to account for disruptions to the schedule within the AVL system rather than through the scheduling software which would export GTFS static. We see a strong desire from many agencies to not change the base schedule in the scheduling software until the next Sign Up Period (major schedule change), so even long-term disruptions like construction end up being in AVL until the next schedule change takes place. You also have to consider the fact that disruptions after hours or on weekends can really only be accounted for in the AVL system rather than in the scheduling package, due to staffing at the agency. Hope this helps. Let me know if there is anything else you want us to comment on.
@nathan-reynolds This is valuable information, thanks.
One question about modelling detours: When a section of a route is replaced, do you map cancelled stops to their replacement stops, or do you simply close all replaced stops and assign replacement stops on the detour route, without any relation to the former ones?
Is it possible (or does it even happen routinely) that stop sequence numbers on the section of the route that is not affected by the detour, but comes after it, are all shifted, if the numbers of cancelled and replacement stops don't match?
@tleboulenge Mapping alternate stops to their replacements is something we plan to add as part of our roadmap. We want to be able to handle the use case of providing passengers 'travel instructions' from the old stop to the new stop, as well as including the mapping in the TripUpdates feed should that become available to us.
I think it's a valid use case that you could end up with more alternate stops than scheduled stops, but it's probably the exception rather than the norm. We haven't worked through how exactly we'll handle building these into the 'new trip' since it's still on the roadmap, but it's definitely a valid use case.
Hi everybody,
We're still working on GTFS-ServiceChanges, now with a v3.1, which allows to:
The proposal is here: http://bit.ly/gtfs-service-changes-v3_1
We know that in the short term, likely only a subset of it will be implemented, since adding full new stops and full new routes may be tricky, but we wanted to provide the mid-term vision, so that we could see which sections we want (/need) to adopt in the short term.
If you're interested in diving into the proposal, please let me know. Once you have read it, it may be worth having a one-to-one meeting to answer your questions and gather your feedback. You can email me or contact me on LinkedIn if we haven't been in touch already.
Thanks!
I have been looking at the possibility of adding support for GTFS-Service Changes v3.1 to TheTransitClock. Is there a .proto file available and matching bindings?
For those who do not know me, I am the maintainer of TheTransitClock OSS project.
@scrudden: MobilityData will be working on this matter
An update to gtfs-realtime.proto has been committed on this PR in order to reflect the changes that v3.1 of servicechanges provide: https://github.com/MobilityData/transit/pull/47#issue-369318619
Capital Metro has begun work on creating tools to generate GTFS-Service Changes to see if it meets our needs for short term detours happening the same day or prior to the next generation of GTFS data (max 7 days).
Thank you,
Daryl Weinberg, Transit Systems Architect, Capital Metro (capmetro.org)
Here's hoping. Has anyone created the bindings in Java?
@scrudden Here's a draft version of Service Changes bindings based on the draft .proto - https://github.com/MobilityData/gtfs-realtime-bindings/pull/58. Please note that the .proto is still subject to change so these bindings may change as well. So, they aren't suitable for production use yet, but should work for prototyping.
Hi GTFS Community,
A decision has to be made regarding service changes, and everybody's feedback is much needed.
Currently, many things cannot be done in real-time, including:
What we currently have on the table are:
So what do we do?
Option 0: Use only the existing formats. Update static GTFS more often.
Will require an overhaul of GTFS consumers' pipelines to be able to process GTFS every hour or more often.
Option 1: We beef up GTFS-TripUpdates to support the cases A, B, C & D (Cc @stefan)
We already have a proposal for case A (#111), and we could easily extend the TripUpdate object to handle case B, but there is no proposal so far to handle case C.
Advantages:
Disadvantages:
Option 2: We keep GTFS-TripUpdates for real-time update only, and use GTFS-ServiceChanges to change schedule data (aka cases A, B, C & D)
This is what I had in mind when I drafted GTFS-ServiceChanges, and this is the current state of the GTFS-ServiceChanges proposal.
Advantage:
Disadvantage
Option 3: Middle ground proposed by Transit (Cc @gcamp & @juanborre)
=> What about case B (trip headsigns, short names…)? Should we extend TripUpdate also for this or not?
(Link to the GTFS-ServiceChanges proposal: bit.ly/gtfs-service-changes)