google / transit

https://gtfs.org/
Apache License 2.0

OccupancyStatus vs. occupancy_percentage #223

Closed barbeau closed 2 years ago

barbeau commented 4 years ago

In GTFS-RT, we now have two experimental fields that can contain real-time vehicle occupancy information: OccupancyStatus and occupancy_percentage.

Unless there is a specific use case to have both fields in the spec, we should try to narrow down to one.

Producers and consumers - if you were to pick one of the two, which would you pick and why?

For producers - is one field easier to produce than the other? Would adoption be faster with one field in particular?

skinkie commented 4 years ago

If it had to be one or the other, I would pick the enumeration, to be compatible with SIRI. In addition, it is the least commercially sensitive information to share.

darylweinberg commented 4 years ago

At Capital Metro we are working to produce OccupancyStatus by August. It will not have a percentage value at this time.

stevenmwhite commented 4 years ago

occupancy_percentage would be the pick.

What it doesn't do is give any sort of accessibility information. While occupancyStatus can be unclear, the mention of "seats" can be helpful to riders if it is accurate, and a pure percentage doesn't give that.

mike-swiftly commented 4 years ago

I don't believe these fields are mutually exclusive. Both would be optional, so clients will be able to handle data being available in none, one, or both of the fields. I expect that more agencies will be able to handle the enumeration because it is less information. But I really don't want to preclude applications that can use more detail, like occupancy percentage.
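A minimal sketch of the "none, one, or both" handling described above. The `VehicleOccupancy` container and the `describe` helper are hypothetical names for illustration; the enum members follow the OccupancyStatus values discussed in this thread.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OccupancyStatus(Enum):
    # Member names follow the GTFS-realtime OccupancyStatus enumeration.
    EMPTY = 0
    MANY_SEATS_AVAILABLE = 1
    FEW_SEATS_AVAILABLE = 2
    STANDING_ROOM_ONLY = 3
    CRUSHED_STANDING_ROOM_ONLY = 4
    FULL = 5
    NOT_ACCEPTING_PASSENGERS = 6


@dataclass
class VehicleOccupancy:
    # Hypothetical container: both fields are optional, as in the feed.
    status: Optional[OccupancyStatus] = None
    percentage: Optional[int] = None


def describe(occ: VehicleOccupancy) -> str:
    """Build a display string from whichever fields the producer supplied."""
    parts = []
    if occ.status is not None:
        parts.append(occ.status.name.replace("_", " ").capitalize())
    if occ.percentage is not None:
        parts.append(f"{occ.percentage}% occupied")
    return ", ".join(parts) if parts else "No occupancy data"
```

A consumer written this way degrades gracefully: it shows the text, the number, both, or a fallback, depending on what the feed contains.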

stevenmwhite commented 4 years ago

I don't believe these fields are mutually exclusive. Both would be optional, so clients will be able to handle data being available in none, one, or both of the fields.

I agree. I wouldn't necessarily advocate for deprecating one. My comments above should be taken in the context of if we had to pick one or the other.

mcossette1 commented 4 years ago

I don't believe these fields are mutually exclusive. Both would be optional, so clients will be able to handle data being available in none, one, or both of the fields.

I agree. I wouldn't necessarily advocate for deprecating one. My comments above should be taken in the context of if we had to pick one or the other.

I agree too.

BodoMinea commented 4 years ago

I agree with most of the points above. While this is probably not the outcome desired by feed spec maintainers, as I understand that simplicity is key, I think that both of these fields could be of use to some.

While the enumeration is simplified enough to not give away data of commercial value, for some agencies that are just starting to implement this system, a percentage may be easier to implement, by averaging the data from sensors for example - vehicle cabin weight scales or volumetric cameras.

I also think that applications that consume data may have interest in having both fields exposed - the enumeration is easy to implement as a simple color highlight/people icon, the percentage may be easier to manipulate in a mathematics formula for weighting routing options and also for presenting more exact data to the user.

I would be fine with both of them being optional and given the available resources at a client, I would do my best to provide both.
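The two consumer uses mentioned above (a color highlight from the enumeration, a routing weight from the percentage) could look roughly like this. The color scheme, the `crowding_weight` parameter, and both function names are assumptions made for illustration, not part of any spec.

```python
from typing import Optional

# Hypothetical color scheme keyed by OccupancyStatus name.
STATUS_COLORS = {
    "EMPTY": "green",
    "MANY_SEATS_AVAILABLE": "green",
    "FEW_SEATS_AVAILABLE": "yellow",
    "STANDING_ROOM_ONLY": "orange",
    "CRUSHED_STANDING_ROOM_ONLY": "red",
    "FULL": "red",
    "NOT_ACCEPTING_PASSENGERS": "grey",
}


def icon_color(status_name: Optional[str]) -> str:
    """Pick a highlight color for the enumeration; grey when unknown."""
    return STATUS_COLORS.get(status_name, "grey")


def weighted_cost(travel_minutes: float,
                  occupancy_percentage: Optional[int],
                  crowding_weight: float = 0.5) -> float:
    """Inflate travel time by a crowding penalty proportional to occupancy.

    With the default weight, a vehicle at 100% costs 1.5x its travel time.
    A missing percentage adds no penalty.
    """
    if occupancy_percentage is None:
        return travel_minutes
    return travel_minutes * (1 + crowding_weight * occupancy_percentage / 100)
```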

antrim commented 4 years ago

If we keep both of these fields, it might create opportunities for confusion or unnecessary spec bloat. I worry we’d take a situation with existing ambiguity and just layer in more ambiguity and complexity:

As part of this exercise, I ask what would be lost (if anything) by just having one field?

skinkie commented 4 years ago

@antrim I really liked how @gcamp has defined the percentage based on the safety limit of the vehicle. For me the enumeration would be just the textual description, presented as translatable text to the end user. It would obviously be nice to have a ballpark mapping, so that a GTFS-RT validator could flag a combination such as 10% with FULL as invalid.

Our percentage / enum approach mainly targets buses and taxis. But for a train with different compartments the values vary greatly. We have not modeled that yet. Neither have we modeled the mobility-impaired space, which is an even deeper level of detail than the current two approaches. I think after the VehicleTypes GTFS work, more questions will come up. Some train builders are already able to tell the exact number of free seats, and similarly the number of people in the standing space.
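A rough sketch of the validator check suggested above, which would flag a feed entity that reports, say, 10% together with FULL. The ranges in `PLAUSIBLE_RANGE` are purely illustrative assumptions; nothing in the thread or the spec defines them, and the upper bounds are left open for the crowded statuses.

```python
import math
from typing import Optional

# Hypothetical plausibility ranges (inclusive, in percent) per status name.
PLAUSIBLE_RANGE = {
    "EMPTY": (0, 10),
    "MANY_SEATS_AVAILABLE": (0, 60),
    "FEW_SEATS_AVAILABLE": (30, 90),
    "STANDING_ROOM_ONLY": (50, math.inf),
    "CRUSHED_STANDING_ROOM_ONLY": (80, math.inf),
    "FULL": (80, math.inf),
}


def is_consistent(status_name: Optional[str],
                  percentage: Optional[float]) -> bool:
    """Return False only when both fields are present and clearly disagree."""
    if status_name is None or percentage is None:
        return True  # nothing to cross-check
    bounds = PLAUSIBLE_RANGE.get(status_name)
    if bounds is None:
        return True  # status without a defined range, e.g. NOT_ACCEPTING_PASSENGERS
    low, high = bounds
    return low <= percentage <= high
```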

juanborre commented 4 years ago

Another point for the enum is that it is the only way a producer can specify if a passenger is likely to find a seat. The mapping between percentage and enum will greatly depend on the vehicle and its seating capacity.

Getting rid of that field would remove that functionality.

However, if the only reason for the enum was to add information about seating, it could be replaced with another field to address that need in particular.

caywood commented 4 years ago

@barbeau I'm not trying to derail the discussion, but there doesn't seem to be clear agreement. Perhaps because we're too tied to history and not adjusting the spec enough to address today's crisis. Given there are few actual implementations of the spec we should choose whatever serves today's needs best.

Within GTFS-RT VehiclePositions:

passenger_count_estimate recognizes that this is a noisy measurement - in addition, agencies can use a discrete set of values or add random noise if especially privacy sensitive.

seating_capacity is a physical property of the vehicle. (I expect any producers currently producing OccupancyStatus must already know this).

max_capacity is a policy decision - normally max_capacity >= seating_capacity but in COVID, max_capacity is whatever the agency says it is.

This would allow consumers to reconstruct any OccupancyStatus or occupancy_percentage they care to represent, with greater precision if needed.

(seating_capacity and max_capacity are part of GTFS-vehicles full but I don't expect that will be implemented on the relevant timeline for the current problem - we need a solution within GTFS-RT VehiclePositions).

Apologies if you've already considered alternatives like this - I'm just processing all of this after joining last week's call.
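To illustrate how a consumer might reconstruct either representation from the three proposed values, here is a minimal sketch. The function names and the 75% cut-off between "many" and "few" seats are assumptions for illustration; an agency or consumer would choose its own rules.

```python
def occupancy_percentage(passenger_count: int, max_capacity: int) -> int:
    """Percentage of the policy-defined maximum capacity."""
    if max_capacity <= 0:
        raise ValueError("max_capacity must be positive")
    return round(100 * passenger_count / max_capacity)


def occupancy_status(passenger_count: int,
                     seating_capacity: int,
                     max_capacity: int) -> str:
    """Derive an OccupancyStatus name from a count and two capacities."""
    # Check the policy limit first: under COVID rules max_capacity may be
    # lower than seating_capacity.
    if passenger_count >= max_capacity:
        return "FULL"
    if passenger_count >= seating_capacity:
        return "STANDING_ROOM_ONLY"
    if passenger_count == 0:
        return "EMPTY"
    # Hypothetical cut-off: "few" seats once more than 3/4 are taken.
    if passenger_count > 0.75 * seating_capacity:
        return "FEW_SEATS_AVAILABLE"
    return "MANY_SEATS_AVAILABLE"
```

For example, 10 passengers on a 40-seat bus with a restricted `max_capacity` of 10 comes out as FULL, while the same count with a `max_capacity` of 60 comes out as MANY_SEATS_AVAILABLE.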

skinkie commented 4 years ago

@caywood I agree on the passenger_count_estimate (and relevant mobility impaired information), but having static data in a realtime data stream because we fail to standardise the static data is a direction I don't wish to follow either.

barbeau commented 4 years ago

@caywood Thanks for the feedback! Sure, additional ideas are welcome too if the current fields don't meet certain use cases. Our goal is to get to something that meets producers' and consumers' needs.

Do any other producers have comments on the fields proposed in https://github.com/google/transit/issues/223#issuecomment-627099138, vs the existing experimental occupancy fields in GTFS-RT?

mike-swiftly commented 4 years ago

I believe that seating_capacity and max_capacity should not be included in the real-time feed because, as @skinkie stated, that is a static data issue. Even if it were to be added to the real-time spec, we would first need to figure out how to define it. I think the first question is whether it should go into GTFS.

skinkie commented 4 years ago

@mike-swiftly I think a proper discussion should follow about a standard that is far more low level than travel information. I have not yet reviewed ITxPT, but that might have defined such exchanges.

mike-swiftly commented 4 years ago

Here is another topic for discussion with respect to occupancyStatus and occupancy_percentage: should they also be included with predictions in the TripUpdates feed?

The benefit would be that a client, like an electronic sign system, would then not need to read both the TripUpdates and VehiclePositions feeds and combine the info. Instead, it would be very easy to display a prediction like "5 minutes - Many Seats Available".

A counter argument could be that displaying vehicle count when a bus is 20 minutes away is irrelevant, that the count could change greatly by the time the bus arrives. But I think that the client should decide what information to display, and that the real-time information system should simply provide possibly useful information to the client.
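For context, the merge that a sign system currently has to do can be sketched as a join on trip ID. The input shapes here (a list of `(trip_id, minutes)` predictions and a dict of status names keyed by trip ID) are simplified assumptions standing in for parsed TripUpdates and VehiclePositions feeds.

```python
from typing import Dict, List, Tuple


def sign_lines(predictions: List[Tuple[str, int]],
               occupancy_by_trip: Dict[str, str]) -> List[str]:
    """Combine arrival predictions with occupancy looked up by trip_id."""
    lines = []
    for trip_id, minutes in predictions:
        line = f"{minutes} minutes"
        status = occupancy_by_trip.get(trip_id)
        if status is not None:
            # e.g. "MANY_SEATS_AVAILABLE" -> "Many Seats Available"
            line += " - " + status.replace("_", " ").title()
        lines.append(line)
    return lines
```

Carrying occupancy in TripUpdates would remove the lookup step; whether the displayed value should be observed or predicted occupancy is a separate question.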

skinkie commented 4 years ago

But I think that the client should decide what information to display, and that the real-time information system should simply provide possibly useful information to the client.

You basically suggest application specific information profiles, which SIRI uses extensively.

caywood commented 4 years ago

@mike-swiftly My only concern about adding seating_capacity and max_capacity to GTFS would be whether agencies know the capacity of the vehicle serving each trip ahead of time. If you (and other transit operations experts) say that would work, I agree it would be better in GTFS as static data.

PS if the primary concern is how easy is it to write a client - then we should definitely put seating_capacity and max_capacity in the VehiclePositions feed so consumers don't have to merge GTFS with GTFS-RT VehiclePositions! Also let's get rid of protocol buffers :)

barbeau commented 4 years ago

Here is another topic for discussion with respect to occupancyStatus and occupancy_percentage: should they also be included with predictions in the TripUpdates feed?

@mike-swiftly I'd prefer to have TripUpdates specifically contain predictive occupancy information within StopTimeUpdates. As you imply, if an agency knows historical on/offs at specific stops, the current occupancy could likely differ significantly from the expected occupancy in a few stops (e.g., stops near the end of the line).

Similarly, I'd prefer to see observed schedule_deviation (how the vehicle's departure time compared to the schedule at the last stop it passed) be stored in VehiclePositions and TripUpdates contain only predictions (what the vehicle is expected to do in the future).

IMHO consumers would benefit from a clearer delineation between observed facts about the current state of the system and predictions about what may happen in the future.

caywood commented 4 years ago

Updated proposal based on feedback from @mike-swiftly and @skinkie

Add to GTFS-RT VehiclePosition:

Add to GTFS trips.txt

(note trips.txt already includes wheelchair_accessible and bikes_allowed which are analogous vehicle properties)

darylweinberg commented 4 years ago

I understand the value of seating_capacity as a physical property of the vehicle in GTFS trips.txt but we do not use the same series of vehicle to run the same trip every day. Some blocks will have restrictions on vehicle size so those will stay fairly consistent but others will not.

As an example, our Rapid service is operated by both 60 ft articulated buses and 40 ft buses, and those assignments vary. There have also been times that we have not had enough vehicles available and have had to supplement the Rapid fleet with other 35 ft or 40 ft buses.

As such, seating_capacity in trips.txt would be more of a guideline than actual seating capacity for a given trip on a given day.

skinkie commented 4 years ago

@darylweinberg I would envision the option to communicate the Vehicle (and thus VehicleType) in realtime data. Hence overriding the values provided in the schedule.
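One way to read this override idea, sketched under stated assumptions: the schedule carries a default capacity for the trip, and a vehicle type reported in real time replaces it when known. The vehicle type keys and seat counts below are hypothetical placeholders, not real fleet data.

```python
from typing import Dict, Optional

# Hypothetical seating capacities per vehicle type; real values vary by fleet.
SEATING_BY_VEHICLE_TYPE: Dict[str, int] = {
    "articulated_60ft": 55,
    "standard_40ft": 38,
    "standard_35ft": 30,
}


def effective_seating_capacity(
    scheduled_capacity: Optional[int],
    realtime_vehicle_type: Optional[str],
    seating_by_type: Dict[str, int] = SEATING_BY_VEHICLE_TYPE,
) -> Optional[int]:
    """Prefer the capacity of the vehicle actually assigned, if reported."""
    if realtime_vehicle_type is not None and realtime_vehicle_type in seating_by_type:
        return seating_by_type[realtime_vehicle_type]
    # Fall back to the schedule's guideline value (may itself be missing).
    return scheduled_capacity
```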

skinkie commented 4 years ago

Add to GTFS trips.txt

Would you really want to add this to trips, as opposed to a VehicleType, for example?

stevenmwhite commented 4 years ago

In our experience it’s not necessarily common for vehicles of the exact same capacity to run the same trips regularly. Even if it’s the same physical size vehicle (which it’s not always), most of our agency customers have fleets with a wide range of years and different model years have slightly different seating capacities and total capacities.

Other notes about this new concept of giving capacity and actual and then letting consumers figure out the percentage...

caywood commented 4 years ago

@stevenmwhite the potential use of different vehicles is why I originally put capacity in GTFS-RT rather than trips.txt. If we need to allow realtime overrides - is it still useful to have it in the GTFS trips.txt?

I'm trying to understand the privacy concern but maybe I'm missing something. Except in the very rare case where you have 0 passengers and then go to 1 then back to 0, you can't identify an individual passenger. (The proposal for occupancy_percentage had debate without clear guidance on how to preserve privacy either - if you have 2.5% occupancy of a 40 seat vehicle, you know what the count is). @gcamp can you explain what you were trying to accomplish for privacy?
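The point about percentages revealing counts can be checked with a couple of lines. This sketch also shows how reporting in coarse bands blurs the count; the function names and the 20-point band width are illustrative assumptions.

```python
def implied_count(occupancy_percentage: float, capacity: int) -> float:
    """Passenger count implied by a precise percentage and a known capacity."""
    return occupancy_percentage * capacity / 100


def to_band(occupancy_percentage: float, width: int = 20) -> int:
    """Round a percentage down to the start of its band (0, 20, 40, ...)."""
    return int(occupancy_percentage // width) * width
```

With a 40-seat vehicle, `implied_count(2.5, 40)` gives exactly one passenger, whereas `to_band(2.5)` reports 0, which is compatible with anything from zero to seven passengers.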

stevenmwhite commented 4 years ago

the potential use of different vehicles is why I originally put capacity in GTFS-RT rather than trips.txt. If we need to allow realtime overrides - is it still useful to have it in the GTFS trips.txt?

I don’t see the value in having it in the static. While a particular vehicle’s capacity is static, a trip’s isn’t. And even if it was I don’t see much value for a trip planner because it’s the real-time load that matters.

I think of load certainly as a real-time thing, and “the capacity of the vehicle running this trip currently” makes more sense as a RT aspect of a trip than a Static one to me — until Vehicles is widely adopted and could be referenced.

mike-swiftly commented 4 years ago

Vehicle capacity could perhaps be included in static info, but definitely not in trips.txt since as others have stated capacity varies. It is not static with respect to trips.

But vehicle capacity could potentially be supplied in a separate and new GTFS file, perhaps called vehicles.txt. That info could include not just passenger capacity but other amenities such as bike racks, USB charging, WiFi, etc. I think this could be a good change to consider, but should be considered separately from occupancy status.

botanize commented 4 years ago

I strongly prefer OccupancyStatus if there must be only one. But I think the two options have different uses, and should co-exist.

I believe occupancy_percentage causes confusion about what the occupancy is relative to and suffers from a false sense of precision. Furthermore, I think that the many levels of the enum provide agencies a way to best communicate loads as they see fit, whether that's using all levels the way CapMetro plans to, or choosing the three or four that meet their needs. I do not think consumers should attempt to translate OccupancyStatus to a percentage for display, as that changes the meaning of the field.

People don't know how many seats are on our vehicles, and may not be aware of current seating restrictions, so 80% means nothing, or it's confusing because they don't know if it's 80% of the max capacity or COVID-19 capacity. You might think 80% means 8 people, you might think it means 32 people or even 48 people (if you think of capacity as a COVID-restricted 10, a seated capacity of 40 or a standing capacity of 60 people on a 40 foot bus). Depending on your interpretation you'd think there was room for 2, 8 or 12 people, hardly actionable information.

occupancy_percentage also suffers from implied precision. Our uncertainty about realtime load is high, so even if we do implement occupancy_percentage we'd use large bands of 20 points or more. But if you see 80%, you don't know that we only report in 20 point increments.

I think the descriptive messages are much more clear because they aren't in the context of an unknown and possibly changing standard. I actually think "FULL" and "few seats available" are perfect descriptions of what the current vehicle load means to me. It's actionable information. Full means there's no space for me, whether that's because the bus can't fit another person or because there's social distancing restrictions on space. Few seats available means few seats available to me, regardless of how many are empty, I don't need to know the capacity of the vehicle, or what restrictions are in place, because the descriptive text tells me approximately what's available to me, it's not relative to some unknown or changing standard the way occupancy_percentage is relative to either the seating capacity, crush load capacity or restricted capacity of a vehicle whose capacity I don't know anyway.
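Working through the arithmetic for an 80% reading against the three capacities mentioned in this comment (10, 40 and 60) shows how far apart the interpretations are. The dictionary keys and the function name are labels chosen for this sketch.

```python
from typing import Dict, Tuple

# Capacities for a 40 foot bus as given in the comment above.
CAPACITIES = {"covid_restricted": 10, "seated": 40, "standing": 60}


def interpretations(percentage: float,
                    capacities: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Map each capacity definition to (people on board, spaces remaining)."""
    result = {}
    for name, capacity in capacities.items():
        on_board = round(percentage * capacity / 100)
        result[name] = (on_board, capacity - on_board)
    return result
```

Calling `interpretations(80, CAPACITIES)` yields 8, 32 or 48 people on board, with 2, 8 or 12 spaces remaining, depending on which capacity the rider assumes.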

antrim commented 4 years ago

Hi all -

I’ve attempted to summarize the positions in this thread and draw a few conclusions. Please review and respond!

Would prefer OccupancyStatus (if we chose one field):

Would prefer occupancy_percentage (if we chose one field):

4 people said the fields serve separate purposes and should not be mutually exclusive:

Note there was recently strong support for adding occupancy_percentage as an experimental field (#213).

Reasons in favor of occupancy_percentage:

Reasons in favor of OccupancyStatus:

Current use in consuming applications

Both Transit and Google Maps are using a combination of messages and icons to show occupancy (see below). Would occupancy_percentage be more useful for determining which icons to show?

Google Maps in Sydney - People icon: [image: Google Maps in Sydney]

Transit in NYC - Subtle green shading indicates occupancy: [image: Transit in NYC]

Concept: Adding capacity information

Matt Caywood (@caywood) started a discussion of how to describe vehicle capacity to give more context for occupancy data. If occupancy were expressed in the context of capacity (both seating + standing), that would give applications a lot of flexibility for presentation in the UI, but that would be a lot to add to a transit data pipeline. Perhaps this is something to add over time, in coordination with static extensions like GTFS-Vehicles?

Here are some conclusions

Please post if you see holes in these (or post if you agree and they make sense to you 😀 ).

caywood commented 4 years ago

Thanks @antrim for a solid summary.

Please don't even consider data consumers in this decision process, we will be fine - companies like TransitScreen (and even Google) can move as fast as necessary once the data exists.

For me, the paramount concern has always been data producers - particularly the 99% of agencies who currently don't have any occupancy data - we need them to start reporting something if this is going to impact the present transit crisis. So we really need to hear from agencies who are not yet producing data, or are currently implementing it.

passenger_count_estimate plus max_capacity has the advantage that it's very straightforward to implement provided the correct data is in the pipeline (like APCs and vehicle capacities) - this is I think the same data you would need to produce either OccupancyStatus or occupancy_percentage, just in the most straightforward format possible.

occupancy_percentage is pretty straightforward assuming we specify it's a percentage of the maximum occupancy per current agency regulations. (However if that changes over time, that creates a minor burden for producers.)

The complexity of OccupancyStatus is its downfall - every agency is going to have to have meetings to figure out what "Standing Room Only" means in COVID, and normally, etc. Compatibility with SIRI is nice to have but not at the expense of getting things done at a critical time.

stevenmwhite commented 4 years ago

@antrim, I would second @caywood's thanks for the great summary, as well as his focus on producers (I am a producer so take that with a grain of salt, if you will -- but I'm also a consumer [mostly of feeds I produce]).

I think your conclusions are all reasonable and accurate.

I think Percentage is the most useful of them all, and you've clarified how Status can be useful as well (in particular, "to show messages to passengers"). Further clarifying those messages is a good idea, and while using SIRI as a model (rather than always trying to reinvent the wheel) is a good idea, I wouldn't say that compatibility with SIRI should be a goal on its own. That being said, I would be in favor of a smaller set of enumerations as you've listed.

esteveavi commented 4 years ago

Hi, in this COVID de-escalation phase authorities are recommending a 50% max occupation per trip. How would you solve that using OccupancyStatus enumeration? By using occupancy_percentage it seems easy. Thank you.

antrim commented 4 years ago

@esteveavi: Under the framework above, operators would use OccupancyStatus messages that are meaningful to passengers during Covid: SEATS_AVAILABLE and FULL. Seat availability would be according to current operating rules.

Question: Is/would FEW SEATS AVAILABLE and MANY SEATS AVAILABLE be a useful and meaningful distinction for passengers?

esteveavi commented 4 years ago

Thanks, @antrim. In this de-escalation, a 50% occupation with FEW_SEATS_AVAILABLE can be considered full.
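A sketch of how a producer could apply a capacity cap before emitting a status, so that riders see FULL once the permitted load is reached. The two status names are the ones proposed in the comments above; the function name and the `allowed_fraction` parameter are assumptions for illustration.

```python
def restricted_status(passenger_count: int,
                      max_capacity: int,
                      allowed_fraction: float = 0.5) -> str:
    """Report FULL once the load reaches the currently permitted share."""
    allowed = int(max_capacity * allowed_fraction)  # round down to whole riders
    return "FULL" if passenger_count >= allowed else "SEATS_AVAILABLE"
```

The policy lives in one parameter, so a change in the permitted share only requires updating `allowed_fraction` on the producer side; consumers keep displaying the same two messages.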

willwong430 commented 4 years ago

The MTA likes the idea of using occupancy_percentage as the primary indicator for our realtime APC feed. We should be able to easily adjust our enumeration according to what the community deems most effective in defining the degree of passenger occupancy.

darylweinberg commented 4 years ago

CapMetro thinks there are uses for both fields. If we had to pick one it would be Occupancy Status but would prefer to have both available.

We agree with reducing the number of enumerated values but at the moment we are working with our consuming partners to do that when displaying to customers, not on the back end where it is produced.

Thank you,


esteveavi commented 4 years ago

At TMB (Barcelona, Spain) we are using occupancy_percentage to show occupancy. We are using 5 levels for now to test how it works. Here is an example: https://www.tmb.cat/en/barcelona/metro/-/lineametro/L5

antrim commented 4 years ago

@esteveavi: Since it appears TMB is expressing average historical occupancy (rather than real-time), one resource you may find useful is a draft spec called GTFS-Occupancies (http://bit.ly/gtfs-occupancies) which MobilityData circulated last year. To my knowledge it has not been implemented. We would be interested in your comments. Note that the current draft expresses average occupancy on a trip/stop level, not a route/stop level.

paulswartz commented 4 years ago

After a lot of internal discussion, @mbta is going with 3 occupancy buckets in occupancy_status instead of occupancy_percentage. The exact buckets, and what status they map to, are still being discussed.
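A three-bucket mapping of this kind might be sketched as follows. Since the comment says the exact buckets and statuses were still being discussed, the thresholds and the three status names chosen here are placeholders, not MBTA's actual values.

```python
from bisect import bisect_right

# Hypothetical thresholds: the load percentage at which the next bucket starts.
BUCKET_STARTS = [33, 66]
# Hypothetical choice of statuses for the three buckets.
BUCKET_STATUSES = ["MANY_SEATS_AVAILABLE", "FEW_SEATS_AVAILABLE", "FULL"]


def bucket_status(load_percentage: float) -> str:
    """Map an internal load percentage to one of three published statuses."""
    return BUCKET_STATUSES[bisect_right(BUCKET_STARTS, load_percentage)]
```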

Dave-TfNSW commented 4 years ago

I think Transport for NSW would prefer to stick with occupancy status rather than percentage, as it is “less precise and more accurate”, as has been stated in the thread, and allows for clearer customer messaging. We do not see them as mutually exclusive, however. It may even be helpful to have both for communicating changing COVID thresholds. For example, we recently changed our highest level for buses from 100% of seats to 31%, but consumers would only know this from our direct communications or scrutiny of our documentation.

hfonbouze commented 4 years ago

Here’s what we think at Ineo Systrans (supplier of CAD-AVL systems). The percentage is not easily understandable for the end user (passenger), especially for high values (what’s mostly unclear is: percentage of what?). It has to be translated into a category of riding condition (i.e. a status). The information should also be consistent between the different channels: it may use different renderings, but the meaning has to be the same. This applies to apps using GTFS-RT but also to information given through other means (displays at stops, dispatchers’ screens…). Therefore the translation (from percentage to status) should be done as far upstream as possible, which means at the producer’s level. For these reasons, we (as producers) prefer to use occupancy_status.

skinkie commented 4 years ago

@hfonbouze in the past it has been discussed frequently that some CAD-AVL systems in the USA are using GTFS-RT as backhaul information exchange. What is the value or domain that your implementation uses from vehicle to dispatch?

hfonbouze commented 4 years ago

@skinkie Our standard system (which includes both vehicle equipments and dispatch) doesn't use GTFS-RT for internal exchanges (but a proprietary interface).

skinkie commented 4 years ago

@skinkie Our standard system (which includes both vehicle equipments and dispatch) doesn't use GTFS-RT for internal exchanges (but a proprietary interface).

Still, the same question applies: are you using discrete values, an enumeration, or a percentage in this proprietary interface?

hfonbouze commented 4 years ago

@skinkie Presently, it's a percentage. We're considering adding a 'full' indication (will take no more passengers) given by the driver.

CleverDevices-gtfs commented 4 years ago

We had an internal discussion about this at Clever Devices. We believe just using Occupancy Status is best because:

• As a GTFS-RT producer, we find that the occupancy status lets an agency communicate how full the vehicle is according to its own policy, instead of through raw numbers.

• A load percentage may be confusing for the customer and could vary based on the type of vehicle assigned to the route. APC sensor quality and type will affect how accurately the load is reported. In ridership reporting packages this quality is handled by sampling many boardings over a statistically significant number of trips and discarding outlier trips and blocks, which cannot be done in real time, and there is no extra data in the standard to account for variability.

osmaa commented 4 years ago

Has anyone considered the question of encoding occupancy on a train car / compartment level? The simplest approach would be to extend OccupancyStatus to a many cardinality field, though that would leave it to convention to interpret ordering etc.
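As an illustration of the per-car idea, the sketch below pairs each status with an explicit sequence number rather than relying on list order by convention. The `CarriageOccupancy` type, its `carriage_sequence` field, and the ranking table are hypothetical constructs for this example.

```python
from dataclasses import dataclass
from typing import List

# Ordering of status names from least to most crowded, for comparison only.
CROWDING_RANK = {
    "EMPTY": 0,
    "MANY_SEATS_AVAILABLE": 1,
    "FEW_SEATS_AVAILABLE": 2,
    "STANDING_ROOM_ONLY": 3,
    "CRUSHED_STANDING_ROOM_ONLY": 4,
    "FULL": 5,
}


@dataclass
class CarriageOccupancy:
    carriage_sequence: int  # assumed: 1 = first car in the direction of travel
    status: str


def least_crowded_carriage(carriages: List[CarriageOccupancy]) -> int:
    """Return the sequence number of the least crowded car (first on ties)."""
    if not carriages:
        raise ValueError("no carriage data")
    best = min(carriages,
               key=lambda c: (CROWDING_RANK[c.status], c.carriage_sequence))
    return best.carriage_sequence
```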

skinkie commented 4 years ago

Has anyone considered the question of encoding occupancy on a train car / compartment level? The simplest approach would be to extend OccupancyStatus to a many cardinality field, though that would leave it to convention to interpret ordering etc.

https://github.com/google/transit/issues/223#issuecomment-625498601

barbeau commented 4 years ago

Has anyone considered the question of encoding occupancy on a train car / compartment level? The simplest approach would be to extend OccupancyStatus to a many cardinality field, though that would leave it to convention to interpret ordering etc.

@osmaa See the draft GTFS-Vehicles spec for a thorough example of this.

kronster commented 4 years ago

There is higher value in the enumeration. Showing the percentage unnecessarily exposes rider safety concerns. The same applies to the public display of the block_id with driver safety.