google / transit

https://gtfs.org/
Apache License 2.0

[GTFS-Flex] Replace areas.txt/stop_areas.txt with locations.geojson MultiPoint feature to describe collections of stops? #398

Closed westontrillium closed 6 months ago

westontrillium commented 1 year ago

The use case of completely on-demand stops, also known as "point deviation" (routes whose service area is a collection of stops at which a rider can be picked up or dropped off in any order within a timeframe), is currently covered by a stop_times.stop_id referencing an area_id containing multiple stops.

Transit (which consumes Flex) has expressed concern over the use of areas.txt/stop_areas.txt for Flex services, because deep loops between stops and stop_areas.txt references could add unnecessary complexity. They have proposed the alternative of including MultiPoints as a possible locations.geojson feature to describe collections of stops.
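
For illustration, the lookup a consumer performs under this approach might look roughly like the Python sketch below. The sample rows and the `resolve_stop_time` helper are hypothetical and not taken from any feed or library.

```python
# Hypothetical sample rows; a real consumer would load these from the CSV files.
stops = {"stopA", "stopB", "stopC"}  # stop_id values from stops.txt
stop_areas = [                       # rows from stop_areas.txt
    {"area_id": "area1", "stop_id": "stopA"},
    {"area_id": "area1", "stop_id": "stopB"},
]
stop_times = [{"trip_id": "weekday", "stop_id": "area1", "stop_sequence": 1}]


def resolve_stop_time(row):
    """Return the stop_ids that a stop_times row refers to.

    In this sketch, stop_times.stop_id may hold either a stop_id or an
    area_id, so both tables have to be checked.
    """
    ref = row["stop_id"]
    if ref in stops:
        return [ref]
    return sorted(sa["stop_id"] for sa in stop_areas if sa["area_id"] == ref)


resolved = resolve_stop_time(stop_times[0])
```

Here `resolved` holds the two stops of `area1`, which a rider could use in any order.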

Are there any concerns with such a change? Questions I have are:

  1. Is a MultiPoint feature containing information duplicative of a stop(s) in stops.txt problematic (i.e., a stop may be described in two different places, but with a different primary key)?
  2. If a MultiPoint feature referred to geolocations already referenced in stops.txt, would that cause complications with consumers parsing which trips to return?
  3. If question 1 or 2 is the case, is there some needed id relationship to stops.txt, akin to what @e-lo has brought up in past conversations?
  4. Is cataloguing points/stops in a file format other than .csv opening Pandora's box even more?
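
To make questions 1 and 2 concrete, here is a minimal Python sketch of how a consumer might detect MultiPoint positions that duplicate entries in stops.txt. The feature layout, the sample coordinates, and the `duplicated_stops` helper are assumptions for illustration; no MultiPoint feature is defined in the Flex proposal as discussed here.

```python
# Hypothetical stops.txt rows.
stops = [
    {"stop_id": "stopA", "stop_lat": 45.50, "stop_lon": -73.57},
    {"stop_id": "stopB", "stop_lat": 45.51, "stop_lon": -73.58},
]

# Hypothetical locations.geojson feature; GeoJSON positions are [lon, lat].
feature = {
    "type": "Feature",
    "id": "group1",
    "properties": {},
    "geometry": {
        "type": "MultiPoint",
        "coordinates": [[-73.57, 45.50], [-73.60, 45.52]],
    },
}


def duplicated_stops(feature, stops, tolerance=1e-6):
    """Return stop_ids whose coordinates also appear in the MultiPoint."""
    matches = []
    for lon, lat in feature["geometry"]["coordinates"]:
        for stop in stops:
            same_lat = abs(stop["stop_lat"] - lat) <= tolerance
            same_lon = abs(stop["stop_lon"] - lon) <= tolerance
            if same_lat and same_lon:
                matches.append(stop["stop_id"])
    return matches


dupes = duplicated_stops(feature, stops)
```

Because the MultiPoint carries no stop_id, matching can only be done by coordinates, which is the kind of implicit relationship question 3 asks about.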

The two other use cases for including areas.txt/stop_areas.txt in Flex data are discussed in this issue; there I posit that these can be covered without areas.txt/stop_areas.txt.

hannesj commented 1 year ago

I'm not that concerned with trip planning aspects, but more so with other aspects of passenger information, such as departure boards, online/printed timetables etc., which rely on a single stop id being used consistently. I think that each physical stop should be represented with a unique identifier inside a feed, and its usage should be mandated everywhere.

I do not understand the concern about deep loops: location groups, as they were called previously, could only reference either stops or locations, not other location groups. That way there could never be any nesting. I'm not sure if this was overlooked when going from using separate groups to the areas.txt from fares-v2.

An example stop that would be affected is https://www.ostgotatrafiken.se/hallplats/bergslagstorget which is served by regular lines 181 and 182, as well as flexible services FI01-FI08, which all cover an area and up to five stops.
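
As a sketch of that no-nesting rule, a validator could reject any group member that is not a stop or a location. The column names (`location_group_id`, `location_id`) and the sample ids below are assumptions for illustration.

```python
# Hypothetical ids from stops.txt and locations.geojson.
stop_ids = {"stopA", "stopB"}
location_ids = {"zone1"}

# Hypothetical location_groups.txt rows: one row per (group, member) pair.
location_groups = [
    {"location_group_id": "group1", "location_id": "stopA"},
    {"location_group_id": "group1", "location_id": "zone1"},
]


def invalid_members(rows, stop_ids, location_ids):
    """Return rows whose member is neither a stop nor a location.

    A member that is itself a location group falls into this bucket,
    so a feed that passes this check cannot contain nested groups.
    """
    allowed = stop_ids | location_ids
    return [row for row in rows if row["location_id"] not in allowed]


flat_result = invalid_members(location_groups, stop_ids, location_ids)

# A group that tries to include another group is flagged.
nested_row = {"location_group_id": "group2", "location_id": "group1"}
nested_result = invalid_members(location_groups + [nested_row], stop_ids, location_ids)
```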

westontrillium commented 1 year ago

Someone from Transit should confirm. I believe part of their concern was situations where "stopB" is doing all of this:

Screenshot 2023-08-16 at 12 25 27 PM

@hannesj Do you think it could be worth discussing the possibility of reverting back to something (exactly?) like location_groups.txt to describe stop collections, minus polygons? The original justification for switching to areas.txt/stop_areas.txt to describe groups of flexible zones/stops was that those files were already part of Fares v2 and offered the same functionality as location_groups.txt.

leonardehrenfried commented 1 year ago

But doesn't that diagram just show a well-connected graph? Sure, you can do silly things that will be hard/impossible to compute, but isn't that the responsibility of the producer?

If you have stop areas that may make computing fares ambiguous, should you not create a separate area just for the flex service?

Edit: Now that I said it "shifting responsibility to the producer" is probably asking for trouble.

westontrillium commented 1 year ago

From a producer point of view, the geojson feature alternative gives me pause just because we'd also need to deal with points that exist in stops.txt and geojson features. I'm also not sure we want to start representing stops data in a file other than stops.txt.

It would be possible to use GeometryCollections instead of MultiPoint to allow for each point of a collection of stops to refer to a stop_id to capture metadata like stop_name/code, but then you're still having to refer to several files (stop_times>locations.geojson>stops versus stop_times>areas>stop_areas*>stops).

Instead of a foreign key relationship, you could just add stop_name/stop_code fields to each feature in the GeometryCollection, but it just seems strange to me to reconstruct data that already exists elsewhere instead of just referencing it.

Either of these solutions is more burdensome for a producer than what is already in the spec.

*A location_groups equivalent would be one less step, for what it's worth.

npaun commented 1 year ago

Let me try to draw @westontrillium's diagram in separate stages, to illustrate our concern with the current implementation of location groups and polygonal stops.

Existing GTFS features

Screenshot 2023-08-22 at 11 47 43 AM

stop_areas.stop_id and stop_times.stop_id are foreign keys referencing stops.txt, and one would have the expectation that all fields named stop_id relate to stops.txt in some way.
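
A minimal Python sketch of that expectation, using made-up sample rows and a hypothetical `dangling_stop_ids` helper: a strict foreign key check passes for stop_areas.txt but fails once stop_times.stop_id carries an area_id.

```python
# Hypothetical sample rows.
stops = [{"stop_id": "stopA"}, {"stop_id": "stopB"}]
stop_areas = [{"area_id": "area1", "stop_id": "stopA"}]
stop_times = [
    {"trip_id": "weekday", "stop_id": "stopB", "stop_sequence": 1},
    {"trip_id": "weekday", "stop_id": "area1", "stop_sequence": 2},
]


def dangling_stop_ids(rows, stops):
    """Return stop_id values in rows that do not exist in stops.txt."""
    known = {stop["stop_id"] for stop in stops}
    return sorted({row["stop_id"] for row in rows} - known)


missing_in_stop_areas = dangling_stop_ids(stop_areas, stops)
missing_in_stop_times = dangling_stop_ids(stop_times, stops)
```

In this sample, `area1` is reported as dangling in stop_times.txt, because it is not a key of stops.txt.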

Current state of GTFS Flex proposal

Screenshot 2023-08-22 at 12 13 09 PM

Things have gotten a bit complicated:

Also, we've duplicated certain fields like stops.stop_name and a Feature's stop_name.

Transit's proposal

Screenshot 2023-08-22 at 11 55 07 AM

Outstanding issues

westontrillium commented 1 year ago

I understand the desire to simplify referencing, but I really do not like the idea of needing to maintain identical data for single stops in two different places (locations.geojson and stops.txt). Thinking of some alternatives to weigh this against...

Just triple-checking an assumption I've had: is there really no precedent for changing a column in the spec from "Required" to "Conditionally Required", or is that truly not considered a backwards-compatible change? Flex already changes the Conditional Requirement of arrival_time: "- Required for the first and last stop in a trip (defined by stop_times.stop_sequence)"...

Because if we could do that to stop_times.stop_id, that could solve the issue of it referencing stop_id, location id, or area_id. Instead, we could have new columns in stop_times for directly referencing a location id or area_id (location_group_id, or whatever), and that record could exclude the now conditionally required stop_times.stop_id. This is what I believe @flocsy touched on in reviewing the Flex PR.

I hesitate to include this, as it's thinking waaay outside the box (I'm trying everything here!), but if we can't get around the required stop_times.stop_id, would it be possible to add an "array" type column to stops.txt to have a stops.txt record reference multiple stop_ids as a "stop group?" The individual column could have its "arrayed" values separated by a space, a pipe, or even be in a JSON-like bracketed array format. So you would have in stops.txt:

| stop_id | location_type | stop_group_array |
|---|---|---|
| group1 | 5 [or 6] | stopA stopB stopD stopG |

Then in stop_times.txt:

| trip_id | stop_id | stop_sequence |
|---|---|---|
| weekday | group1 | 1 |

Fully acknowledging this is highly unorthodox and likely an impossibility. At the very least, it was a good thought exercise for me :)
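
For what it's worth, a space-separated array column would be easy to parse. The Python sketch below assumes the hypothetical `stop_group_array` column described above, a location_type of 6 for groups, and made-up sample rows; none of this exists in the spec.

```python
import csv
import io

# Hypothetical stops.txt content with the proposed array column.
STOPS_TXT = """stop_id,location_type,stop_group_array
stopA,0,
stopB,0,
group1,6,stopA stopB stopD stopG
"""


def load_stop_groups(text):
    """Map each group stop_id to the list of stop_ids in its array column."""
    groups = {}
    for row in csv.DictReader(io.StringIO(text)):
        members = row["stop_group_array"].split()
        if members:
            groups[row["stop_id"]] = members
    return groups


groups = load_stop_groups(STOPS_TXT)
```

A stop_times.txt record with stop_id `group1` would then resolve through `groups["group1"]`.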

npaun commented 1 year ago

Those are both interesting ideas, @westontrillium.

stop_times.area_id

Off the top of my head, I think this would be a valid approach. I'd need to think about this more with my team though.

location_type=6

We can mechanically transform stop_group_array into a single-valued column (see the diagram below), if we want to avoid introducing new data types. How do you feel about the result? Is it something worth thinking about further?
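
One plausible reading of that transformation, sketched in Python: each array value becomes its own row in a link table, so every column holds a single value. The output column names (`stop_group_id`, `stop_id`) and the `explode_group_array` helper are assumptions, and the result may differ from the layout in the diagram.

```python
# Hypothetical stops.txt rows using the array column.
rows = [
    {"stop_id": "stopA", "location_type": "0", "stop_group_array": ""},
    {"stop_id": "group1", "location_type": "6", "stop_group_array": "stopA stopB"},
]


def explode_group_array(rows):
    """Turn each space-separated array into one (group, member) row per value."""
    link_rows = []
    for row in rows:
        for member in row["stop_group_array"].split():
            link_rows.append({"stop_group_id": row["stop_id"], "stop_id": member})
    return link_rows


link_rows = explode_group_array(rows)
```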

Screenshot 2023-08-22 at 4 11 24 PM

bdferris-v2 commented 1 year ago

@westontrillium to your point, I think there is plenty of precedent for making changes to the spec that generally break backwards compatibility for feed consumers (as opposed to feed producers), though of course, we try to avoid it if we can. But GTFS-Flex is going to be one giant breaking change for feed consumers no matter how you slice it :)

To your specific point, there is precedent for changing an existing Required field to "Conditionally Required" if we can define a reasonable condition. And having an area_id or location_id specified could reasonably be that condition.
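
A validator for such a condition could be quite small. The Python sketch below assumes the hypothetical columns `location_id` and `area_id` in stop_times.txt and, as an additional assumption not stated in this thread, that exactly one of the three references must be present.

```python
REFERENCE_COLUMNS = ("stop_id", "location_id", "area_id")


def check_stop_time(row):
    """Return an error message for a stop_times row, or None if it is valid.

    Assumed rule: stop_id is required unless location_id or area_id is
    given, and only one of the three may be filled in.
    """
    present = [column for column in REFERENCE_COLUMNS if row.get(column)]
    if not present:
        return "one of stop_id, location_id, area_id is required"
    if len(present) > 1:
        return "only one reference allowed, got: " + ", ".join(present)
    return None


ok = check_stop_time({"trip_id": "weekday", "stop_id": "", "area_id": "area1"})
missing = check_stop_time({"trip_id": "weekday", "stop_id": ""})
both = check_stop_time({"trip_id": "weekday", "stop_id": "stopA", "area_id": "area1"})
```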

leonardehrenfried commented 1 year ago

I like the direction that this discussion is taking. I've long felt that the foreign key relationships (or lack thereof) were not as "tight" as they could be, but couldn't actually put my finger on it. Keeping all data other than the actual geometry in stops.txt is a good move to me.

I think new location_types are a good idea and thought about this previously. I don't think it's technically required but it will probably be very useful for consumers who have never heard of flex and gives them something to Google.

However, I think that using MultiPoints or any other collection types in locations.geojson is a bad idea. So is introducing a collection column.

If you don't want to lump stop_areas and location groups together I would prefer going back to an explicit location_groups.txt.

westontrillium commented 1 year ago

At this point, it looks like we're discussing two distinct options, yes?

  1. Add stop_times.location_id and stop_times.location*_group_id columns, and make stop_times.stop_id conditionally required. stop_times.txt still directly references locations.geojson and location_groups.txt, but each has its own foreign key column in stop_times.txt.
  2. Add stops.location_type=5 and stops.location_type=6 for GeoJSON Polygons/MultiPolygons and location groups, respectively, and add stops.location_id and stops.location_group_id columns that reference an associated locations.geojson or location_groups.txt value. stop_times.stop_id can then reference a stop_id that in turn references a location_id or location_group_id.
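
The difference in reference steps between the two options can be sketched in Python as follows. The sample rows and both helper functions are hypothetical; the column names follow the options as written above.

```python
def resolve_option1(stop_time):
    """Option 1: stop_times.txt has one foreign key column per target."""
    for column in ("stop_id", "location_id", "location_group_id"):
        if stop_time.get(column):
            return column, stop_time[column]
    raise ValueError("stop_times row has no reference")


def resolve_option2(stop_time, stops_by_id):
    """Option 2: stop_times.stop_id always points at a stops.txt row,
    which may in turn point at a location or a location group."""
    stop = stops_by_id[stop_time["stop_id"]]
    if stop["location_type"] == "5":
        return "location_id", stop["location_id"]
    if stop["location_type"] == "6":
        return "location_group_id", stop["location_group_id"]
    return "stop_id", stop["stop_id"]


# Hypothetical sample data describing the same flexible zone both ways.
option1_row = {"trip_id": "weekday", "stop_id": "", "location_id": "zone1"}
option2_row = {"trip_id": "weekday", "stop_id": "zone1_stop"}
stops_by_id = {
    "zone1_stop": {"stop_id": "zone1_stop", "location_type": "5", "location_id": "zone1"},
}

target1 = resolve_option1(option1_row)
target2 = resolve_option2(option2_row, stops_by_id)
```

Both calls end at the same location, but Option 2 needs the extra stops.txt record (`zone1_stop`) to get there.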

As a producer, I prefer Option 1, as it is much simpler to implement. Option 2 has more reference steps, requires creating more data (since you'd need to generate a stops.txt entry for each location/location group), and is more burdensome to maintain long-term because each record's metadata in stops.txt must be kept in parity with its locations.geojson/location_group data. These issues would be compounded for smaller producers.

*If this is amended to only refer to stops, should the name change to something else, or do we leave the potential to be able to include other location types later...?

npaun commented 1 year ago

We prefer Option 2, because it ensures that the linkage to locations.geojson and location_groups.txt only exists in one place, stops.txt. With Option 1, we'd need to add these linkages to at least stop_times.txt and stop_areas.txt, with more cases theoretically possible in the future.

Option 2 also keeps metadata, such as the stop name, in a single place for all of regular stops, entrances, stations, generic nodes, boarding areas, polygon stops and location groups.

That said, we think both Option 1 and Option 2 represent a significant improvement over the status quo.

isabelle-dr commented 6 months ago

After a discussion with @tzujenchanmbd, I am closing this issue; it has been included in #433.