Closed westontrillium closed 6 months ago
I'm not that concerned with trip planning aspects, but more so with other aspects of passenger information, such as departure boards, online/printed timetables etc., which rely on a single stop id being used consistently. I think that each physical stop should be represented with a unique identifier inside a feed, and its usage should be mandated everywhere.
I do not understand the concern about deep loops. Location groups, as they were called previously, could only reference either stops or locations, not other location groups. That way there could never be any nesting. I'm not sure if this was overlooked when going from using separate groups to the areas.txt from fares-v2.
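The no-nesting property described here can be checked mechanically. Below is a minimal sketch, assuming a hypothetical in-memory mapping from group ids to member ids (the mapping and the function name are illustrative, not part of any spec or proposal):

```python
def find_nested_groups(groups):
    """Return (group_id, member_id) pairs where a member is itself a group.

    `groups` is a hypothetical mapping: group id -> list of member ids.
    An empty result means no group references another group, so no
    nesting (and therefore no loop) is possible.
    """
    group_ids = set(groups)
    return [
        (group_id, member_id)
        for group_id, members in groups.items()
        for member_id in members
        if member_id in group_ids
    ]


# Illustrative data only: members are stops or locations in `flat`,
# while `nested` contains a group that references another group.
flat = {"groupA": ["stopA", "stopB"], "groupB": ["stopB", "zone1"]}
nested = {"groupA": ["stopA", "groupB"], "groupB": ["stopB"]}
```

If group members are restricted to stops and locations, the check returns nothing for every valid feed, which is the guarantee the old location groups gave by construction.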
An example stop that would be affected is https://www.ostgotatrafiken.se/hallplats/bergslagstorget which is served by regular lines 181 and 182, as well as flexible services FI01-FI08, which all cover an area and up to five stops.
Someone from Transit should confirm. I believe part of their concern was situations where "stopB" is doing all of this:
@hannesj Do you think it could be worth discussing the possibility of reverting back to something (exactly?) like location_groups.txt to describe stop collections, minus polygons? The original justification for switching to areas.txt/stop_areas.txt to describe groups of flexible zones/stops was that those files were already part of Fares v2 and offered the same functionality as location_groups.txt.
But doesn't that diagram just show a well-connected graph? Sure, you can do silly things that will be hard/impossible to compute, but isn't that the responsibility of the producer?
If you have stop areas that may make computing fares ambiguous, should you not create a separate area just for the flex service?
Edit: Now that I said it "shifting responsibility to the producer" is probably asking for trouble.
From a producer point of view, the geojson feature alternative gives me pause just because we'd also need to deal with points that exist in stops.txt and geojson features. I'm also not sure we want to start representing stops data in a file other than stops.txt.
It would be possible to use GeometryCollections instead of MultiPoint to allow for each point of a collection of stops to refer to a stop_id to capture metadata like stop_name/code, but then you're still having to refer to several files (stop_times>locations.geojson>stops versus stop_times>areas>stop_areas*>stops).
Instead of a foreign key relationship, you could just add stop_name/stop_code fields to each feature in the GeometryCollection, but it just seems strange to me to reconstruct data that already exists elsewhere instead of just referencing it.
Either of these solutions is more burdensome for a producer than what is already in the spec.
*A location_groups equivalent would be one less step, for what it's worth.
Let me try to draw @westontrillium's diagram in separate stages, to illustrate our concern with the current implementation of location groups and polygonal stops.
stop_areas.stop_id and stop_times.stop_id are foreign keys referencing stops.txt, and one would have the expectation that all fields named stop_id relate to stops.txt in some way.
Things have gotten a bit complicated:
- stop_times.stop_id is now a special type of key referring to either stops.stop_id or stop_areas.area_id or the id of a Feature in locations.geojson.
- stop_areas.stop_id now refers to either stops.stop_id or a Feature's id. (Maybe it could also refer to stop_areas.area_id for consistency with stop_times.stop_id - but now we have a cyclic data structure.... errrh...)
- stops.parent_station or transfer.from_stop_id continue to refer only to stops.txt? Not sure.
- Also, we've duplicated certain fields like stops.stop_name and a Feature's stop_name.
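To make the overloaded key concrete, here is a minimal sketch of what a consumer has to do to interpret a single stop_times.stop_id value under this arrangement. The function name and the sample id sets are hypothetical; only the three sources it checks come from the description above.

```python
def resolve_stop_time_reference(ref, stop_ids, area_ids, feature_ids):
    """Return every source in which `ref` is defined as an id.

    More than one entry in the result means the reference is ambiguous;
    an empty result means it is dangling.
    """
    sources = []
    if ref in stop_ids:
        sources.append("stops.stop_id")
    if ref in area_ids:
        sources.append("stop_areas.area_id")
    if ref in feature_ids:
        sources.append("locations.geojson Feature id")
    return sources


# Illustrative ids only; "north" is deliberately reused to show a collision.
stop_ids = {"stopA", "stopB", "north"}
area_ids = {"area1", "north"}
feature_ids = {"zone1"}
```

Every consumer has to repeat this three-way lookup, and nothing in the key itself says which source was intended.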
stops.txt be preserved. location_type=5 is introduced for Flex areas (final name TBD).
- If location_type=5, then stops.location_id is conditionally required, and refers to a Feature's id.
- stops.stop_lat and stops.stop_lon are conditionally forbidden. These fields are already optional for some location_types, so it isn't a breaking change.
- locations.geojson, so stops.stop_name is the only place to name a stop, for example.
- MultiPolygon geometry (for service areas), or MultiPoint geometry (to replace location groups)
- MultiPoint (location groups), it introduces problems of its own: the members of location groups aren't stops and don't have their own metadata anymore.
- GeometryCollections.

I understand the desire to simplify referencing, but I really do not like the idea of needing to maintain identical data for single stops in two different places (locations.geojson and stops.txt). Thinking of some alternatives to weigh this against...
Just triple checking an assumption I've had: is there really no precedent for changing a column in the spec from "Required" to "Conditionally Required", or is that truly not considered a backwards-compatible change? Flex already changes the Conditional Requirement of arrival_time: "- Required for the first and last stop in a trip (defined by stop_times.stop_sequence)"...
Because if we could do that to stop_times.stop_id, that could solve the issue of it referencing stop_id, location id, or area_id. Instead, we could have new columns in stop_times for directly referencing a location id or area_id (location_group_id, or whatever), and that record could exclude the now conditionally required stop_times.stop_id. This is what I believe @flocsy touched on in reviewing the Flex PR.
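A minimal sketch of how a validator might enforce such a condition. The column names location_id and location_group_id follow the suggestion above but are not settled, and the "exactly one reference per record" rule is an assumption made for illustration, not something the thread has agreed on.

```python
import csv
import io

# Hypothetical reference columns; only stop_id exists in the current spec.
REFERENCE_COLUMNS = ("stop_id", "location_id", "location_group_id")


def reference_errors(stop_times_csv):
    """Return (line_number, filled_columns) for rows that do not fill
    exactly one of the reference columns."""
    errors = []
    reader = csv.DictReader(io.StringIO(stop_times_csv))
    # The header is line 1, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        filled = [c for c in REFERENCE_COLUMNS if (row.get(c) or "").strip()]
        if len(filled) != 1:
            errors.append((line_number, filled))
    return errors


# Illustrative rows: two valid, one with no reference, one with two.
SAMPLE = """trip_id,stop_sequence,stop_id,location_id,location_group_id
weekday,1,stopA,,
weekday,2,,zone1,
weekday,3,,,
weekday,4,stopB,,group1
"""
```

With separate columns, each value has a single unambiguous target, at the cost of consumers having to check a cross-column condition like this one.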
I hesitate to include this, as it's thinking waaay outside the box (I'm trying everything here!), but if we can't get around the required stop_times.stop_id, would it be possible to add an "array" type column to stops.txt to have a stops.txt record reference multiple stop_ids as a "stop group"? The individual column could have its "arrayed" values separated by a space, a pipe, or even be in a JSON-like bracketed array format. So you would have in stops.txt:
stop_id | location_type | stop_group_array |
---|---|---|
group1 | 5 [or 6] | stopA stopB stopD stopG |
Then in stop_times.txt:

trip_id | stop_id | stop_sequence |
---|---|---|
weekday | group1 | 1 |
Fully acknowledging this is highly unorthodox and likely an impossibility. At the very least, it was a good thought exercise for me :)
Those are both interesting ideas, @westontrillium.
Off the top of my head, I think this would be a valid approach. I'd need to think about this more with my team though.
We can mechanically transform stop_group_array into a single-valued column (see the diagram below), if we want to avoid introducing new data types. How do you feel about the result? Is it something worth thinking about further?
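One possible reading of that mechanical transformation, as a minimal sketch: each space-separated stop_group_array value is expanded into one (group id, member stop id) pair per row. The function name and the output layout are assumptions for illustration; the thread does not fix the resulting column names.

```python
def flatten_stop_groups(stops_rows):
    """Expand each space-separated stop_group_array into
    (group_stop_id, member_stop_id) pairs, one pair per member."""
    pairs = []
    for row in stops_rows:
        # str.split() with no argument ignores repeated or trailing spaces,
        # and yields nothing for an empty value.
        for member in row.get("stop_group_array", "").split():
            pairs.append((row["stop_id"], member))
    return pairs


# The group row mirrors the stops.txt example above; stopA is an ordinary stop.
stops_rows = [
    {"stop_id": "group1", "location_type": "5",
     "stop_group_array": "stopA stopB stopD stopG"},
    {"stop_id": "stopA", "location_type": "0", "stop_group_array": ""},
]
```

The resulting pairs are single-valued and could live in their own two-column table, which avoids adding an array data type to the spec.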
@westontrillium to your point, I think there is plenty of precedent for making changes to the spec that generally break backwards compatibility for feed consumers (as opposed to feed producers), though of course, we try to avoid it if we can. But GTFS-Flex is going to be one giant breaking change for feed consumers no matter how you slice it :)
To your specific point, there is precedent for changing an existing Required field to "Conditionally Required" if we can define a reasonable condition. And having an area_id or location_id specified could reasonably be that condition.
I like the direction that this discussion is taking. I've long felt that the foreign key relationships (or lack thereof) were not as "tight" as they could be, but couldn't actually put my finger on it. Keeping all data other than the actual geometry in stops.txt is a good move to me.
I think new location_types are a good idea and thought about this previously. I don't think it's technically required but it will probably be very useful for consumers who have never heard of flex and gives them something to Google.
However, I think that using MultiPoints or any other collection types in locations.geojson is a bad idea. So is introducing a collection column.
If you don't want to lump stop_areas and location groups together I would prefer going back to an explicit location_groups.txt.
At this point, it looks like we're discussing two distinct options, yes?

1. Add stop_times.location_id and stop_times.location*_group_id columns, make stop_times.stop_id conditionally required. stop_times.txt still directly references locations.geojson and location_groups.txt, but each has its own foreign key column in stop_times.txt.
2. Add stops.location_type=5 and stops.location_type=6 for GeoJSON Polygons/MultiPolygons and location groups, respectively, add stops.location_id and stops.location_group_id columns which reference an associated locations.geojson or location_groups.txt value. stop_times.stop_id can reference a stop_id that in turn references a location_id or location_group_id.

As a producer, I prefer Option 1, as it is much simpler to implement. Option 2 has more reference steps, requires creating more data (since you'd need to generate a stops.txt entry for each location/location group), and is more burdensome to maintain long-term due to the requirement of sustaining parity between a record's metadata in stops.txt and its locations.geojson/location_group data. These issues would be compounded for smaller producers.
*If this is amended to only refer to stops, should the name change to something else, or do we leave the potential to be able to include other location types later...?
We prefer Option 2, because it ensures that the linkage to locations.geojson and location_groups.txt only exists in one place, stops.txt. With Option 1, we'd need to add these linkages to at least stop_times.txt and stop_areas.txt, with more cases theoretically possible in the future.
Option 2 also keeps metadata, such as the stop name, in a single place for all of regular stops, entrances, stations, generic nodes, boarding areas, polygon stops and location groups.
That said, we think both Option 1 and Option 2 represent a significant improvement over the status quo.
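For comparison, a minimal sketch of how a consumer might follow a reference under Option 2, where stop_times.stop_id always points at stops.txt and the stops.txt row decides where the geometry or membership lives. The location_type values 5 and 6 and the location_id/location_group_id columns are taken from the option as summarized in this thread; the function name and sample rows are hypothetical.

```python
def resolve_option_2(stop_id, stops):
    """Return (stop_name, target_source, target_id) for a stop_times.stop_id.

    `stops` is a hypothetical mapping: stop_id -> stops.txt row (a dict).
    """
    row = stops[stop_id]
    location_type = row.get("location_type", "0")
    if location_type == "5":
        # Polygon stop: geometry lives in locations.geojson.
        return (row["stop_name"], "locations.geojson", row["location_id"])
    if location_type == "6":
        # Location group: membership lives in location_groups.txt.
        return (row["stop_name"], "location_groups.txt", row["location_group_id"])
    # Any other type is described entirely by stops.txt.
    return (row["stop_name"], "stops.txt", stop_id)


# Illustrative rows only.
stops = {
    "stopA": {"stop_name": "Main St", "location_type": "0"},
    "flex1": {"stop_name": "Downtown zone", "location_type": "5",
              "location_id": "zone1"},
    "grp1": {"stop_name": "Village stops", "location_type": "6",
             "location_group_id": "group1"},
}
```

The name always comes from the stops.txt row, which is the single-place-for-metadata property argued for above; the extra hop through stops.txt is the cost raised against Option 2.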
After a discussion with @tzujenchanmbd, I am closing this issue; it has been included in #433.
The use case of completely on-demand stops, also known as "point deviation" (routes with a collection of stops as the service area, where a rider can be picked up/dropped off in any order within a timeframe), is currently covered with a stop_times.stop_id referencing an area_id containing multiple stops.

Transit (which consumes Flex) has expressed concern over the use of areas.txt/stop_areas.txt for Flex services, because deep loops between stops and stop_areas.txt references could add unnecessary complexity. They have proposed the alternative of including MultiPoints as a possible locations.geojson feature to describe collections of stops.
Are there any concerns with such a change? Questions I have are:
The two other use cases for including areas.txt/stop_areas.txt in Flex data are discussed in this issue; there I posit that these can be covered without areas.txt/stop_areas.txt.