GTFS-Fares v2: Add networks.txt & route_networks.txt

tzujenchanmbd commented 11 months ago

Context

The networks.txt and route_networks.txt files are derived from Ito World's proposal. These two files provide another way to define a network to which a route or multiple routes belong, and enable the naming of the network. Currently, the network is defined in a schedule file field, routes.network_id. The following is the proposal for these two files from Ito World:

networks.txt

File: Optional

Primary key (network_id)

Field Name	Type	Presence	Description
network_id	Unique ID	Required	Identifies a network. Must be unique in networks.txt.
network_name	Text	Optional	The name of the network as displayed to the rider.

route_networks.txt

File: Optional

Primary key (*)

Field Name	Type	Presence	Description
network_id	Foreign ID referencing networks.network_id	Required	Identifies a network to which one or multiple route_ids belong.
route_id	Foreign ID referencing routes.route_id	Required	Identifies a route.
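
As a rough illustration of how a consumer might read the two proposed files, the sketch below parses them with Python's csv module and builds a lookup from route to networks. The sample rows (IDs and names) and the helper name `load_route_networks` are invented for illustration and are not part of the proposal.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample contents; IDs and names are invented for illustration.
NETWORKS_TXT = """network_id,network_name
express,Express Network
local,Local Network
"""

ROUTE_NETWORKS_TXT = """network_id,route_id
express,747
local,10
local,11
"""


def load_route_networks(networks_csv, route_networks_csv):
    """Return (network names by ID, network IDs by route ID)."""
    names = {
        row["network_id"]: row.get("network_name") or None
        for row in csv.DictReader(io.StringIO(networks_csv))
    }
    by_route = defaultdict(list)
    for row in csv.DictReader(io.StringIO(route_networks_csv)):
        # network_id is a foreign ID, so it must exist in networks.txt.
        if row["network_id"] not in names:
            raise ValueError(f"unknown network_id: {row['network_id']}")
        by_route[row["route_id"]].append(row["network_id"])
    return names, dict(by_route)


names, by_route = load_route_networks(NETWORKS_TXT, ROUTE_NETWORKS_TXT)
```

Keeping a list of networks per route leaves room for the many-to-many question raised below; a consumer assuming one network per route could store a single value instead.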

Values

Make it possible for producers to create fares datasets without adding information to schedule file, especially useful when fares dataset and schedule dataset are handled by different parties.
Producers can name networks.
It will synchronize the definition of the areas.txt and stop_areas.txt mechanism

Topics/risks to discuss

Currently a route can belong to only one network. Do we want to allow a many-to-many relationship?
If we add these two files, how do we deal with routes.network_id? Do we want to specify this in the spec?
- Are there many routes.network_id implementations in the wild? Does it affect adopting these two files?

Please share any thoughts on route_networks.txt and networks.txt in this issue.

gcamp commented 11 months ago

I disagree with some points and have concerns about others

Make it possible for producers to create fares datasets without adding information to schedule file, especially useful when fares dataset and schedule dataset are handled by different parties.

Should crowding in GTFS-rt be in a separate file because it's provided by different hardware than GPS location? Should we have a routes_colors.txt because the marketing department has different concerns than the planning department?

I don't think that should be an argument for a separate file. Yes, it does mean that if multiple stakeholders work on the same GTFS they need to coordinate, but that's always been the case.

Producers can name networks.

I agree there could be value there but I think the proposition needs more details on what it is for. Is that only for networks in the context of fares or are there more use cases? What would be expected examples? The previous proposition from MobilityData about Modes and Networks had more context about it.

It will synchronize the definition of the areas.txt and stop_areas.txt mechanism

There's a clear reason the stop <> areas definition is like this, it's because it's a many to many relationship. I don't think that's something we want for networks. If there's a clear case for a many to many relationship for route <> networks, it would be different.

e-lo commented 11 months ago

Should crowding in GTFS-rt be in a separate file because it's provided by different hardware than GPS location? Should we have a routes_colors.txt because the marketing department has different concerns than the planning department?

I'd actually argue that yes....that would be nice to allow. Adding additional data controlled by different departments to the same archive file is very prohibitive to many agencies and creates a lot of convoluted and internal business processes which prohibit publishing the data at all/in a timely manner.

I like to refer back to this diagram from the awesome paper that many of us participated in developing: Getting the Transit Data that Riders Want

e-lo commented 11 months ago

...but that's always been the case.

And we need to make it easier, not increasingly harder as we add data. History and the status quo is not and should not be the goal. As a data consumer you have a business interest in getting better data...but data producers can't invest in better data w/out it getting easier/better for them to produce it.

skinkie commented 11 months ago

What is the reason agencies at your side of the water are not using Hastus directly to produce GTFS, NeTEx, the input for their AVL etc.?

tzujenchanmbd commented 11 months ago

Here is a summary of the working group discussion on July 25, 2023, regarding this topic. The meeting notes are publicly available here.

Meeting consensus:

The group is generally in favor of the route_networks.txt & networks.txt issue & proposal by Ito World, as it solves an immediate modeling problem for some producers/consumers without affecting other functionality.

Some of the main discussion points:

There have been cases of producers/consumers procuring fares data separately from schedule, given the size and complexity of the feature.
Added functionality would be useful to have if it helps producers/consumers, and could open the door to new modeling options for existing datasets.
This proposal also echoes the approach taken on Fares as a stand alone component.
Networks will be inherently more stable than other data (i.e. fares), it would be nice to update schedule data when updating fares, also data procurement will probably still need to be coordinated for fares and schedule.
The only issue identified so far relates to backwards compatibility. It’s important to define an approach to handle the use of routes.network_id.
It was also noted that this change might not necessarily solve far more complex modeling challenges such as those usually present in European markets, and that there’s a risk in pushing features without enough support for implementation as they can create additional “noise” in the spec.

MobilityData proposed these spec changes, which build upon Ito World’s proposal and attempt to address how the new files can coexist with the existing routes.network_id. They suggest changing the presence of routes.network_id to Conditionally Forbidden so that producers must choose either routes.network_id or the combination of networks.txt and route_networks.txt. Happy to see any thoughts on this.
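
One way a validator might enforce the Conditionally Forbidden rule described above is sketched here. It assumes the rule is read as "routes.network_id must be empty whenever route_networks.txt is present"; the function names and sample data are hypothetical and not taken from any existing validator.

```python
import csv
import io


def uses_routes_network_id(routes_csv):
    """True if any row of routes.txt has a non-empty network_id."""
    reader = csv.DictReader(io.StringIO(routes_csv))
    return any((row.get("network_id") or "").strip() for row in reader)


def check_network_definition(routes_csv, route_networks_csv=None):
    """Return a list of problems; an empty list means only one mechanism is used."""
    problems = []
    # Assumed reading of the rule: the field is forbidden once the file exists.
    if route_networks_csv is not None and uses_routes_network_id(routes_csv):
        problems.append(
            "routes.network_id is forbidden when route_networks.txt is present"
        )
    return problems


# Hypothetical sample data for illustration only.
ROUTES_WITH_FIELD = "route_id,network_id\n747,airport\n10,\n"
ROUTES_WITHOUT_FIELD = "route_id,route_short_name\n747,747\n10,10\n"
ROUTE_NETWORKS = "network_id,route_id\nairport,747\n"

conflict = check_network_definition(ROUTES_WITH_FIELD, ROUTE_NETWORKS)
clean = check_network_definition(ROUTES_WITHOUT_FIELD, ROUTE_NETWORKS)
```

A dataset that keeps routes.network_id and ships no route_networks.txt would also pass this check, which matches the "select either one" intent.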

tzujenchanmbd commented 11 months ago

Based on the discussions from the previous working group meeting and the concerns raised in Guillaume's comment, we have updated the network-route relationship document to provide more details about the proposed changes with route_networks.txt and networks.txt.

Regarding the following concerns:

I agree there could be value there but I think the proposition needs more details on what it is for. Is that only for networks in the context of fares or are there more use cases? What would be expected examples? The previous proposition from MobilityData about Modes and Networks had more context about it.

- More use cases (please see the "Use cases can be supported by networks" section). We referenced the previous GTFS-ModesAndNetworks proposition and outlined two potential use cases that networks could support: (1) filtering trip planner results and (2) visual display (lines) on a map.

There's a clear reason the stop <> areas definition is like this, it's because it's a many to many relationship. I don't think that's something we want for networks. If there's a clear case for a many to many relationship for route <> networks, it would be different.

- We present two cases where a many-to-many relationship can be useful (please see the "Many-to-many relationship examples" section). One is when a route belongs to multiple networks for multiple use cases, and the other is for the fares use case only.

@gcamp Does this solve your concerns?

tsherlockcraig commented 11 months ago

What is the reason agencies at your side of the water are not using Hastus directly to produce GTFS, NeTEx, the input for their AVL etc.?

The answers are complex and sometimes political, but in my experience, it is usually a mix of 1) operations that silo different transit modes into different systems, 2) difficulty procuring and contracting systems in a way that encourages interoperability, 3) limited technical resources to enable more rapid change management towards more efficient operations. There is definitely a trend towards scoping the scheduling -> CAD data transfer more consistently (see https://www.interoperablemobility.org/procurement/).

gcamp commented 11 months ago

@gcamp Does this solve your concerns?

Unfortunately not...

The first example shows how fare networks and display networks can be different. To me this shows that they are different things and maybe should not be merged into the same concept. Basically, if you have one network for display purposes and one network for fare purposes, it's impossible for a consumer to display the networks, since the list will include a fare-only one.

For what it's worth, Transit has had the "display network" concept for a long time internally for display purposes. We initially had the ability to have multiple networks per route and we dropped that capability when it wasn't used years after the initial implementation.

On the second example, the 747 fares could easily be represented by using the many-to-many relationship that already exists in fare_products.txt. That's what we would need to do in this example anyway if we wanted to add the single A zone ticket at $3.75.

Something like this:

fare_products.txt

fare_product_id	fare_product_name
airport	weekly-pass
airport	single-ticket-11-bucks
standard	weekly-pass
standard	single-ticket-375

tzujenchanmbd commented 11 months ago

@gcamp From the 747 example above, it seems this would require modeling the exact same fare product (e.g. weekly-pass) under different fare_product_ids. Should we avoid this approach? Conceptually, it might be better to use a single fare_product_id for all fare products that have the same leg rules. For instance, a fare_product_id "single_ticket" for all single ticket products with various fare media. In the 747 example above, fare products appear to be grouped based on "network" or "where the fare product can be used" ("airport" & "standard"). Should this differentiation be described only in fare_leg_rules.txt rather than fare_products.txt?

Having the exact same "real world" fare product under different fare_product_ids seems hard to maintain: if the price changes, producers need to modify it in multiple places. (If the passes also have different fare media and/or rider categories, the number of fare products will increase too, and they all need to be defined under different fare_product_ids.)

Regarding different use cases for networks: One advantage mentioned by Ito World in the proposal is that route_networks.txt + networks.txt would have a mechanism similar to that of stop_areas.txt + areas.txt, which could potentially be used for other use cases in the future. For instance, stop_areas.txt + areas.txt have already been used in the flex proposal, and there is strong consensus within the flex community on this.

Totally agree that distinguishing the scope of networks is necessary; otherwise consumers won't be able to use the appropriate network for a specific use case. How about adding an enum field (e.g. network_scope) in networks.txt? For example, 0 or empty means used for fares and 1 means used for display, and we can extend the enum for new use cases in the future. Regarding the adoption process, since the display network enum falls outside the scope of fares v2, there is no need to formally adopt the display network enum in the fares v2 iteration.
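
To make the suggested enum concrete, here is a minimal sketch of how a consumer might interpret a network_scope column. The field is only a suggestion in this thread, so the column name, the values, and the helper names `parse_network_scope` and `networks_for` are all hypothetical.

```python
from enum import IntEnum


class NetworkScope(IntEnum):
    FARES = 0    # 0 or empty: network used for fares
    DISPLAY = 1  # 1: network used for display


def parse_network_scope(raw):
    """Map a raw field value to a scope; an empty value defaults to FARES."""
    raw = (raw or "").strip()
    return NetworkScope(int(raw)) if raw else NetworkScope.FARES


def networks_for(rows, scope):
    """Return the network_ids of networks.txt rows matching the requested scope."""
    return [
        row["network_id"]
        for row in rows
        if parse_network_scope(row.get("network_scope")) == scope
    ]


# Hypothetical parsed rows of networks.txt.
rows = [
    {"network_id": "airport", "network_scope": ""},
    {"network_id": "standard", "network_scope": "0"},
    {"network_id": "metro_lines", "network_scope": "1"},
]

fare_networks = networks_for(rows, NetworkScope.FARES)
display_networks = networks_for(rows, NetworkScope.DISPLAY)
```

With this kind of filter, a consumer rendering networks on a map would skip fare-only networks, which is the concern raised earlier in the thread.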

bdferris-v2 commented 10 months ago

A couple of points:

@gcamp argued that modeling display and fare networks in the same file potentially muddies the line between the two. I'd point out that we are considering making a similar trade-off with GTFS-Flex, where stop_areas.txt will theoretically be used to both model flex service areas along with the existing support for fare regions. I don't think this is necessarily a bad thing, as I think the alternative of having fare_stop_areas.txt and flex_stop_areas.txt doesn't seem particularly appetizing. Ultimately, it doesn't bother me too much to have one file for defining collections of routes (aka networks), where the interpretation of that network depends on the reference context.

Regarding the fare modeling example, we don't actually have many-to-many relationship support officially defined in the spec yet, yeah? Right now, the primary key is (fare_product_id, fare_media_id) and doesn't include (fare_product_name), so I'm not sure @gcamp 's modeling proposal is valid without a change to the spec?

I do acknowledge that the case for many-to-many routes <=> networks is not quite as clear cut as it is for stop areas, though I do think the case exists. There is a world where I'd say we start with the existing one network per route relationship, but just move it to the separate file. We could potentially expand it to multi network + multi route in the future if the demand becomes stronger, without breaking existing feeds. But there are potential costs for consumers there in complexity either way.

isabelle-dr commented 10 months ago

Hey everyone - just a reminder that MobilityData is hosting a GTFS-Fares v2 monthly meeting, and we will address the concerns raised in this issue tomorrow at 11 AM EDT. To participate in the meeting, please subscribe via this link.

For a summary and more details, please check out the meeting notes.

isabelle-dr commented 10 months ago

Hello, here is a recap of what was said during the last working group meeting regarding this topic.

There is no immediate need for a many-to-many relationship between routes and networks. If we move forward with this addition, we will mention that a route can’t belong to more than one network.
The group sees value in allowing both routes.network_id and route_networks.txt + networks.txt in the same dataset, but we need rules to prevent overlaps.
The group sees value in adding this to the spec, although @npaun seeks clarification on the underlying principles guiding this addition. @bdferris-v2 has drafted a document outlining the guiding principles for deciding when to create a distinct file for specific information versus using a field in an existing file. You can find this document here: GTFS Components - Criteria for Independent Publication.
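
The "a route can’t belong to more than one network" constraint from the recap above could be checked along these lines. This is only a sketch: the function name and the sample rows are hypothetical, and a real validator would read route_networks.txt from the feed archive rather than from a string.

```python
import csv
import io
from collections import defaultdict


def routes_in_multiple_networks(route_networks_csv):
    """Return {route_id: sorted network_ids} for routes listed in more than one network."""
    seen = defaultdict(set)
    for row in csv.DictReader(io.StringIO(route_networks_csv)):
        seen[row["route_id"]].add(row["network_id"])
    return {route: sorted(nets) for route, nets in seen.items() if len(nets) > 1}


# Hypothetical sample: route 747 is listed under two networks.
ROUTE_NETWORKS_TXT = """network_id,route_id
airport,747
standard,747
standard,10
"""

violations = routes_in_multiple_networks(ROUTE_NETWORKS_TXT)
```

Using a set per route means an exact duplicate row is not reported as a second network; whether duplicate rows should be flagged separately is left open here.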

tzujenchanmbd commented 9 months ago

Hello, here is a recap of consensus reached during the working group meeting on September 26th.

The arguments and criteria listed in the GTFS Components - Criteria for Independent Publication document were found to be valid, and the group agreed to add these two files.
There was agreement on not allowing both routes.network_id and route_networks.txt + networks.txt in the same dataset.

We are going to create a PR for adding these files.
