google / transit

https://gtfs.org/
Apache License 2.0

Primary Keys for fare_leg_rules.txt and fare_transfer_rules.txt #304

Closed omar-kabbani closed 2 years ago

omar-kabbani commented 2 years ago

Primary Keys for fare_leg_rules.txt and fare_transfer_rules.txt

This GitHub issue was raised in response to a discussion in pull request #286, and has been prioritized by MobilityData, which will work on it following the agenda described below. MobilityData is trying this format to resolve open issues with GTFS extensions. If you have any feedback regarding this format, please reach out to specifications@mobilitydata.org.

The need

The specification should enable easy lookup for entries, where unique rows can be located with the least computational resources. The modelling of Primary Keys serves to optimize lookups, and does not impact passenger information.

The issue

All fields in fare_leg_rules.txt and fare_transfer_rules.txt were initially marked as Primary Keys. This is not optimal, as it results in high computational load for lookups. A group of stakeholders proposed a smaller subset of these fields to define Primary Keys.

Potential options

Use case 1: Agencies that accept multiple currencies

leg_group_id from_area_id to_area_id network_id currency amount
group_1 area_1 area_2 bus USD 5
group_1 area_1 area_2 bus CAD 7

Use case 2: Travel between the same areas with different networks

from_area_id to_area_id network_id
area_1 area_2 local_bus
area_1 area_2 comfort_bus

Use case 3: Agencies that charge different fees for different transfers between the same leg groups (ex: the first 3 transfers are free but the remaining 2 cost 0.5 CAD)

from_leg_group_id to_leg_group_id spanning_limit duration_limit duration_limit_type fare_transfer_type currency amount
group_1 group_2 3 7200 0 0 CAD 0
group_1 group_2 5 7200 0 0 CAD 0.5

MobilityData's recommendations

Both options 1 and 2 address the needs raised in this issue. However, MobilityData recommends option 2, as it allows for the representation of three use cases that would not be possible with option 1.

Timeframes

2022-01-31: Issue opened on GitHub

Phase 1

From 2022-01-31 to 2022-02-11 (10 business days): Conversations on GitHub with the community.

Phase 2

From 2022-02-14 to 2022-02-18 (5 business days): MobilityData will gather and analyze the feedback provided in phase 1, and will publish a summary by 2022-02-18.

Phase 3

If the community is unable to reach consensus by the end of phase 1, MobilityData will call for three meetings to resolve any outstanding items regarding Primary Keys and fare_leg_rules.is_symmetrical and fare_transfer_rules.is_symmetrical. Feel free to block the below times in your calendars:

Meeting 1: 2022-02-22 13:00 to 14:00 ET
Meeting 2: 2022-02-23 13:00 to 14:00 ET
Meeting 3: 2022-02-24 13:00 to 14:00 ET

Please reach out to specifications@mobilitydata.org or react/reply to this message to receive a meeting invite.

Phase 4

From 2022-02-28 to 2022-03-04 (5 business days): MobilityData will gather and analyze the feedback provided in phase 3, and will publish a summary by 2022-03-04.

Phase 5

If consensus is reached, MobilityData will open the vote in the week of 2022-03-07. Otherwise, MobilityData will act as a facilitator to help the community reach consensus through meetings, online conversations, or other appropriate means. The time period for this phase starts on 2022-03-07 and ends on 2022-03-11 (5 business days). MobilityData can be invited to take meeting notes or can be kept in Cc for those exchanges.

Phase 6

From 2022-03-14 to 2022-03-19 (5 business days): MobilityData will gather and analyze the information shared during phase 5, and will publish a summary by 2022-03-19.

Phase 7

In all cases, the vote will be opened on the week of 2022-03-28. If the vote does not pass, MobilityData will end the work on these issues and will inform the community. This will not prevent other stakeholders from continuing the effort to drive the adoption of the base implementation of GTFS-Fares.

jsteelz commented 2 years ago

Hello, what does fare_variable_rules mean?

omar-kabbani commented 2 years ago

Hey Jeremy - sorry, that was a typo, I meant fare_transfer_rules. I've edited the initial post to fix it.

jsteelz commented 2 years ago

A few preliminary observations:

  1. Use case 3 is invalid; per the PR's proposed specification, spanning_limit follows this rule:

Conditionally Forbidden: Forbidden if fare_transfer_rules.from_leg_group_id does not equal fare_transfer_rules.to_leg_group_id.

  2. Explaining which use cases support which primary_key inclusions, and the options they fall under, would be useful. It seems that use cases 1 and 3 support Option 2, but use case 2 supports Option 1.

  3. If amount is a primary key (which we support for a similar reasoning to yours), then currency should be as well. What stops an agency from accepting the same amount for a particular leg in different currencies?

omar-kabbani commented 2 years ago

Thanks @jsteelz for the quick feedback - please find my responses below

Use case 3 is invalid; per the PR's proposed specification, spanning_limit follows this rule:

Good catch! What if the example is changed to have the same from_leg_group_id and to_leg_group_id? This describes a use case where transfers within the same fare leg group are possible up to five times, with the first three being free, and then the remaining two costing 0.5 CAD each. Would this be a use case worth representing? I have updated the initial message to reflect this example.

Explaining which use cases support which primary_key inclusions, and the options they fall under, would be useful. It seems that use cases 1 and 3 support Option 2, but use case 2 supports Option 1.

There is a bit of overlap, but the point was to show both ends of the spectrum with option 1 being the bare minimum, and option 2 being the expanded option that allows for the representation of most use cases (we could not think of logical use cases that could be represented beyond the ones in option 2). Please find a breakdown by field below:

If amount is a primary key (which we support for a similar reasoning to yours), then currency should be as well. What stops an agency from accepting the same amount for a particular leg in different currencies?

Interesting suggestion, I see your point in amount potentially not identifying unique entries if amount is the same but currency is different. Instead, do you think it's better to have currency as a Primary Key since duplicate entries with the same currency cannot have different amount? Otherwise we can have both amount and currency listed as Primary Keys.

flocsy commented 2 years ago

I don't understand this proposal. It does not make much sense to me.

jsteelz commented 2 years ago

  1. Would we all be in agreement that (network_id, from_area_id, to_area_id, currency) is a unique key constraint of fare_leg_rules?
  2. For fare_transfer_rules, we are debating between these two potential unique key constraints:
     a. (from_leg_group_id, to_leg_group_id, currency)
     b. (from_leg_group_id, to_leg_group_id, spanning_limit, currency)

By unique key constraint, we are referring to a candidate key, that is, a minimal set of fields that uniquely identifies a row. We would consider the particular choice of primary key and of input and output fields to be implementation-specific.
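As an illustration of what such a constraint means for a producer's file, here is a minimal Python sketch that reports key tuples shared by more than one row. The function name `find_key_violations` and the sample rows are hypothetical; the rows mirror use case 1 from the issue description, and the two candidate keys are only examples of what could be checked.

```python
from collections import Counter

def find_key_violations(rows, key_fields):
    """Return the key tuples that appear in more than one row."""
    counts = Counter(tuple(row.get(f, "") for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]

# Hypothetical fare_leg_rules.txt rows, modelled on use case 1 (multiple currencies).
LEG_RULES = [
    {"network_id": "bus", "from_area_id": "area_1", "to_area_id": "area_2",
     "currency": "USD", "amount": "5"},
    {"network_id": "bus", "from_area_id": "area_1", "to_area_id": "area_2",
     "currency": "CAD", "amount": "7"},
]

KEY_WITH_CURRENCY = ("network_id", "from_area_id", "to_area_id", "currency")
KEY_WITHOUT_CURRENCY = ("network_id", "from_area_id", "to_area_id")

print(find_key_violations(LEG_RULES, KEY_WITH_CURRENCY))     # []
print(find_key_violations(LEG_RULES, KEY_WITHOUT_CURRENCY))  # the two rows collide
```

With currency in the key, the two rows are distinct; without it, they collide, which is the situation use case 1 is meant to allow.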

Regarding the inclusion of spanning_limit

If alternative transfer rules with different spanning_limits are to be permitted, then logically the same should apply for fare_transfer_type, duration_limit and duration_limit_type. We think it would be best if all of them were to be omitted.

flocsy commented 2 years ago

@jsteelz In both 1. and 2. it depends on what we decide about is_symmetrical, that's why I think we should first agree on https://github.com/google/transit/issues/305

  1. The 3rd use case is clear and totally legitimate (and one that actually exists in some places), so IMHO we have to include spanning_limit in the unique key. I agree that duration_limit and duration_limit_type should also be added to the key.

But I disagree regarding fare_transfer_type: it is definitely an "output" field. If it were added to the key, then the following awkward things would be possible:

from_leg_group_id | to_leg_group_id | spanning_limit | duration_limit | duration_limit_type | fare_transfer_type | currency | amount
group_1 | group_2 | 3 | 3600 | 0 | 0 | CAD | 1
group_1 | group_2 | 3 | 3600 | 0 | 1 | CAD | 2
group_1 | group_2 | 3 | 3600 | 0 | 2 | CAD | 3
group_1 | group_2 | 3 | 3600 | 0 | 3 | CAD | 4

What does this mean? Should my fare calculator calculate the fares with all 4 variations and choose the cheapest total fare? I think this makes no sense.

We can use the phrase "unique key constraint" or "primary key"; both sound OK to me. I understand why you wrote that, and I think that everyone is free to implement their DB however they desire. It wasn't meant to tell you how to implement your DB. It was meant to tell the producer of a CSV file which constraints the data in the file should adhere to. But regardless of the wording we choose, I think what's important for the conversation here is to understand the logic by which we decide which fields are included in the unique key, and what other constraints we require for some of the fields (for example, which behavior of is_symmetrical is allowed; that has to be described separately in the spec, regardless of whether we include it in or exclude it from the unique key).

npaun commented 2 years ago

I work with @jsteelz on Transit's implementation of fares.

@flocsy I agree that including fare_transfer_type in the key would require trying all of the options and picking the lowest cost result and that this isn't very nice to do.

But to me it seems like spanning_limit would require the same, unless we add an additional rule such as "always pick the rule with the lowest spanning limit among those that match".

flocsy commented 2 years ago

@npaun regarding the spanning_limit it's not a problem, because when I have the journey, I know this, so my lookup can deal with it (depends on implementation, but possible).

However fare_transfer_type should not be decided by the implementation of the consumer. It should be decided by the producer.

npaun commented 2 years ago

spanning_limit

If spanning_limit is included in the key, we need to add a criterion to the spec in order to decide which fare_transfer_rule to select in the following situation: consider a journey consisting of 2 buses in fare_leg_group=Downtown.

Suppose fare_transfer_rules.txt contains the following:

from_leg_group_id,to_leg_group_id,spanning_limit,amount,currency
Downtown,Downtown,3,1.00,CAD
Downtown,Downtown,4,0.50,CAD

We could say that "The rule with the minimum spanning_limit that is greater than or equal to the span of the sub-journey is to be selected".
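A minimal Python sketch of this tie-breaking rule, applied to the two Downtown rows above. The function name `select_transfer_rule` and the in-memory row layout are assumptions for illustration, not part of the proposal.

```python
def select_transfer_rule(rules, span):
    """Pick the rule with the smallest spanning_limit that is >= span, or None."""
    candidates = [r for r in rules if r["spanning_limit"] >= span]
    return min(candidates, key=lambda r: r["spanning_limit"], default=None)

# The two fare_transfer_rules.txt rows from the example above.
RULES = [
    {"from_leg_group_id": "Downtown", "to_leg_group_id": "Downtown",
     "spanning_limit": 3, "amount": "1.00", "currency": "CAD"},
    {"from_leg_group_id": "Downtown", "to_leg_group_id": "Downtown",
     "spanning_limit": 4, "amount": "0.50", "currency": "CAD"},
]

# A journey of 2 buses has a span of 2, so the spanning_limit=3 rule is selected.
print(select_transfer_rule(RULES, span=2)["amount"])  # 1.00
```

Under this rule a span of 4 would select the second row, and a span of 5 would match no row at all.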

Including duration_limit only

We could use a similar tie-breaking rule: "Minimum duration_limit >= sub-journey duration"

duration_limit_type

The inclusion of duration_limit_type in the key poses difficulties. Suppose the duration_limit=3600 for several fare_transfer_rules, each with a different duration_limit_type.

0 - Between the departure fare validation of the first leg and the arrival fare validation of the last leg.
1 - Between the departure fare validation of the first leg and the departure fare validation of the last leg.
2 - Between the arrival fare validation of the first leg and the departure fare validation of the last leg.
3 - Between the arrival fare validation of the first leg and the arrival fare validation of the last leg.

from_leg_group_id,to_leg_group_id,duration_limit,duration_limit_type,amount,currency
Downtown,Downtown,3600,0,1.00,CAD
Downtown,Downtown,3600,1,2.00,CAD
Downtown,Downtown,3600,2,4.00,CAD
Downtown,Downtown,3600,3,8.00,CAD

I think there's no logical tie-breaker among these rules using only the facts of the journey. For example, suppose I board bus A at 10h and disembark at 10h30. Then I wait 30 minutes and board bus B at 11h, disembarking at 11h30. duration_limit_type=1 gives 3600s, as does duration_limit_type=3.
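The arithmetic for this journey can be checked with a short Python sketch. The function name `transfer_duration_seconds` and the calendar date are assumptions; only the clock times come from the example.

```python
from datetime import datetime

def transfer_duration_seconds(first_dep, first_arr, last_dep, last_arr, duration_limit_type):
    """Measure the sub-journey duration according to duration_limit_type (0 to 3)."""
    # Types 0 and 1 start at the departure of the first leg, types 2 and 3 at its arrival.
    start = first_dep if duration_limit_type in (0, 1) else first_arr
    # Types 0 and 3 end at the arrival of the last leg, types 1 and 2 at its departure.
    end = last_arr if duration_limit_type in (0, 3) else last_dep
    return int((end - start).total_seconds())

# Bus A 10:00 to 10:30, bus B 11:00 to 11:30 (the date itself is arbitrary).
JOURNEY = (
    datetime(2022, 2, 1, 10, 0),
    datetime(2022, 2, 1, 10, 30),
    datetime(2022, 2, 1, 11, 0),
    datetime(2022, 2, 1, 11, 30),
)

durations = {t: transfer_duration_seconds(*JOURNEY, t) for t in range(4)}
print(durations)  # {0: 5400, 1: 3600, 2: 1800, 3: 3600}
```

Types 1, 2 and 3 all fit within duration_limit=3600 for this journey, so several of the four rows match at once.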

Alternatives

flocsy commented 2 years ago

I think that fare_transfer_type and duration_limit_type should not be part of the key (it makes no sense to me). In fact, I can hardly imagine why any agency would use more than one type in their rules.

Regarding the spanning_limit and duration_limit: we use similar features in our in-house rules, and I can say from experience that unfortunately neither of the above approaches will work for every producer. So much so that we had to add a way to enable the more expensive rule to win in certain situations. The way we deal with it is that each rule has a priority, and in every journey we only take into account the rules with the highest priority. So by default "the cheapest wins", but when there is a specific case where the more expensive rule has to win, we increase its priority (usually we also need to "copy" some of the other rules with higher priority). What I am trying to say is that it's not easy to find the balance, and I'm pretty sure that whatever we decide, there will be some producers that won't be able to represent every case. But I think it's OK for now. Let the need arise and we'll deal with it.

I think the "cheapest wins" should be the guideline, but we need to be specific about it! In my opinion it should be that the cheapest total journey price wins. To give an easy example: on a bus one can either buy a ticket for $5 and travel on 1 leg, or a ticket for $7 that enables 1 free transfer. So on a 2 leg journey it's either l1: $5, l2: $5 or l1: $7, l2: free transfer. As you see, the 2nd is more optimal ($7 < $10).
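A tiny Python sketch of "cheapest total journey price wins" for this 2 leg example. The option labels and the function name `cheapest_option` are hypothetical; the prices are the ones from the comment.

```python
def cheapest_option(options):
    """options maps a label to its per-leg charges; return (label, total) of the cheapest."""
    totals = {label: sum(charges) for label, charges in options.items()}
    best = min(totals, key=totals.get)
    return best, totals[best]

# Per-leg charges for the two ways of paying for a 2 leg journey.
OPTIONS = {
    "two single tickets": [5, 5],
    "ticket with one free transfer": [7, 0],
}

print(cheapest_option(OPTIONS))  # ('ticket with one free transfer', 7)
```

The comparison is made on the journey total, not leg by leg: on the first leg alone the $5 ticket is cheaper, yet it loses overall.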

omar-kabbani commented 2 years ago

From @npaun's proposal above for spanning_limit:

We could say that "The rule with the minimum spanning_limit that is greater than or equal to the span of the sub-journey is to be selected".

How about we break spanning_limit into two fields, start_spanning_limit and end_spanning_limit, which clearly define at which transfer a certain rule starts and ends? A blank start_spanning_limit field means the rule applies from the first transfer, and a blank end_spanning_limit field means the rule applies up until the last possible transfer (no limit).

We can add a rule that forbids overlapping spanning limits - thoughts?

With that, I think it would be possible to have the Primary Keys as: from_leg_group_id, to_leg_group_id, start_spanning_limit, and currency

We might also need to include a rule that the duration_limit needs to be the same for the same pair of from_leg_group_id and to_leg_group_id. Ex: A transfer is valid for two hours, but the transfer costs might vary within these two hours based on how many transfers a rider makes.

flocsy commented 2 years ago

@omar-kabbani I don't really understand what you are proposing with the start_spanning_limit, end_spanning_limit. Can you give some examples?

Why would you forbid overlapping spanning_limits?

Also: why couldn't a producer have:

from_leg_group_id, to_leg_group_id, duration_limit, amount
a, b, 7200, $5
a, b, 3600, $3

omar-kabbani commented 2 years ago

@flocsy sure thing! It would look something like this: Consider an example where transfers from bus to subway are valid for two hours, with infinite number of transfers permitted. The first 3 transfers cost 0.5 CAD each, transfers 4 and 5 cost 0.25 CAD each, and any following transfers are free.

from_leg_group_id | to_leg_group_id | start_spanning_limit | end_spanning_limit | amount | currency | duration_limit | duration_limit_type | fare_transfer_type
bus | subway | | 3 | 0.5 | CAD | 7200 | 0 | 0
bus | subway | 4 | 5 | 0.25 | CAD | 7200 | 0 | 1
bus | subway | 6 | | 0 | CAD | 7200 | 0 | 2

This would make things clearer and more readable by defining exactly which rules apply to which transfers. It would also make defining Primary Keys easier since we can define a record by its from and to, the currency to be paid, and which specific transfers (ex: the 4th and 5th transfer).

Why would you forbid overlapping spanning_limits?

By forbidding an overlap of spanning_limit, we avoid the situation described in this comment so only one entry can represent the first 3 transfers. In the case of the example below, the last record would be forbidden since it overlaps/clashes with the first two rows in terms of spanning limits.

from_leg_group_id | to_leg_group_id | start_spanning_limit | end_spanning_limit | amount | currency | duration_limit | duration_limit_type | fare_transfer_type
bus | subway | | 3 | 0.5 | CAD | 7200 | 0 | 0
bus | subway | 4 | 5 | 0.25 | CAD | 7200 | 0 | 1
bus | subway | 6 | | 0 | CAD | 7200 | 0 | 2
bus | subway | 1 | 4 | 0.5 | CAD | 7200 | 0 | 0
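A validator could detect such clashes with a pairwise interval check. The Python sketch below is only an illustration: the function names are hypothetical, a blank start is assumed to mean 1 (the first transfer), a blank end is treated as infinity, and the rows are assumed to share the same from_leg_group_id, to_leg_group_id and currency.

```python
import math

def normalize(start, end):
    """Blank start (None) means 'from the first transfer'; blank end means 'no limit'."""
    return (1 if start is None else start, math.inf if end is None else end)

def overlapping_pairs(ranges):
    """Return index pairs (i, j) of ranges that share at least one transfer number."""
    spans = [normalize(s, e) for s, e in ranges]
    pairs = []
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            # Two closed intervals overlap when each starts before the other ends.
            if spans[i][0] <= spans[j][1] and spans[j][0] <= spans[i][1]:
                pairs.append((i, j))
    return pairs

# (start_spanning_limit, end_spanning_limit) of the four bus -> subway rows above.
RANGES = [(None, 3), (4, 5), (6, None), (1, 4)]

print(overlapping_pairs(RANGES))  # [(0, 3), (1, 3)]
```

The last row (index 3) clashes with the first two, which matches the description above.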

Also: why couldn't a producer have:

from_leg_group_id, to_leg_group_id, duration_limit, amount
a, b, 7200, $5
a, b, 3600, $3

Definitely a valid use case, never mind, scratch that! I've struck out that bit in my comment above.

flocsy commented 2 years ago

@omar-kabbani Now I see what you mean by "start" / "end". I think it's a good idea. However, let's try to find a better naming convention (actually 2). I think we'll have many other similar cases where we'll have these "step" type rules, for example: number of areas, distances, number of legs (which is the same as spanning limit, just maybe less confusing). We only used the upper limits (what you called end*), because we always mean that the next "step" starts from the previous step, with something like this: "the matched rule should be the one with the smallest value that is greater than or equal to the actual value". This makes the start* unnecessary, and then you also don't need to forbid overlaps, because they are only possible if you repeat something, and that is already forbidden, I guess. However, I agree with you that your example makes sense, because there could be a different meaning of these 2 sets of rules:

  1. without the start_spanning_limit:
     spanning_limit: 3, amount: 0.5
     spanning_limit: 5, amount: 0.25
  2. with start/end:
     start_spanning_limit: , end_spanning_limit: 3, amount: 0.5
     start_spanning_limit: 4, end_spanning_limit: 5, amount: 0.25

A 4 leg journey in case 1 would cost whatever the base price is + 3 * 0.25, while in case 2 it would cost 2 * 0.50 + 1 * 0.25 (which BTW shows one more inconvenience: 0 and even 1 is not a logical value for spanning_limit, because that means there's no transfer at all)

I wonder what naming convention we could use for these "step" type fields. To me from*, to* would sound "natural" (for example from_leg_count, to_leg_count instead of start_spanning_limit, end_spanning_limit), but if others prefer some other word pair instead of from/to, so it's not confused with the from/to (for example in leg_group_id) that means previous leg/next leg, then I'm OK with that too.

Also while spanning_limit might sound logical when it's used just like that, IMHO if we do split it to from/to or start/end then it becomes a bit confusing, and in that case I would suggest something that goes along with the other possible similar "step" type fields: from_leg_count, to_leg_count, [from_distance], to_distance, [from_area_count], to_area_count

davidlewis-ito commented 2 years ago

With regard to primary keys for fare_leg_rules, I am a little confused as to the intent behind defining a subset of the fields as constituting the "primary key".

We see two issues: Firstly, the current working suggestion of network_id, from_area_id, to_area_id, currency fields misses out several fields that would be required to identify the appropriate fare rule(s) (timeframe, service_id, rider_category_id, contains_area_id, ...)

Secondly, we would point out that for zonal fare schemes which will rely heavily on contains_area_id (or indeed future iterations of this mechanism) there would very likely be multiple matches within the fare_leg_rules for a single trip. Our assumption was that the cheapest should be considered the correct fare. So in the context of this discussion, how does Primary Key apply to fare_leg_rules?

flocsy commented 2 years ago
  1. @davidlewis-ito when we're talking about the "primary key" (or "unique key") of the txt files, then we don't say what the primary keys in your actual DB implementation should be, but rather we would have a clear constraint on what data is correct/incorrect from the producers' point of view.

  2. I agree with you that all the fields that are "input" (meaning we know it from the actual journey we want to calculate the fares for) probably should be in the primary key.

  3. Indeed, that's fine IMHO, and that is what our current implementation does: in case we have more than one matching fare we choose the cheapest (and much more than that, because we have multiple legs, and multiple matching rules for some legs...)

davidlewis-ito commented 2 years ago

@flocsy Thanks, that is very helpful, and I now understand that this is more about helping ensure the producers' data is correct rather than an implementation guide (which seemed really strange).

So the "primary key" is the set of "input" fields that should yield a single row. Isn't that the majority of the fields in the fare_leg_rules table?

I am afraid that now I don't understand the original issue of this thread:

"All fields in fare_leg_rules.txt and fare_transfer_rules.txt were initially marked as Primary Keys, this is not optimal as it results in high computational load for lookups."

flocsy commented 2 years ago

@davidlewis-ito output fields like amount should not be part of the key; otherwise you could have "endless" multiple rows where only the output is different

davidlewis-ito commented 2 years ago

@flocsy Would output fields comprise only: amount, min_amount and max_amount ?

flocsy commented 2 years ago

@davidlewis-ito in general: no, for example fare_transfer_type is also "output". As I look at it, things that I know when I have a journey are my inputs to the fare calculation. I use those to look up rules, and the other fields (that I don't know) are the output. But this is just the general rule; sometimes there might be exceptions.

omar-kabbani commented 2 years ago

Hi @davidlewis-ito - thanks for the questions, I would just add to @flocsy's response

Firstly, the current working suggestion of network_id, from_area_id, to_area_id, currency fields misses out several fields that would be required to identify the appropriate fare rule(s) (timeframe, service_id, rider_category_id, contains_area_id, ...)

We are currently focused on the base implementation of GTFS-Fares v2, which covers the core components of the extension and not all of it. You can see what is in the base implementation here. For example, in the file fare_leg_rules.txt, the fields from_timeframe_id, to_timeframe_id, service_id, rider_category_id, and contains_area_id are not part of the base implementation.

Other files and fields of the GTFS-Fares v2 proposal can be added in potential future efforts to extend GTFS, and the Primary Keys will be revisited then.

omar-kabbani commented 2 years ago

Summary of roundtable discussions on Primary Keys

(22, 23, and 24 February 2022 11:00 AM ET)


Attended (partially or fully) by: @Cristhian-HA, @omar-kabbani, @flocsy, @jsteelz, @npaun, @philip-cline, @skinkie, @ritesh-warade-ibigroup, and @derhuerst

With the current fields in fare_leg_rules.txt and fare_transfer_rules.txt, the Primary Keys are:


*During the discussion, it was proposed to remove the amount and currency fields from fare_leg_rules.txt and fare_transfer_rules.txt and introduce fare_product_id to represent the information instead. This would also require adding the file fare_products.txt with the fields fare_product_id, amount, and currency to the base implementation.

This would provide more flexibility as assigning a price to a trip from A to B can go beyond an amount and currency. Instead, the price can be associated with a fare product such as a single ride fare, multiple ride fare, fare with transfers, zone-specific fare, etc. With this proposed change, the implementation would look something like this:

Consider a transit agency that offers three types of fares for riders travelling from zone A:

Based on conversations during the roundtable discussion, this is something that could not be represented in trip planning apps without introducing fare_products.txt. The ability to define fare products will facilitate recommending the best fare a rider needs based on their itinerary.

The fares are defined below

fare_products.txt

fare_product_id amount currency
single_ride_A 2 CAD
single_ride_B 2.5 CAD
single_ride_A_incl_transfers 3 CAD
single_ride_B_incl_transfers 3.5 CAD
all_zones_incl_transfers 4 CAD
free_transfer 0 CAD

The travel rules for each fare are defined below

fare_leg_rules.txt

leg_group_id from_area_id to_area_id network_id fare_product_id
within_A zoneA zoneA bus single_ride_A
within_B zoneB zoneB bus single_ride_B
A_incl_transfers zoneA zoneA bus single_ride_A_incl_transfers
B_incl_transfers zoneB zoneB bus single_ride_B_incl_transfers
all_zones zoneA zoneB bus all_zones_incl_transfers
all_zones zoneB zoneA bus all_zones_incl_transfers

The transfer rules are defined below

fare_transfer_rules.txt

from_leg_group_id to_leg_group_id fare_product_id spanning_limit duration_limit duration_limit_type fare_transfer_type
A_incl_transfers A_incl_transfers free_transfer 0 7200 0 0
B_incl_transfers B_incl_transfers free_transfer 0 7200 0 0
all_zones all_zones free_transfer 0 7200 0 0
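To show how a consumer might resolve a price through fare_product_id, here is a minimal Python sketch that joins the leg rules above to the fare products. The in-memory structures and the function name `products_for_leg` are assumptions for illustration; a real consumer would read the CSV files.

```python
from decimal import Decimal

# fare_products.txt from the example: fare_product_id -> (amount, currency)
FARE_PRODUCTS = {
    "single_ride_A": ("2", "CAD"),
    "single_ride_B": ("2.5", "CAD"),
    "single_ride_A_incl_transfers": ("3", "CAD"),
    "single_ride_B_incl_transfers": ("3.5", "CAD"),
    "all_zones_incl_transfers": ("4", "CAD"),
    "free_transfer": ("0", "CAD"),
}

# fare_leg_rules.txt: (leg_group_id, from_area_id, to_area_id, network_id, fare_product_id)
FARE_LEG_RULES = [
    ("within_A", "zoneA", "zoneA", "bus", "single_ride_A"),
    ("within_B", "zoneB", "zoneB", "bus", "single_ride_B"),
    ("A_incl_transfers", "zoneA", "zoneA", "bus", "single_ride_A_incl_transfers"),
    ("B_incl_transfers", "zoneB", "zoneB", "bus", "single_ride_B_incl_transfers"),
    ("all_zones", "zoneA", "zoneB", "bus", "all_zones_incl_transfers"),
    ("all_zones", "zoneB", "zoneA", "bus", "all_zones_incl_transfers"),
]

def products_for_leg(from_area_id, to_area_id, network_id):
    """Return (leg_group_id, fare_product_id, amount, currency) for every matching rule."""
    matches = []
    for leg_group_id, frm, to, net, product_id in FARE_LEG_RULES:
        if (frm, to, net) == (from_area_id, to_area_id, network_id):
            amount, currency = FARE_PRODUCTS[product_id]
            matches.append((leg_group_id, product_id, Decimal(amount), currency))
    return matches

for match in products_for_leg("zoneA", "zoneA", "bus"):
    print(match)
```

A leg within zone A matches two rules (single_ride_A at 2 CAD and single_ride_A_incl_transfers at 3 CAD), so in this example fare_product_id is what distinguishes otherwise identical (from_area_id, to_area_id, network_id) rows.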

The definitions of fare_leg_rules.fare_product_id and fare_transfer_rules.fare_product_id need to be changed to reflect that the fields describe a price and not a filter. The use case of a fare product such as a day pass would require a new filter field that could be introduced in potential future efforts on GTFS-Fares v2.

Please let us know what you think of this change - and if all stakeholders are okay with introducing fare_product_id to convey pricing information, I will update the files in the base implementation to reflect this.


†During the roundtable discussion, it was proposed to split the field spanning_limit into start_spanning_limit and end_spanning_limit for more clarity as to when a transfer rule applies (example here). After the call, it was suggested by @flocsy to stick with spanning_limit and add wording that the spanning limit signifies the end of a certain transfer rule. Something along the lines of "The rule with the minimum spanning_limit that is greater than or equal to the span of the sub-journey is to be selected" as initially proposed by @npaun in this issue. Hence, in the example below, the first rule applies for the first 3 transfers, the second rule applies for the fourth and fifth transfers, and the third rule applies to all subsequent transfers.

fare_transfer_rules.txt

from_leg_group_id | to_leg_group_id | spanning_limit | duration_limit | duration_limit_type | fare_transfer_type | fare_product_id
A | A | 3 | 7200 | 0 | 0 | fare1
A | A | 5 | 7200 | 0 | 0 | fare2
A | A | | 7200 | 0 | 0 | fare3

Please let us know which option works best. Splitting spanning_limit into two fields will add more clarity, but it would require another field. Sticking with spanning_limit and clarifying the definition would serve the same purpose.


Base fares were also discussed during the calls.

A use case for base fares is when a transit agency charges a rider a fare once they board, and then charges them an additional amount when they disembark based on their origin-destination and transfers. The base fare is generally valid for a set period of time before it resets.

Tying legs (and transfers) to a fare product is a step in the right direction to represent base fares, since a leg (and any consecutive transfers) can be associated with more than just amount/currency. The fields in fare_products.txt are very versatile and may be extended further to include information specific to base fares. This, however, primarily serves the use case of distance-based fares and would require additional work on fare_products.txt, which is outside the scope of the base implementation. This sets a good foundation; however, MobilityData recommends addressing base fares in detail when the need arises with distance-based fares, as they seem to be the primary use case for base fares.

flocsy commented 2 years ago

I'd like to add a few thoughts about spanning_limit:

  1. if we split spanning_limit into 2 fields it also adds complication, because both the standard's wording and the validation will need to make sure that:
     1.a. there are no overlaps
     1.b. there are no gaps (if rule1 were: from_spanning_limit: 1, to_spanning_limit: 2 and rule2: from_spanning_limit: 4, and suppose there are no more rules, then what should we do with a journey that has 3 legs?)
     Having only 1 field takes care of both, by adding it to the primary key.

  2. We need to clarify what does spanning_limit: 0 vs spanning_limit: '' (empty) mean

  3. I'd like to set a naming convention for the different types of "step" type fields. Examples of step type fields are: duration_limit, spanning_limit, and probably in the future distance. However, there are at least 2 different types of fields, and in the current proposal we already have both of them, so I think we should try to find a field name prefix/suffix that makes the difference clear:

duration_limit
3600
7200

If I read this correctly then what duration_limit means is that if my sub-journey is longer than an hour but shorter than 2 hours then the 2nd rule with 7200 would apply, and that means that this rule is used for all the legs/transfers of the sub-journey.

But for spanning_limit we probably want it to mean something else:

spanning_limit, amount
3, 0.25
5, 0.50
'', 1.00

these 3 rules could mean 2 things, and IMHO we clearly want the 2nd meaning:

option 1: similar meaning as I described for duration_limit, so if my sub-journey has 4 legs, then the 2nd rule will match, so all the transfers would cost 0.50

option 2: each transfer of the sub-journey is matched to the "best" rule (the one with the smallest spanning_limit that is greater or equal to the transfer number in the sub-journey). In this example when I have a 4 leg sub-journey, then the following prices would apply: transfer 1: 0.25, transfer 2: 0.25, transfer 3: 0.50
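The two readings can be compared with a short Python sketch. The function names are hypothetical. One assumption is inferred from the example prices above rather than stated in the thread: under option 2, the transfer into leg n is matched against n (so transfer 3 of a 4 leg sub-journey is matched against 4). A blank spanning_limit is treated as unlimited.

```python
import math
from decimal import Decimal

# (spanning_limit, amount); None stands for a blank spanning_limit.
RULES = [(3, Decimal("0.25")), (5, Decimal("0.50")), (None, Decimal("1.00"))]

def match_rule(rules, value):
    """Amount of the rule with the smallest spanning_limit >= value (blank = unlimited)."""
    eligible = [(math.inf if limit is None else limit, amount) for limit, amount in rules]
    eligible = [rule for rule in eligible if rule[0] >= value]
    # Raises ValueError if nothing matches; cannot happen while a blank-limit rule exists.
    return min(eligible, key=lambda rule: rule[0])[1]

def option_1(rules, legs):
    """One rule is picked from the whole sub-journey length and used for every transfer."""
    return [match_rule(rules, legs)] * (legs - 1)

def option_2(rules, legs):
    """Each transfer is matched separately, against the number of the leg it leads into."""
    return [match_rule(rules, leg) for leg in range(2, legs + 1)]

print(option_1(RULES, 4))  # three transfers at 0.50 each
print(option_2(RULES, 4))  # 0.25, 0.25, 0.50
```

For a 4 leg sub-journey, option 1 totals 1.50 and option 2 totals 1.00, so the two readings are not interchangeable.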

If you read the sentence about option 2, then you can see that there are 2 "issues" that I'd like to address by applying some naming convention:

issue 1: differentiate between step type rules that behave like duration_limit vs spanning_limit option 2. Any idea how to call these fields? For example X_limit vs to_X_count? I.e: duration_limit and transfer_count or leg_count.

issue 2: I find it a little bit confusing that the field is called spanning limit. I think it means: the sub-journey spans this many legs. But it's used in fare_transfer_rules, and the "number of transfers" = "number of legs - 1". So maybe something like transfer_count or leg_count could be better?

omar-kabbani commented 2 years ago

Agreed - spanning_limit and duration_limit behave differently, and it would be good to have them named differently.

spanning_limit involves steps, or intervals, so a rule applies if the transfer falls in the current spanning_limit range. However, duration_limit is more of a universal field that defines the maximum timeframe for transfers between a pair of from_leg_group_id and to_leg_group_id.

Issue 1: I like the proposal of the terms duration_limit and transfer_count

Issue 2: Number of transfers would make more sense (transfer_count). Currently, forbidding spanning_limit=1 is awkward, since transfers are not possible on 1 leg.

If this is the direction we want to go forward with, I can update the files accordingly


We need to clarify what does spanning_limit: 0 vs spanning_limit: '' (empty) mean

Currently, the field definition for spanning_limit is: "0 or empty - No limit."

We can play it safe and change it so that a blank spanning_limit means unlimited transfers, and forbid spanning_limit=0. That keeps spanning_limit=0 available if there is a potential use for it in the future (maybe for base fares?).

flocsy commented 2 years ago

I like the transfer_count=blank => unlimited, =0 => forbidden.
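A small Python sketch of how a consumer might read the field under this convention. The name transfer_count is only the rename floated in this thread, and rejecting negative values is an extra assumption not discussed above.

```python
def parse_transfer_count(raw):
    """Interpret a raw CSV cell for the proposed transfer_count field.

    blank        -> None (no limit on transfers)
    positive int -> that limit
    0            -> rejected, as suggested in the thread
    negative     -> rejected (assumption, not discussed in the thread)
    """
    text = (raw or "").strip()
    if text == "":
        return None
    value = int(text)  # raises ValueError for non-numeric input
    if value <= 0:
        raise ValueError(f"transfer_count must be blank or positive, got {value}")
    return value
```

Returning None for "unlimited" keeps the blank case distinct from any numeric value, so 0 never has to double as a sentinel.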

davidlewis-ito commented 2 years ago

@omar-kabbani Regarding the primary fields of fare_leg_rules, as we have discussed before, the fields that must be included are determined by what is and what is not in the base implementation. It would seem that service_id is necessary for even the most modest implementation and so I would assume that this would also form part of the primary key.

Regarding the suggested refactoring to create a fare_product table, I think this is a very positive development providing the capability to align much more closely to the real world. I would suggest there may be a case for a few more fields - for instance providing a passenger recognised label for the product (rather than relying on an implied label within the fare_product_id).

omar-kabbani commented 2 years ago

Hi @davidlewis-ito

The field service_id in fare_transfer_rules.txt would allow transit agencies to describe fares that are specific to dates/times - for example, something like rush hour pricing. I agree this is a very valid use case, but we have excluded it from the base implementation as we could not find data producers who are using the field service_id as you can see in this table. I have tagged the primary data producers for information @e-lo @drewda @jewel1965

Once more fields are added (potentially in the future), the Primary Keys will have to be revisited and modified accordingly

With regards to your suggestion for fare_products.txt, we can add the field fare_product_name with the following definition "The name of the fare product as displayed to riders."

This was brought up in one of the roundtable calls, and I think it should be an easy field to introduce.

irees commented 2 years ago

My thoughts on removing amount and currency from leg/transfer rules:

Many agencies, such as BART here in California, have tickets that are based on (from stop, to stop) without any underlying logic that can be abstracted out (e.g. distance, number of stops, fare zones, etc.). Removing amount from fare_leg_rules.txt/fare_transfer_rules.txt and requiring a fare_products.txt entry for every possible fare amount will require additional maintenance work (two tables that have to be kept very closely in sync and occasionally splitting and merging fare products to match changes in the fare table), or creating these fare products programmatically.

flocsy commented 2 years ago

@irees, programmatically seems to be the right choice. I mean the data is already stored somewhere in some format. I doubt it is organised manually, so it's just a matter of output. If I were writing the code I would "unite" all those from_stop, to_stop pairs that have the same price to use the same product_id (it generates much less data I guess), but even the lazy way, where each rule has its own product_id, can be done.
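A rough Python sketch of the "unite pairs with the same price" idea. The station names, amounts, and the price-<amount> naming scheme are all hypothetical; amounts are kept as strings so they can be written back to CSV unchanged.

```python
# Hypothetical station-to-station fares: (from_stop, to_stop) -> amount.
FARES = {
    ("A", "B"): "3.50",
    ("A", "C"): "5.00",
    ("B", "C"): "3.50",
}


def build_tables(fares, currency="USD"):
    """Return (fare_products rows, fare_leg_rules rows) with one product per distinct amount."""
    product_for_amount = {}
    products, leg_rules = [], []
    for (from_stop, to_stop), amount in sorted(fares.items()):
        if amount not in product_for_amount:
            product_id = f"price-{amount}"  # hypothetical naming scheme
            product_for_amount[amount] = product_id
            products.append(
                {"fare_product_id": product_id, "amount": amount, "currency": currency}
            )
        leg_rules.append(
            {
                "from_area_id": from_stop,
                "to_area_id": to_stop,
                "fare_product_id": product_for_amount[amount],
            }
        )
    return products, leg_rules


products, leg_rules = build_tables(FARES)
print(len(leg_rules), "leg rules share", len(products), "fare products")
```

Comparing len(leg_rules) with len(products) on a real fare table gives the pairs-to-distinct-amounts ratio asked about in this comment.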

Just out of curiosity, what is the ratio of the number of (from_stop,to_stop) pairs vs distinct(amount) ?

flocsy commented 2 years ago

Regarding the fare_products.fare_product_name: Yes, it can be an optional field. If we decide to add it then we should maybe also discuss possible translations for it.

omar-kabbani commented 2 years ago

With regards to switching from amount/currency to fare_product_id, we prepared a brief list of pros and cons to help with the conversation.

We are happy to proceed with whatever option the community agrees on - however, we see that switching to fare_product_id brings more flexibility - and benefits the specification in the long term.

| Pros | Cons |
| --- | --- |
| Makes fares less abstract by tying them to a product (like a physical or virtual ticket) | Requires reworking current GTFS-Fares v2 datasets |
| Provides a framework for trip planners to display more details about the fare (ex: picture of the ticket) | Requires maintaining two tables instead of one |
| Enables better visualization of fare options; one example would be travelling from A to B (using either a cheaper one way ticket or a more expensive ticket that allows transfers) | Requires revisiting how fare_products.txt will be used to represent transit passes |
| Sets a good foundation for representing complex fares in the future (potentially base fares and distance-based fares) | |
| Good data management (for example, changing the price of legs only requires changing the price of the tickets instead of changing the cost of all the legs) | |

The switch from amount/currency to fare_product_id can be interpreted as saying that a trip from A to B requires a ticket that costs 3 CAD (instead of saying that a trip from A to B costs 3 CAD)

omar-kabbani commented 2 years ago

On the topic of replacing amount/currency with fare_products, MobilityData met with Interline/MTC yesterday (thank you @irees @drewda @jewel1965 for taking the time to meet with us).

Interline/MTC proposed to keep both options in the specification. Data producers who want to implement fare_products to represent fares can do so, and data producers who want to associate fare_leg_rules to a cost (amount/currency) can continue to do so. This would require making one set of fields conditionally forbidden (forbid fare_products if amount/currency are used and vice versa).

@flocsy @npaun @jsteelz I am tagging you to hear your thoughts as you were the main advocates for making the switch for fare_products. Others please chime in with your thoughts as well.

flocsy commented 2 years ago

It makes sense, though it complicates fare calculation implementations a little bit.

npaun commented 2 years ago

This suggestion would be the opposite of what we agreed on at the last meeting Transit participated in. Unfortunately, we think having both options in the specification complicates fare calculation too much.

As we understand the proposed wording ("forbid fare_products if amount/currency are used and vice versa"), this would create a situation where the file fare_products.txt is conditionally forbidden. A feed would not be able to use both amount, currency and fare_product_id. This would mean that any feed that used the fare_leg_rules.amount method could never also include monthly passes.

We are strong proponents of fare_products because, from the end-user perspective, pricing is heavily tied to the product purchased by a rider. For the same journey, riders may have the choice between many products with different prices, as well as different conditions, such as transfer rules. As a data consumer, the transcription of riders’ needs is better modelled by placing (amount,currency) in fare_products.txt.

Different forms of complexity

While adding fare_products for each pair of stops in the BART fare structure will add some complexity to that particular feed, it has not been shown that this outweighs the complexity of having two different means of specifying prices. The latter complexity is borne by all producers and consumers, even if a feed doesn't require fare_leg_rules.amount. BART’s fare structure is complex, regardless of the approach chosen.

So far we've only discussed fare_leg_rules, but we're most concerned about the effect this proposal will have on fare_transfer_rules. The full form of this file would have a filtering fare_product_id, a fare_product_id used to specify the cost of the transfer plus an amount, currency pair which could also specify the cost. This file is already very complex and is creating a lot of confusion among stakeholders.

fare_products.txt could simplify BART fares in the long run

When rider_category_id is added to the spec, its use will be needed to fully specify the BART fare structure. Consider this example for a trip from Embarcadero Station to Daly City Station:

leg_group_id,from_area_id,to_area_id,rider_category_id,amount,currency
embr-daly,Embarcadero,Daly-City,adult,3.50,USD
embr-daly,Embarcadero,Daly-City,senior-disabled,1.30,USD
embr-daly,Embarcadero,Daly-City,youth,1.75,USD

If fare products are used, we can avoid repeating many fields in fare_leg_rules.txt:

leg_group_id,from_area_id,to_area_id,fare_product_id
embr-daly,Embarcadero,Daly-City,price-embr-daly

And provide the following fare_products.txt:

fare_product_id,rider_category_id,amount,currency
price-embr-daly,adult,3.50,USD
price-embr-daly,senior-disabled,1.30,USD
price-embr-daly,youth,1.75,USD

In fare systems that have multiple containers, weekend discounts, and timeframe-based fares, fare_products.txt only grows more advantageous, as it lets us avoid specifying the 'selector fields' for each distinct price. The normalization that fare_products provides is more complex at first, but it pays for itself as the specification and agency feeds evolve in the long run.
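As a rough illustration of the lookup this normalization implies, the Python sketch below resolves a leg to a price through fare_product_id using the two tables from the example above. The function names are hypothetical, and rider_category_id is, as noted in this comment, not yet part of the spec.

```python
import csv
import io

FARE_LEG_RULES = """leg_group_id,from_area_id,to_area_id,fare_product_id
embr-daly,Embarcadero,Daly-City,price-embr-daly
"""

FARE_PRODUCTS = """fare_product_id,rider_category_id,amount,currency
price-embr-daly,adult,3.50,USD
price-embr-daly,senior-disabled,1.30,USD
price-embr-daly,youth,1.75,USD
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def price_for(from_area, to_area, rider_category):
    """Return (amount, currency) for a leg and rider category, or None if nothing matches."""
    products = read_rows(FARE_PRODUCTS)
    for rule in read_rows(FARE_LEG_RULES):
        if rule["from_area_id"] != from_area or rule["to_area_id"] != to_area:
            continue
        for product in products:
            if (
                product["fare_product_id"] == rule["fare_product_id"]
                and product["rider_category_id"] == rider_category
            ):
                return product["amount"], product["currency"]
    return None


print(price_for("Embarcadero", "Daly-City", "youth"))  # prints ('1.75', 'USD')
```

The leg rule is written once; adding a new rider category only adds a row to fare_products.txt.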

omar-kabbani commented 2 years ago

Thanks @npaun for the feedback

For what it's worth, if we proceed with both options, we will not forbid the entire fare_products.txt file if amount/currency are used. Instead, it is forbidden to use fare_product_id in the same record with amount/currency.
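A minimal Python sketch of that record-level check, assuming each record is a dict of column name to string. The function name is hypothetical, and the second check (amount and currency must appear together) is an added assumption rather than something stated in this thread.

```python
def check_record(record):
    """Return a list of problems found in one fare rule record."""

    def has(field):
        return bool((record.get(field) or "").strip())

    problems = []
    # The rule described above: fare_product_id excludes amount/currency in the same record.
    if has("fare_product_id") and (has("amount") or has("currency")):
        problems.append("fare_product_id must not be combined with amount/currency")
    # Assumption: an amount without a currency (or the reverse) is not meaningful.
    if has("amount") != has("currency"):
        problems.append("amount and currency must be provided together")
    return problems
```

Because the check is per record, a feed could still mix both styles across different rows, which matches the wording in this comment.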

I am not sure if this is enough to sway your opinion towards keeping both options.

Either way, we see the overall benefit of switching to fare_products, and we also see the benefit of keeping things simple with amount/currency for cases like BART. I will make the changes to the files to only include fare_product_id today since there seems to be a general agreement on that. I will wait a bit longer to see if anyone else wants to chime in on keeping both options and then we can see if we need to make further changes.

drewda commented 2 years ago

Ahead of our call between Transit, MTC, Interline, and MobilityData about the amount/currency fields, let me try to summarize our concerns to-date and provide a specific example to look at:

  1. Moving amount/currency to fare_products will require many more fare products for the three rail agencies that use station-to-station fares: BART, ACE, Capitol Corridor. Not a major technical concern, as we would be able to produce this programmatically. The resulting list of fare products would be long and miscellaneous, but there is no need to discuss this further.

  2. For transfer discounts, we need to be able to specify negative amounts. We see the current PR does allow negative values on fare_product.amount, so there is no need to discuss this further.

  3. We estimate that moving amount/currency from fare_leg_rules and fare_transfer_rules to fare_products could be done in an automated post-processing way for ~85% of our existing records. Just as with the station-to-station fares, it raises questions for Interline and MTC of how to best maintain this data in our tooling, although there is no need to discuss that further.

  4. We are concerned about how we would represent ~15% of the current fare_transfer_rules. For example, a holder of a Caltrain monthly pass can receive a discounted fare on an AC Transit transbay bus after they have transferred from a Caltrain leg. (The discounted fare is only available to riders who have used their contactless Clipper Card to tag off Caltrain within the previous 2 hours; the discounted fare is dependent upon it being at least the second leg of the overall journey.)

At present, we model these situations with fare_transfer_rules that reference a fare_product_id for the pass on the 1st leg's agency and include an amount for discounted fare for the 2nd leg's agency. (Re-reading Omar's note earlier in this thread, we see that we were actually out of sync when we discussed this. A compromise that allows amount/currency but not when fare_product_id is present would not be a solution for these situations in the Bay Area feed.)

We've prepared a spreadsheet with an example of the Caltrain-to-AC-Transbay example under both Fares-v2 in the Google Doc (blue) and our understanding of Transit app's preferred approach (green).

Also note that the green hypothetical example assumes that in fare_products.txt the fare_product_id column does not require uniqueness and can be overloaded. (This is how it works in the Fares-v2 Google Doc; but not in the current PR.)

On our call, let's confirm that this squares with everyone's understanding of the two versions of the proposal.

  1. Consumers of the new fare_products.txt file will need to make use of other files in order to identify which fare products are "stand alone" and worth displaying to riders (like a Caltrain monthly pass) and which are for internal-use only (like the "CT-to-AC-multizone-transfer" fare product). Just like the issue of generating many fare products for station-to-station agencies, this isn't programmatically hard and we have no strong feelings about it, but it's worth noting that after the proposed change the Bay Area fare_products file will not be useful on its own for a user-facing app.

  2. Assuming the above is accurate, to re-model those ~15% of transfer discounts will be technically possible but take more effort from MTC and Interline staff. This would be in addition to permanently reworking the records that can be temporarily post-processed. Also, outside of this discussion but still a factor in our overall concerns is that we are also planning to make changes to address the removal of is_symmetrical and the addition of areas.txt/stop_areas.txt. We have invested a lot of effort into both tooling and the existing data, so consideration of the level of effort to change the data to fit a new scheme is a factor in our considerations of how to vote on this PR.

No need to reply in advance on the above -- just want to share this to separate out the distinct topics to make for a more productive call. We'll be curious to hear more from the Transit team about tradeoffs within your own systems, and what you've hoped to gain by moving amount/currency fields and removing the current capabilities to specify both fare_transfer_rule amounts and fare_products.

Finally, both Interline and MTC are fine voting for a base PR that doesn't include the full set of functionality we use in the Bay Area Regional GTFS Feed. We voted for the previous PR with the understanding that we'd implement the superset of functionality in the Google Doc and let the adopted subsets in PRs catch up over time. We probably can't defer the question of amount/currency, but we are open to ideas of how to make this PR easier for everyone to vote for.

flocsy commented 2 years ago

@drewda

  4. Do you have a gtfs feed where you have both Caltrain and AC Transit? If no, then this isn't a relevant issue anyway. If you do then, well, this is indeed something not yet supported, but the addition of fare_products.txt is indeed the 1st step to make this possible in the future. This is also not a specific problem; in many big cities the same dilemma applies: Which fares should the gtfs show? The fares that someone with only cash can buy? The fares that a monthly card holder will pay? And then we still didn't talk about students, adults, children, pensioners... For NOW you'll have to choose what makes most sense to you (BTW is there a field in the GTFS where a textual explanation/"disclosure" could be added for these cases?) and later I hope we'll be able to add the possibility to represent these things better (maybe even fully) in GTFS v...

Regarding the use of fare_transfer_rules.fare_product_id I strongly discourage you from doing what you described! I even asked Omar previously to make this very clear in the field definition, that fare_product_id is the OUTPUT which means it is the fare that should be paid when this rule matches (and is chosen). Omar please make sure this is clear in the description!

We are aware that for filtering purposes we will probably need an additional field that does reference fare_products.fare_product_id. This additional field will be a FILTERING field that will be possible to use for things like: this rule can only be used if the user has the specified fare_product (that can be a monthly pass or a ticket purchased on a previous leg)

Regarding repeating fare_products.fare_product_id: I don't understand what you mean. If the same id could be in more than one line then how would anyone know which line you mean?

  1. I think you have a very good point there. I didn't know these are displayed already. I think that we can add a new field to indicate this, however since the name is optional you can probably leave it empty and the consumers will be able to filter only those that have a name.

I can share my thoughts about amount/currency: In most of the thousands of agencies I know the fares are still tied to some physical ticket. I'm pretty sure this is the vast majority world-wide and will remain like that for years, even if more and more agencies change their system to be distance based (or something else). This was the main concern. Imagine an agency with a thousand rules that all have either $3 or $5 in their rules. By moving it to products they'll be able to reference only 2 different products (that pretty well corresponds to the real world). I understand there are other cases, we also deal with many agencies that have some distance or stop-to-stop based fare system, but in my experience even many of them will "benefit" from moving the amount to products, because in many gtfs files I know the amounts would appear multiple times because of how the rules are built, so even then the same kind of applies. Anyway for these simple use cases we did keep amount/currency and only made them conditionally forbidden because currently we thought that there is no real reason why we would enable a producer to use both rule.amount and rule.fare_product_id in the same rule line. If you have specific cases where this would be useful then I would suggest starting a discussion about it: either it'll convince us to remove the conditionally forbidden (though now that I think about this seemingly non-backward compatibility breaking change, it might be breaking implementations of consumers, but at least it'll continue to work for those feeds that don't start to use both of them) or we might come up together with better solutions.

skinkie commented 2 years ago

4. If you do then, well this is indeed something not yet supported, but the addition of fare_products.txt is indeed the 1st step to make this possible in the future.

If we already know that we would create unaggregatable feeds, shouldn't ITO reply here too then?

npaun commented 2 years ago

@drewda

Re: (4) Transfers between Caltrain and AC Transit

This situation is out of scope for the current PR (#286), but we think it should be addressed in a future PR. Currently, even the full Fares v2 draft can't correctly represent this case. Last October, we proposed adding a new field called transfer_only to fare_leg_rules.txt. I've updated the proposal to reflect the current state of Fares v2:

https://docs.google.com/document/d/18yWhwR89pQp48VuBNPXK0djLXOtdjWhgC2h6jCC8WPM/edit?usp=sharing

Basically, a fare transfer rule from Caltrain to AC Transit could be paired with an AC Transit fare leg rule with transfer_only=1. In this case, that fare leg rule would only apply if a valid transfer was used, and could not be used for the first leg of a journey.
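A minimal Python sketch of how a consumer might honour the proposed transfer_only flag when selecting leg rules. The AC Transit IDs are invented for illustration, and transfer_only is a proposal in this thread, not an adopted field.

```python
# Hypothetical AC Transit leg rules; IDs are invented and transfer_only is only a proposal.
LEG_RULES = [
    {"leg_group_id": "ac-transbay", "fare_product_id": "ac-transbay-regular", "transfer_only": "0"},
    {"leg_group_id": "ac-transbay", "fare_product_id": "ac-transbay-discount", "transfer_only": "1"},
]


def applicable_leg_rules(leg_rules, leg_group_id, arrived_by_valid_transfer):
    """Return the leg rules usable for a leg.

    A rule with transfer_only=1 is kept only when the rider reached this leg
    through a matching fare transfer rule, so it can never price a first leg.
    """
    return [
        rule
        for rule in leg_rules
        if rule["leg_group_id"] == leg_group_id
        and (rule.get("transfer_only", "0") != "1" or arrived_by_valid_transfer)
    ]


first_leg = applicable_leg_rules(LEG_RULES, "ac-transbay", arrived_by_valid_transfer=False)
after_transfer = applicable_leg_rules(LEG_RULES, "ac-transbay", arrived_by_valid_transfer=True)
print(len(first_leg), len(after_transfer))  # prints 1 2
```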

This proposal would also allow producers to clarify what happens on the 2nd, or later, transfers of the journey, which is currently a serious point of confusion. For instance, for a trip like Caltrain -> AC Transit -> AC Transit, is the second transfer free, or must you pay again?

We think the transfer_only field is highly expressive, while being straightforward to implement. Formally, I believe it would allow you to express deterministic finite automata.

Re: Multiple products with the same fare_product_id

I think this situation is also out of scope for the current PR. The full draft of Fares v2 permits multiple products to share the same ID (e.g. monthly_pass) in order to provide different prices for the same product, to persons in different rider categories (e.g. Students/Seniors/Adults), with different fare containers (e.g. Clipper/none), or possibly paying in different currencies (e.g. USD/CAD).

It doesn't permit the use of a single ID (e.g. pass) to represent several different actual products like a daily pass, weekly pass and monthly pass all at the same time. When the time comes to add rider categories and fare containers to the spec, we would support keeping this rule as is.

Re: The benefit of using fare products

I think @flocsy expressed this point very well. I'd be happy to give more examples on our call later today.

drewda commented 2 years ago

Hi @flocsy, re:

4. do you have a gtfs feed where you have both Caltrain and AC Transit?

Yes, I am referring to the SF Bay Area Regional GTFS Feed, which MTC and Interline created against the full Fares-v2 proposal in the Google Doc. Overview and info on how to access at https://www.interline.io/blog/mtc-regional-gtfs-feed-additions/

davidlewis-ito commented 2 years ago

Hi @npaun

I'm trying to understand a little bit more about your proposed transfer_only and filter_fare_product_id described in your doc https://docs.google.com/document/d/18yWhwR89pQp48VuBNPXK0djLXOtdjWhgC2h6jCC8WPM/edit?usp=sharing

I may well be missing some of the complexity but I don't really understand what function these provide.

In your document's example the combination of from_leg_group_id and to_leg_group_id already selects the appropriate entry in fare_transfer_rules. And the fare_transfer_type = 0 already flags that the fare of the second fare leg should not be applied, so why the need for transfer_only?

Here's my adjusted version of your data for illustration:

fare_leg_rules.txt

leg_group_id,network_id,fare_product_id,transfer_only
goatville:local,goatville:local,goatville:local-1trip,0
goatville:express,goatville:express,goatville:express-1trip,0
marmottown:local,marmottown:local,marmottown:local-1trip,0
marmottown:local,marmottown:local,goatville:express-1trip,1

fare_transfer_rules.txt

from_leg_group_id,to_leg_group_id,filter_fare_product_id,fare_transfer_type,fare_product_id,duration,spanning_limit
goatville:local,goatville:local,goatville:local-1trip,0,,3600,
marmottown:local,marmottown:local,marmottown:local-1trip,0,,7200,
goatville:express,marmottown:local,goatville:express-1trip,0,goat-marmot-transfer,3600,
marmottown:local,marmottown:local,goatville:express-1trip,0,,3600,2

fare_products.txt

fare_product_id,amount,currency
goatville:local-1trip,2.50,CAD
goatville:express-1trip,3.00,CAD
marmottown:local-1trip,3.00,CAD
goat-marmot-transfer,0.25,CAD

Thanks (and apologies if this is no longer the best place to post on this subject!) David

flocsy commented 2 years ago

Regarding the filter_fare_product_id (disregarding possible name improvements for now): maybe this field should be added to fare_leg_rules.txt, and that will do the trick for you (and many other tricks for others): having it null means you can use this rule for the 1st leg as well, and having it non-null means the user has to own the product (that could be a still valid transfer ticket from a previous leg, as you intend IMHO, but it could also be used for a monthly pass or something of that kind)

This would make transfer_only not needed IMHO and give more flexibility for other use cases.
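A short Python sketch of that idea, assuming filter_fare_product_id were added to fare_leg_rules.txt. The rows reuse product IDs from the example in this thread, but the way they are arranged here is hypothetical.

```python
# Hypothetical leg rules; filter_fare_product_id on fare_leg_rules.txt is only a suggestion here.
FILTERED_LEG_RULES = [
    {
        "leg_group_id": "marmottown:local",
        "fare_product_id": "marmottown:local-1trip",
        "filter_fare_product_id": "",
    },
    {
        "leg_group_id": "marmottown:local",
        "fare_product_id": "goat-marmot-transfer",
        "filter_fare_product_id": "goatville:express-1trip",
    },
]


def rule_is_usable(rule, held_products):
    """Empty filter: anyone may use the rule. Non-empty: the rider must hold that product."""
    required = (rule.get("filter_fare_product_id") or "").strip()
    return required == "" or required in held_products


def usable_rules(rules, held_products):
    """Return the rules a rider holding the given set of products may use."""
    return [rule for rule in rules if rule_is_usable(rule, held_products)]


print(len(usable_rules(FILTERED_LEG_RULES, set())))  # prints 1
print(len(usable_rules(FILTERED_LEG_RULES, {"goatville:express-1trip"})))  # prints 2
```

The same check works whether the held product is a transfer ticket from a previous leg or a pass, which is the flexibility this comment points to.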

omar-kabbani commented 2 years ago

I would hold off on diving into filter_fare_product_id and transfer_only in order to sort out the Primary Keys first.

Now back to the definition of a Primary Key in the specification (link)

The primary key of a dataset is the field or combination of fields that uniquely identify a row.

In other words, if you know the Primary Keys, you can define one unique row - the other fields are not necessary to define a unique row.

For fare_leg_rules.txt, I propose the following set of Primary Keys:

  • from_area_id
  • to_area_id
  • network_id
  • fare_product_id

The justification is that, currently, in the base implementation, you need the following to define a unique trip:

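Regarding point#10 in @flocsy's comment in the pull request, I can see a case where we have the same records that only differ in price (ex: 1 USD vs 2 USD)

  • The more expensive ticket allows for transfers
  • The more expensive ticket lets me ride the same bus as the cheaper ticket but I get a more comfortable experience (seats/service/etc.)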

The remaining field is leg_group_id, and is not part of the set of Primary Keys since it is not a unique ID - also, with the remaining 4 fields, we can identify a row without the need to check leg_group_id.
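To make the uniqueness requirement concrete, here is a small Python sketch that reports primary-key collisions. It assumes the four-field key proposed in this thread for fare_leg_rules.txt (from_area_id, to_area_id, network_id, fare_product_id); the function name and the product IDs in the usage rows are hypothetical.

```python
from collections import Counter

# Key fields as proposed in this thread for fare_leg_rules.txt.
LEG_RULE_KEY = ("from_area_id", "to_area_id", "network_id", "fare_product_id")


def duplicate_keys(rows, key_fields=LEG_RULE_KEY):
    """Return every key tuple that identifies more than one row."""
    counts = Counter(tuple(row.get(field, "") for field in key_fields) for row in rows)
    return [key for key, count in counts.items() if count > 1]


rows = [
    {"leg_group_id": "g1", "from_area_id": "area_1", "to_area_id": "area_2",
     "network_id": "local_bus", "fare_product_id": "p1"},
    {"leg_group_id": "g2", "from_area_id": "area_1", "to_area_id": "area_2",
     "network_id": "local_bus", "fare_product_id": "p1"},
    {"leg_group_id": "g1", "from_area_id": "area_1", "to_area_id": "area_2",
     "network_id": "comfort_bus", "fare_product_id": "p1"},
]
print(duplicate_keys(rows))  # prints [('area_1', 'area_2', 'local_bus', 'p1')]
```

The first two rows differ only in leg_group_id, so under this key they collide; the third differs in network_id and is fine.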


For fare_transfer_rules.txt, I propose the following set of Primary Keys:

The justification is that, currently, in the base implementation, you need the following to define a unique transfer:

Regarding point#17 in @flocsy's comment in the pull request, I cannot think of a use case aside from different currencies.

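I see the risk of bad data, ex: a dataset with the same records except one costing 1 USD and another costing 2 USD. However, I would still suggest we keep fare_product_id in the set of Primary Keys for the following reasons:

  • I would prioritize the use case of enabling different currencies over agencies creating bad data
  • The GTFS validator should be able to pick up logical errors like this
  • We can add wording to ensure data producers do not make this error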

The other two fields (duration_limit_type and fare_transfer_type) are dependent on the transfer itself, so they do not provide any value in identifying unique records. So they are not considered part of the set of Primary Keys.

flocsy commented 2 years ago

The primary key of a dataset is the field or combination of fields that uniquely identify a row.

In other words, if you know the Primary Keys, you can define one unique row - the other fields are not necessary to define a unique row.

For fare_leg_rules.txt, I propose the following set of Primary Keys:

  • from_area_id
  • to_area_id
  • network_id
  • fare_product_id

The remaining field is leg_group_id, and is not part of the set of Primary Keys since it is not a unique ID - also, with the remaining 4 fields, we can identify a row without the need to check leg_group_id.

  1. IMHO if we don't add leg_group_id to the primary key then producers will not be able to create 2 similar rules, where the (from_area_id, to_area_id, network_id) are identical, but leg_group_id is different. I am OK with that, and we can add it later to the primary key if needed. Maybe it would be good if a producer can comment on this.

  2. I think we shouldn't add "output" fields to the primary key (again: fare_product_id is an output field, but I agree that it is a bit more complicated than the rest, because "half" of it (currency) can be thought of as an input key (in case multiple currencies are possible). Though because this is quite an edge case, maybe we should leave this out for now (we'll be able to address it later), and consider the fare_product_id the "output" of the rule.)

Regarding point#10 in @flocsy's comment in the pull request, I can see a case where we have the same records that only differ in price (ex: 1 USD vs 2 USD)

  • The more expensive ticket allows for transfers
  • The more expensive ticket lets me ride the same bus as the cheaper ticket but I get a more comfortable experience (seats/service/etc.)

What you suggest is that a producer can accidentally or even purposefully add multiple lines to fare_leg_rules.txt, because in your reasoning the field that would be needed to be added to the primary key of fare_leg_rules.txt (i.e: transfer_count, ...) is not in this table but in fare_transfer_rules.txt.... I don't think this is good.

By adding leg_group_id to the primary key it looks that this can be solved, because then the producer can have 2 lines where the only difference in the key is in leg_group_id, and then can use the 2 different leg_group_id-s in the fare_transfer_rules.txt to do what you wrote above about the less/more transfers.

I am against "mixing" in irrelevant information to the already too complex fare system, so economy vs first class seats IMHO shouldn't be added. I mean of course it can be added as another fare_product, where (for now) the name can indicate "economy" vs "business" or whatever it's called, and there needs to be a better way to tell: the business ticket is a "better" version of the economy ticket.

I personally don't like to be forced to repeat the rule lines in order to be able to define more products that entitle a user to use the same "service" (note: I use service as a bus line, not whether or not you get a "free" drink). This doesn't sound like a good way (for a producer). What we have is fare_hierarchy where we can define relations between different fare products like: fare_product_2 CanUse fare_product_1 (means: if you have fare_product_2 then you can do everything that is possible with fare_product_1), or fare_product_1 Upgrade fare_product_2 (means: if you bought a $3 ticket on the local bus on the 1st leg, and on the 2nd leg you need a $5 ticket for the express bus, you can upgrade your ticket by only paying the price difference, and from that leg you do possess fare_product_2). This enables us to have the minimum number of rules, always using the "cheapest" fare_product. Similarly, the CanUse relation enables multiple currencies. Or if a user had a weekly/monthly pass then it enables him to use any of the lines... We did not speak enough about how we want products to be used.... Maybe we'll need either a less "clever" fare_product_group_id or something more complex that better represents the real-world hierarchy between different fare products.

Regarding point#17 in @flocsy's comment in the pull request, I cannot think of a use case aside from different currencies.

Well it doesn't really matter, IMHO whatever we decide regarding fare_leg_rules.fare_product_id should also go for fare_transfer_rules.fare_product_id anyway.

I see the risk of bad data, ex: a dataset with the same records except one costing 1 USD and another costing 2 USD. However, I would still suggest we keep fare_product_id in the set of Primary Keys for the following reasons:

  • I would prioritize the use case of enabling different currencies over agencies creating bad data

IMHO we can do it in next PR.

  • The GTFS validator should be able to pick up logical errors like this

Not easy, if it's allowed according to the proposal.

  • We can add wording to ensure data producers do not make this error

Sounds to me hard, especially that we know that more fields will be added to both tables.

npaun commented 2 years ago

@davidlewis-ito

My original example and your adjusted version describe two different transit systems.

  1. Often you can use express tickets on local busses too. If the goatville:express-1trip product were added to the goatville:local leg group, then leg groups would no longer be enough to decide which riders are entitled to the interagency transfer or not. This situation would necessitate the use of filter_fare_product_id as in my original example.
  2. In my original example, a rider using a Goatville express ticket can transfer to a Marmottown bus for $0.25, then can transfer to a second Marmottown bus for free. Further transfers are not permitted. In your adjusted example, that second free transfer is not possible.

@omar-kabbani

We agree with your suggestion to add wording prohibiting transfer rules that are identical except as to the amount.


@flocsy

  1. We also don't have strong opinions on the inclusion of leg_group_id in the primary key, and feel it should be deferred to a later PR if a use case is found to motivate it.
  2. If fare_product_id is not included in the primary key, then it would mean that only one type of ticket could ever be used for a certain class of journey (say from the airport to downtown). This is clearly undesirable.
  3. I think upgradeable fare products are a great idea for a future PR, and we have something similar implemented in our system based on an extra column in fare_products.txt. We would leave this out of the current PR as they can be added in a backwards compatible way.

flocsy commented 2 years ago

@npaun

  1. This is a bit tricky because if you add leg_group_id now to the primary key then removing it later is a backwards compatible change for producers, but not for consumers (probably depends on the implementation). And not including it now, but adding it later would be a non backwards compatible change for producers and backwards compatible for consumers. However I would add it now because that's more restrictive and only "open" it in the future by removing it from the primary key if there's a particular need for it.

  2. I agree this probably shouldn't be included in this PR, but it does have a huge effect on #2, so maybe we'll need to start to talk about it sooner rather than later.

  3. I would prefer not to see (or even to be able to produce) lines in fare_*_rules.txt where the only difference is in fare_product_id.

omar-kabbani commented 2 years ago

Summary of roundtable discussions on Primary Keys

(26 April 2022 at 11 AM ET)

Attended by @omar-kabbani, @michellenguyenta, @jsteelz, @npaun, @timMillet, @irees, @flocsy


During the discussion, it was agreed that with the current fields in fare_leg_rules.txt and fare_transfer_rules.txt, the Primary Keys are:

fare_leg_rules.txt:

fare_transfer_rules.txt: