google / transit

https://gtfs.org/
Apache License 2.0
versioning the static GTFS spec #215

Closed drewda closed 4 months ago

drewda commented 4 years ago

It's great to have a more regular cadence of additions to the static spec. However, this does make it more of a challenge for tool-makers to track which additions we support and for users to compare compatibility across different tools.

Currently, the best way to do this is to include a commit hash from this repo. However, that's not that easy for others to read.

It would be helpful if the static spec in the master branch had a version number, stored either in the Markdown file itself or as a release/tag on the repo.

Just a YYYY-MM-DD date of the last change merged into master would probably work fine.

scmcca commented 3 years ago

There is currently a "Revision History" kept in CHANGES.md and a date indicating the last revision at the top of reference.md. However, these have not been routinely updated as there is no obligation or agreement for any party to do so. I see a couple of options:

  1. MobilityData takes responsibility for updating the revision history anytime there is a major change (excluding things like minor editorial changes).
  2. Updating the revision history becomes an obligation under the specification amendment process.

Thoughts on whether either of these addresses the versioning problem?

francispeixoto commented 3 years ago

What about capitalizing on the release system to properly tag and release updates to the spec, like you would with code?

scmcca commented 3 years ago

We could also borrow from GBFS to have an easy and structured way to indicate the version within the documentation: https://github.com/NABSA/gbfs#specification-versioning

scmcca commented 3 years ago

I just opened #267 for an update to the Revision History, and included the date of last revision at the top of reference.md.

Is this a satisfactory versioning method?

skinkie commented 3 years ago

> I just opened #267 for an update to the Revision History, and included the date of last revision at the top of reference.md.
>
> Is this a satisfactory versioning method?

The problem with this versioning method is that it does not enforce usage. Hence, claiming to support version x.y.z does not mean anything. I wonder if a feature-based versioning mechanism would be better.

e-lo commented 3 years ago

> Is this a satisfactory versioning method?

In the opinion of Cal-ITP, using the revision history as a versioning method is necessary but not sufficient.

A few thoughts as to why a rigorous version/release system is critical:

  1. Breaking Changes: I believe the "easy" reason why GTFS has not used a versioning system is that it hardly ever adds any breaking changes.

With the likely adoption of GTFS-FaresV2, we are going to see one of the first major bifurcations (non-backwards-compatible changes) in GTFS Schedule. Being able to refer to a version by something definitive and legible (i.e. v1.1.5), rather than by a date ("that GTFS version that used old fares circa 2020") or a git commit hash, will enable a multitude of workflows (some discussed below), which will facilitate adoption and reduce overall friction.

  2. Validation: Now we have a 'canonical validator', but it doesn't mean much if it isn't referring to a specific version of the spec.

  3. Requirements writing: It's a lot more specific and legible to say "please provide GTFS v2.3.4 or later" than "please provide GTFS but with the up-to-date version of fares which was adopted in 2021".

  4. Legibility/Understandability: If you want to look up what a specific version is, you shouldn't have to piece it together from a changelog. You should be able to click on a specific 'release' and do a diff to another 'release'.
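
A requirement such as "GTFS v2.3.4 or later" becomes mechanically checkable once versions follow a MAJOR.MINOR.PATCH pattern. Below is a minimal Python sketch, assuming semantic-version-style strings. The version numbers are the illustrative ones from this comment, not real GTFS releases, and `parse_version` and `meets_requirement` are hypothetical helper names.

```python
import re


def parse_version(text):
    """Parse 'v1.2.3' or '1.2.3' into a tuple of ints such as (1, 2, 3)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_requirement(feed_version, minimum_version):
    """True if feed_version is the same as or later than minimum_version."""
    # Tuples compare element by element, so (2, 10, 0) > (2, 3, 4).
    return parse_version(feed_version) >= parse_version(minimum_version)


print(meets_requirement("v2.10.0", "2.3.4"))  # True
print(meets_requirement("2.3.3", "v2.3.4"))   # False
```

Comparing integer tuples rather than raw strings avoids the trap where "2.10.0" sorts before "2.3.4" lexicographically. The same check is not possible with a bare commit hash, which carries no ordering.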

e-lo commented 3 years ago

w.r.t. implementation of a semantic versioning/release system, it seems like the approach should consider:

  1. Combined versioning of GTFS Schedule and GTFS Realtime: Given that they are in a monorepo, intertwined (especially for extension adoption), and would end up requiring different versions of each other if versioned separately (seemingly a pain), it is probably easiest to manage the combined GTFS Schedule and Realtime spec as a single version.

Question: is there anything funky with protobufs that would preclude this?

  2. Mirrored suggestions for versioning extensions

If a feed includes a documented extension, it should also have a version and a link to its documentation, à la:

feed_info.txt

| gtfs_version | gtfs_version_url | gtfs_ext_vehicles_version | gtfs_ext_vehicles_url |
|---|---|---|---|
| 1.1.4 | http://github.com/google/transit@v1.1.4 | 0.6.5 | http://github.com/e-lo/gtfs-ext-vehicles@v0.6.5 |

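
To make the proposal concrete, here is a hedged Python sketch of how a consumer might read such version columns from feed_info.txt. The `gtfs_version` and `gtfs_ext_vehicles_version` columns are the ones proposed in this comment and are not fields in the published GTFS reference. The sample content and the `declared_versions` helper are assumptions for illustration only.

```python
import csv
import io

# Hypothetical feed_info.txt content. The gtfs_version* columns are the
# proposal from this thread, not fields in the published GTFS reference.
FEED_INFO = """\
feed_publisher_name,feed_publisher_url,feed_lang,gtfs_version,gtfs_ext_vehicles_version
Example Transit,http://example.com,en,1.1.4,0.6.5
"""


def declared_versions(feed_info_text):
    """Return {column: value} for each non-empty column ending in '_version'."""
    # feed_info.txt is expected to hold a single data row; read only the first.
    row = next(csv.DictReader(io.StringIO(feed_info_text)), None)
    if row is None:
        return {}
    return {
        column: value
        for column, value in row.items()
        if column and value and column.endswith("_version")
    }


print(declared_versions(FEED_INFO))
# {'gtfs_version': '1.1.4', 'gtfs_ext_vehicles_version': '0.6.5'}
```

A consumer could then compare each declared value against the versions it supports, one entry per spec or extension.
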
paulswartz commented 3 years ago

As a potential point of comparison, @mbta only versions the V3 API when making backwards-incompatible changes: https://www.mbta.com/developers/v3-api/versioning. For GTFS/GTFS-RT, most, if not all, changes are backwards compatible.

What concerns does a GTFS version help with, compared to looking at a feed to see if it uses a particular field or file?
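
For comparison, the alternative raised here (inspecting a feed to see whether it uses a particular field or file) could be sketched in Python as follows. This is a minimal sketch that assumes the feed is a zip of comma-separated .txt files with a header row. `feed_features` and `uses` are hypothetical helper names, and the tiny in-memory feed exists only so the example runs without a real dataset.

```python
import csv
import io
import zipfile


def feed_features(zip_bytes):
    """Map each .txt file in a GTFS zip to the set of column names in its header."""
    features = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt"):
                continue
            with archive.open(name) as raw:
                # utf-8-sig drops a leading byte order mark if one is present.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                header = next(csv.reader(text), [])
            features[name] = {column.strip() for column in header}
    return features


def uses(features, file_name, field=None):
    """True if the feed has file_name and, when given, a column named field."""
    if file_name not in features:
        return False
    return field is None or field in features[file_name]


# Build a tiny in-memory feed so the sketch runs without a real dataset.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("trips.txt", "route_id,service_id,trip_id,block_id\nR1,WK,T1,B1\n")
    archive.writestr("stops.txt", "stop_id,stop_name\nS1,Main St\n")

features = feed_features(buffer.getvalue())
print(uses(features, "trips.txt", "block_id"))  # True
print(uses(features, "fare_attributes.txt"))    # False
```

This kind of check shows which files and fields are present, but not how the producer interprets them.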

antrim commented 3 years ago

Versioning would be useful in cases where practices among feed producers vary significantly and clarification is offered in a later version of the spec. Versioning would make it possible for a feed consumer to know if a dataset conforms to a later version. PR https://github.com/google/transit/pull/32 around blocks shows an example.

scmcca commented 3 years ago

I think the easiest approach (aside from dates) is to have combined versioning of GTFS Schedule and GTFS Realtime, as @e-lo pointed out. Both would be captured under a broader "GTFS Version".

Say there is a major breaking change (X) that was applied for GTFS Realtime, but not for GTFS Schedule, consumers could require "GTFS Schedule (v1.0.0) or later" and "GTFS Realtime (v2.0.0)". I.e., they can be treated separately even though they make use of the same GTFS Versioning. This would avoid a burden on exclusively GTFS Schedule producers to update a meaningless version tag in their metadata.

I would be interested to hear if using a single versioning method for 2 specs would produce any confusion or conflict.

e-lo commented 3 years ago

> combined versioning of GTFS Schedule and GTFS Realtime as @e-lo pointed out

I was actually suggesting that if they are going to continue to be managed in the same repository (something I don't think is necessary), they could be versioned together, e.g. GTFS v1.1. Version increments could affect Realtime and/or Schedule. I don't know that this is the best solution, but it is a possible one.

barbeau commented 3 years ago

> As a potential point of comparison, @mbta only versions the V3 API when making backwards-incompatible changes: https://www.mbta.com/developers/v3-api/versioning. For GTFS/GTFS-RT, most, if not all, changes are backwards compatible.

This is how GTFS-RT has been treated so far. We currently have a v1.0 tag in this repo, which is for the last commit that pre-dated the changes introduced for GTFS Realtime v2.0 (which introduced different semantics surrounding "Required" fields): https://github.com/google/transit/releases/tag/v1.0

We haven't tagged a GTFS Realtime v2.0 because it's still the current version (master branch), with all non-breaking changes being lumped under this major version. I also believe this tag dates back to before GTFS was also versioned in this repo, which is why it's not labeled more specifically as gtfs-realtime-v1.0 (but it could always be re-tagged as such if needed).

e-lo commented 3 years ago

Agree that breaking changes in a spec necessitate version control.

And...version control is incredibly useful for non-breaking changes in the spec as noted above and extended below.

> Validation: Now we have a 'canonical validator', but it doesn't mean much if it isn't referring to a specific version of the spec.

Is the validator up-to-date with the spec? Which version?

  a. GTFS v1.2.3 as tagged in the repo
  b. Through May 30th (let me look through the commit log to see what the heck that means, search Stack Overflow to see how to check out a revision by a date, etc.)

> Requirements writing: It's a lot more specific and legible to say "please provide GTFS v2.3.4 or later" than "please provide GTFS but with the up-to-date version of fares which was adopted in 2021".

In order to take advantage of <feature in a data pipeline, customer-facing information, trip planner feature>, your transit agency must have implemented:

  a. GTFS v1.2.3
  b. GTFS as approved through May 30th

> Legibility/Understandability: If you want to look up what a specific version is, you shouldn't have to piece it together from a changelog. You should be able to click on a specific 'release' and do a diff to another 'release'.

  a. Click on a tagged release number and compare it to another
  b. Search for commits by date and do a diff between them

The user experience at all levels is better with versioning.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

derhuerst commented 2 years ago

@github-actions still relevant!

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. It may be closed manually after one month of inactivity. Thank you for your contributions.

isabelle-dr commented 3 months ago

This conversation has moved to #288