transitland / transitland-datastore

Transitland v1 core components. Deprecated and only maintained occasionally. See Transitland v2.
https://transit.land/documentation/datastore/
MIT License

run external validator libraries on new feed versions #888

Closed. drewda closed this issue 7 years ago

drewda commented 7 years ago

When a new version of a feed is fetched from an agency server, we have the opportunity to run any number of validation libraries on the archive:

//cc @barbeau @antrim

barbeau commented 7 years ago

I've forked Stephen-Gates' GTFS JSON Table Schema and started to generalize it for all GTFS consumers: https://github.com/CUTR-at-USF/GTFS/blob/full-spec/datapackage.json

His schema was specific to South East Queensland and marked a number of fields as required to ensure their data was populated, but these should be optional to match the current spec. The schema was also missing a number of fields, and a few tables still need to be filled in.

TODO for datapackage.json:

Open issue at https://github.com/CUTR-at-USF/GTFS/issues/1 - I'm happy to accept PRs there as well if others want to pitch in. I'll keep chipping away at it as I have time.

One open issue is how strictly we enforce constraints. For fields like stop.location_type, the spec defines values 0, 1, and 2. We can use the schema to accept only these values and fail on any other value, but that breaks extensibility (e.g., a consumer and producer agreeing on value 4 outside the spec). Extensibility has always been allowed and encouraged, so we need to decide whether to constrain the schema to the officially defined spec or allow outside values.

I've added comments where I've noticed this so far.
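To make the trade-off above concrete, here is a minimal Python sketch of the two policies. It is only an illustration, not how datapackage.json or any existing validator handles this: the function name `check_location_type`, the `strict` flag, and the "ok"/"warning"/"error" levels are all hypothetical, and accepting a blank value is an assumption.

```python
from typing import Tuple

# Values the thread cites as defined by the spec for location_type.
SPEC_LOCATION_TYPES = frozenset({"0", "1", "2"})


def check_location_type(value: str, strict: bool) -> Tuple[str, str]:
    """Classify one location_type value (hypothetical helper).

    strict=True  -> any value outside the spec is an error.
    strict=False -> values outside the spec are reported as warnings,
                    so producer/consumer extensions still pass.
    """
    value = value.strip()
    # Assumption: a blank value is treated as "not set" and accepted.
    if value == "" or value in SPEC_LOCATION_TYPES:
        return ("ok", "")
    if strict:
        return ("error", f"location_type {value!r} is not defined in the spec")
    return ("warning", f"location_type {value!r} is outside the spec (extension?)")


if __name__ == "__main__":
    for policy in (True, False):
        print(policy, check_location_type("4", strict=policy))
```

With a warning level, the schema can still flag unexpected values without rejecting feeds that rely on an agreed extension.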

barbeau commented 7 years ago

re: Conveyal gtfs-validator - we've gotten feedback from @sheldonabrown that he'd be willing to help get some of the changes by @laidig into the upstream Conveyal repo. If possible I'd like to do that to avoid forking the project. If not, I'd like to find a home for @laidig's changes in an organizational account where more than one person can be assigned to review/merge changes (we'd be willing to offer ours).

drewda commented 7 years ago

Learned from @mattwigway and @landonreed that instead of conveyal/gtfs-validator they now use conveyal/gtfs-lib to catch errors in GTFS feeds. When they're able to add a wrapper that can be called from the command line to produce validation output as JSON, then we'll set that up to run as part of Transitland's feed-fetch process.
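For anyone following along, a rough sketch of what "call a command-line wrapper and collect its JSON output" could look like in a fetch step. This is not Transitland's actual code and it does not use the real gtfs-lib command line: the helper name `run_validator` is hypothetical, the command is supplied by the caller, and the sketch assumes the validator prints a single JSON document to stdout and exits with status 0.

```python
import json
import subprocess
import sys
from typing import Any, List


def run_validator(command: List[str], timeout: float = 600.0) -> Any:
    """Run an external validator and parse its stdout as JSON.

    Assumptions (not taken from any real validator's docs):
    the command writes one JSON document to stdout and a non-zero
    exit status means the validator itself failed to run.
    """
    completed = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout
    )
    if completed.returncode != 0:
        raise RuntimeError(
            f"validator exited with {completed.returncode}: {completed.stderr.strip()}"
        )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Stand-in command that just prints JSON; a real validator CLI goes here.
    fake = [sys.executable, "-c", "print('{\"errors\": []}')"]
    print(run_validator(fake))
```

Keeping the command as a plain argument list means several validators can be run through the same helper and their JSON results stored alongside the feed version.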

landonreed commented 7 years ago

@drewda, the docs for conveyal/gtfs-lib have been updated with CLI usage instructions.

drewda commented 7 years ago

Both the Google Python FeedValidator and Conveyal gtfs-lib are now run automatically on production servers.

For example: https://transit.land/dispatcher/feed-versions/12ff6497fb2f6c5a9f568ec80c4be6ef928b0957

One bug remains to be resolved on production: #1069