iRail / stations

A list of all the Belgian stations and their properties used within the iRail project
http://irail.be/stations/NMBS

Extract new data from GTFS, validate current data based on GTFS #132

Bertware closed 6 years ago

Bertware commented 6 years ago

Reworked frequency_calculator to a script which can check and update stations.csv based on the latest GTFS data. Updated stations.csv using this script, resulting in 4 added stations and multiple updated stations.

This PR reworks the existing frequency_calculator script, which was used to calculate the avg_stop_times field. The script no longer simply appends data: it loads the data from the CSV file (respecting the column headers), validates and updates it, and writes it back to CSV. This makes it less vulnerable to changes in the format.
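
To illustrate the load, update, and write-back approach described above, here is a minimal sketch in Python. It is not the script from this PR. The function name update_stations, the use of a "URI" column as the station key, and the shape of the GTFS-derived mapping are assumptions made for the example; only the avg_stop_times field name comes from this thread.

```python
import csv
import io


def update_stations(csv_text, gtfs_avg_stop_times):
    """Return updated CSV text for the given stations CSV.

    csv_text: contents of a stations CSV file with a header row.
    gtfs_avg_stop_times: hypothetical mapping of station URI -> value
        derived from GTFS (assumed shape, for illustration only).
    """
    # DictReader addresses columns by header name, so the column
    # order in the file does not matter.
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames)
    rows = list(reader)

    # Update stations that are already present ("URI" is an assumed key column).
    known = set()
    for row in rows:
        known.add(row["URI"])
        new_value = gtfs_avg_stop_times.get(row["URI"])
        if new_value is not None:
            row["avg_stop_times"] = str(new_value)

    # Append stations that appear in the GTFS-derived data but not in the CSV.
    for uri, value in gtfs_avg_stop_times.items():
        if uri not in known:
            rows.append({"URI": uri, "avg_stop_times": str(value)})

    # Write back with the original header order; restval fills any
    # column that a newly added row does not provide.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Because rows are read and written by header name, adding or reordering columns in the CSV would not break a script structured this way, which is the robustness property the description refers to.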

The following features are implemented:

If this PR is reviewed and accepted, documentation will be added to the readme before merging.

Bertware commented 6 years ago

Previous remarks from @pietercolpaert in #127:

3 remarks:

  1. We should add a script in this repository which shows how the data from the GTFS is put in this file. Otherwise this field seems like random magic, without the context that this is official data from SNCB.
  2. Is the field name the best one? E.g., isn’t minimum change time a better term?
  3. Should we also transform this field in the RDF transformer script so that it can be used in the Linked Data API as well?

I believe 1 and 2 are addressed (the field name can easily be changed if wanted). I agree with the third point, although this can be done in a future PR.