Closed fbarbe00 closed 6 days ago
My biggest confusion currently is how the UIC codes are actually defined and where they can be found - it seems to not be publicly accessible under MERITS?
Any help on transit data would be greatly appreciated (you can always reach out by email; the address is on my website). I will make sure to contribute any findings from the next 6 months to open source projects like this one.
My biggest confusion currently is how the UIC codes are actually defined and where they can be found - it seems to not be publicly accessible under MERITS?
MERITS data is indeed not publicly available. The UICs present in this data set are a compilation of what can be found on the internet or in carriers' station datasets.
name was changed from San Sebastián-Donostia to Donostia-San Sebastián
slug was changed from san-sebastian-donostia to donostia-san-sebastian
uic was added, value is 8019403
uic8_sncf was added, value is 80194035
name was changed from London St-Pancras to London St. Pancras
name was changed from Eindhoven to Eindhoven Centraal
slug was changed from eindhoven to eindhoven-centraal
name was changed from Delft Zuid to Delft Campus
slug was changed from delft-zuid to delft-campus
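As a side note on how the uic and uic8_sncf values in the changes above relate: the eight-digit value looks like the seven-digit one with a trailing check digit. The sketch below assumes that digit is a Luhn check digit computed over the last five digits only (the station part, without the two-digit prefix). That scheme is an assumption, not something confirmed in this thread, and the function names are made up for illustration; it does reproduce the pair added here.

```python
def luhn_check_digit(digits: str) -> int:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left; the rightmost payload digit is doubled first.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 0:
            value *= 2
            if value > 9:
                value -= 9  # same as summing the two digits of the product
        total += value
    return (10 - total % 10) % 10


def uic8_from_uic7(uic7: str) -> str:
    """Append a check digit to a 7-digit code (assumed scheme, see above)."""
    if len(uic7) != 7 or not uic7.isdigit():
        raise ValueError("expected a 7-digit numeric code")
    # Assumption: the check digit covers only the last five digits,
    # i.e. the station part without the two-digit prefix.
    return uic7 + str(luhn_check_digit(uic7[2:]))


print(uic8_from_uic7("8019403"))  # 80194035, the uic8_sncf value added above
```

If the assumption holds, a check like this could flag rows where the two columns disagree before a change is submitted.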
I believe Neustadt UIC is not passing tests because it should be put on 7481 station. I rebased my branch to the latest version and pushed the change (hope it's correct this time), thank you!
I'm a little confused as to why some stations like Neustadt are in there twice, one being the "parent" station and the other one being a child station with the same name?
Also, throughout my research I will be building multiple scrapers to gather train data from multiple resources (GTFS feeds, Wikidata). I am more than happy if I can contribute to open source projects like this one, but I'm unsure as to how you feel about scrapers and possibly automated commits.
Feel free to send me an email (email address is on my website - fabiobarbero.eu) if you'd like to discuss this further.
I believe Neustadt UIC is not passing tests because it should be put on 7481 station. I rebased my branch to the latest version and pushed the change (hope it's correct this time), thank you!
Thank you for your contribution, it is now merged.
I'm a little confused as to why some stations like Neustadt are in there twice, one being the "parent" station and the other one being a child station with the same name?
Some carriers may have multiple codes in their system for the same physical station.
E.g. a regional train platform and a long-distance platform may have different codes. In some scenarios like that, instead of having a parent and multiple child stations, we have only same_as stations.
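To make the difference between the two kinds of link concrete, here is a minimal sketch. It assumes a semicolon-separated table with `parent_station_id` and `same_as` columns that hold station ids; the rows, ids and helper names are invented for illustration and are not taken from the data set.

```python
import csv
import io

# Made-up rows for illustration only; ids, names and layout are assumptions.
SAMPLE = """id;name;parent_station_id;same_as
100;Example Town;;
101;Example Town (regional);100;
102;Example Town (long distance);;100
"""


def load_stations(text: str) -> dict[str, dict[str, str]]:
    """Index the rows by their id column."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return {row["id"]: row for row in reader}


def canonical_id(stations: dict[str, dict[str, str]], station_id: str) -> str:
    """Follow same_as links until reaching a station that has none."""
    seen = set()
    current = station_id
    while stations[current]["same_as"]:
        if current in seen:
            raise ValueError(f"same_as cycle at station {current}")
        seen.add(current)
        current = stations[current]["same_as"]
    return current


def children_of(stations: dict[str, dict[str, str]], parent_id: str) -> list[str]:
    """List the ids of stations whose parent is parent_id."""
    return sorted(
        sid for sid, row in stations.items()
        if row["parent_station_id"] == parent_id
    )


stations = load_stations(SAMPLE)
print(canonical_id(stations, "102"))  # 100: an alias of the same station
print(children_of(stations, "100"))   # ['101']: a separate child station
```

In this reading, a `same_as` row is another code for the same physical station, while a child row is a distinct entry grouped under its parent.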
Also, throughout my research I will be building multiple scrapers to gather train data from multiple resources (GTFS feeds, Wikidata). I am more than happy if I can contribute to open source projects like this one, but I'm unsure as to how you feel about scrapers and possibly automated commits.
I do not think we, as a business, would be comfortable with automated pull requests that make significant changes to this data set, both because of data ownership and because of possible licensing issues. However, any "rename" or "adjust coordinates" type of contribution that tracks real-world changes, with the source information mentioned, will be more than welcome!
Note that most of the same_as examples I came across were coming from a misunderstanding of the various data sources.
What?
I quickly went through the GitHub issues and fixed the easy ones. Each change is in a separate commit and I have always included the sources in the commit description.
Why?
I started writing my master's thesis on optimising the European railway system, and I'm familiarising myself with the data at hand. I found this dataset and I thought I'd contribute, also to get a better understanding of the different fields and external data sources.