trainline-eu / stations

List of stations and associated metadata
Open Data Commons Open Database License v1.0

Rename train stations, add uic from GitHub issues #1369

Closed fbarbe00 closed 6 days ago

fbarbe00 commented 1 month ago

What?

I quickly went through the GitHub issues and fixed the easy ones. Each change is in a separate commit and I have always included the sources in the commit description.

Why?

I started writing my master's thesis on optimising the European railway system, and I'm familiarising myself with the data at hand. I found this dataset and I thought I'd contribute, also to get a better understanding of the different fields and external data sources.

fbarbe00 commented 1 month ago

My biggest confusion currently is how the UIC codes are actually defined and where they can be found - it seems to not be publicly accessible under MERITS?

Any help on transit data would be greatly appreciated (you can always reach out by email - you can find that on my website). I will make sure to contribute any findings from the next 6 months to open source projects like this one.

misdoro commented 1 week ago

> My biggest confusion currently is how the UIC codes are actually defined and where they can be found - it seems to not be publicly accessible under MERITS?

MERITS data is indeed not publicly available. The UICs present in this data set are a compilation of what can be found on the internet or in carriers' station datasets.

github-actions[bot] commented 1 week ago

Stations changed

fbarbe00 commented 1 week ago

I believe the Neustadt UIC is not passing tests because it should be put on station 7481. I rebased my branch to the latest version and pushed the change (hope it's correct this time), thank you!

I'm a little confused as to why some stations like Neustadt are in there twice, one being the "parent" station and the other one being a child station with the same name?

Also, throughout my research I will be building multiple scrapers to gather train data from multiple resources (GTFS feeds, Wikidata). I am more than happy if I can contribute to open source projects like this one, but I'm unsure as to how you feel about scrapers and possibly automated commits.

Feel free to send me an email (email address is on my website - fabiobarbero.eu) if you'd like to discuss this further.

misdoro commented 6 days ago

> I believe Neustadt UIC is not passing tests because it should be put on 7481 station. I rebased my branch to the latest version and pushed the change (hope it's correct this time), thank you!

Thank you for your contribution, it is now merged.

> I'm a little confused as to why some stations like Neustadt are in there twice, one being the "parent" station and the other one being a child station with the same name?

Some carriers may have multiple codes in their system for the same physical station. E.g. a regional train platform and a long-distance platform may have different codes. In some such cases, instead of having a parent and multiple child stations, we only have same_as stations.
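To make the parent/child versus same_as distinction concrete, here is a minimal sketch of how the two kinds of link could be resolved when reading the data. It assumes a semicolon-delimited CSV with `id`, `name`, `parent_station_id` and `same_as` columns; the column names, the sample rows and the helper functions `load_stations`, `canonical_id` and `children_of` are all illustrative assumptions, not taken from the repository's own tooling.

```python
import csv
import io

# Hypothetical sample rows: the ids and names are invented for illustration.
# Column names and the ";" delimiter are assumptions about the file layout.
SAMPLE = """id;name;parent_station_id;same_as
1;Example Hbf;;
2;Example Hbf;1;
3;Example Hbf (regional);;1
"""


def load_stations(text):
    """Parse the CSV text into a dict keyed by station id."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return {row["id"]: row for row in reader}


def canonical_id(stations, station_id):
    """Follow same_as links until a station without a usable same_as is reached.

    Stops on an empty value, a dangling reference, or a cycle.
    """
    seen = {station_id}
    current = station_id
    while True:
        target = stations[current]["same_as"]
        if not target or target not in stations or target in seen:
            return current
        seen.add(target)
        current = target


def children_of(stations, parent_id):
    """Return the ids of stations whose parent_station_id is parent_id."""
    return sorted(
        sid for sid, row in stations.items()
        if row["parent_station_id"] == parent_id
    )


stations = load_stations(SAMPLE)
print(canonical_id(stations, "3"))  # station 3 is an alias of station 1
print(children_of(stations, "1"))   # station 2 is a child of station 1
```

In this sketch a child station is a distinct entry grouped under a parent, while a same_as station is treated as another code for the very same station, so lookups collapse it onto its target.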

> Also, throughout my research I will be building multiple scrapers to gather train data from multiple resources (GTFS feeds, Wikidata). I am more than happy if I can contribute to open source projects like this one, but I'm unsure as to how you feel about scrapers and possibly automated commits.

I do not think we, as a business, would be comfortable with automated pull requests that make significant changes to this data set, both because of data ownership and because of possible licensing issues. However, any "rename" or "adjust coordinates" type of contribution that tracks real-world changes, with the source mentioned, will be more than welcome!

ConscritNeuneu commented 6 days ago

Note that most of the same_as examples I came across were coming from a misunderstanding of the various data sources.