martj42 / international_results

https://www.kaggle.com/martj42/international-football-results-from-1872-to-2017
Creative Commons Zero v1.0 Universal
133 stars, 29 forks

Created script to assign IDs for results, goal_scorers and shootouts #25

Closed: MattMaginniss closed this 1 year ago

MattMaginniss commented 1 year ago

Hey Mart!

I wrote this script because I have been using your data for a while for some fun side projects, and I found that matching the results up to the goalscorers and shootouts is always a little verbose.

So I decided it might be useful to make a little script that assigns result_ids to the results data, goal_ids to the goalscorers and shootout_ids to the shootouts, and then adds the result_ids to the associated goals and shootouts.

This has made it easier for me to relate the data and use it in other formats (database imports from the CSV).
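To make the idea concrete, here is a minimal sketch of that kind of ID assignment. It is not the script from this PR. It assumes the three files share the columns `date`, `home_team` and `away_team` as a natural key, and the names `assign_ids`, `match_key` and `KEY_COLUMNS` are made up for illustration.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Assumption: these three columns identify a match in all three files.
KEY_COLUMNS = ("date", "home_team", "away_team")


def match_key(row: Row) -> Tuple[Any, ...]:
    """Build the composite key used to link a goal or shootout to a result."""
    return tuple(row[column] for column in KEY_COLUMNS)


def assign_ids(
    results: List[Row], goalscorers: List[Row], shootouts: List[Row]
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Return copies of the rows with sequential IDs added (input is not mutated)."""
    results_out: List[Row] = []
    result_id_by_key: Dict[Tuple[Any, ...], int] = {}
    for result_id, row in enumerate(results, start=1):
        results_out.append({"result_id": result_id, **row})
        # If a key ever repeats, keep the first ID seen for it.
        result_id_by_key.setdefault(match_key(row), result_id)

    def tag(rows: List[Row], id_column: str) -> List[Row]:
        # Number the rows 1..N and attach the matching result_id
        # (None when no result shares the key).
        return [
            {id_column: n, "result_id": result_id_by_key.get(match_key(row)), **row}
            for n, row in enumerate(rows, start=1)
        ]

    return results_out, tag(goalscorers, "goal_id"), tag(shootouts, "shootout_id")
```

The rows can come from `csv.DictReader` and go back out through `csv.DictWriter`. IDs here follow file order, so inserting a row in the middle of a file would renumber everything after it.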

Right now the script would need to be re-run manually before each commit. If you think this is a nice addition (it wouldn't require you to manage IDs by hand), please feel free to use it. If you don't want to bother re-running it manually, it could be wired into a GitHub Action that runs automatically whenever new data is pushed to the main files, then regenerates and commits the compiled files with IDs.

I could help set that up if you are interested, though I have not done something like this before.

Also, this could easily be added to the women's international results. So if you like this and see the value, I'll make a PR for that as well.

:) Have a good day and thanks for all the work you've done so far on this!

martj42 commented 1 year ago

For now, I'm leaning towards not having IDs. Maybe one day, when the dataset is more complex, but right now I feel like the IDs themselves would add complexity.

Maybe I'm biased by working in R where the joins are completely frictionless and an ID column wouldn't change anything.
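For readers outside R, the ID-free join amounts to matching rows on a composite key. A rough Python sketch, assuming the shared columns are `date`, `home_team` and `away_team` (the function name `join_on_match` is hypothetical):

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Assumption: both files carry these three columns.
KEY_COLUMNS = ("date", "home_team", "away_team")


def join_on_match(results: List[Row], goalscorers: List[Row]) -> List[Row]:
    """Inner-join goals to results on the composite key, with no ID column."""
    goals_by_match: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for goal in goalscorers:
        goals_by_match[tuple(goal[c] for c in KEY_COLUMNS)].append(goal)

    joined: List[Row] = []
    for result in results:
        key = tuple(result[c] for c in KEY_COLUMNS)
        # One output row per goal; results without goals are dropped.
        for goal in goals_by_match.get(key, []):
            joined.append({**result, **goal})
    return joined
```

This only works while the three columns together stay unique per match, which is the trade-off against a dedicated ID.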

But, yeah, in general, I like a clean CSV and as little overhead as possible.