awanczowski opened this issue 1 year ago.
One example that we found:
This came up as a suggested paper from a slightly odd service that sends me random reading suggestions, but this one might be useful: https://www.academia.edu/109690627/Soccer2014DS_a_dataset_containing_player_events_from_the_2014_World_Cup?email_work_card=title The authors reverse-engineered Opta data from HuffPost to create a dataset of player events for the 2014 World Cup. The tool is available on GitHub, and so is the extracted data! Maybe we could convert it to sport schema…?
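If "sport schema" means schema.org's sports types (an assumption on our part), a converter could map each match record onto a `SportsEvent`. The sketch below is not tied to the Soccer2014DS format, which we haven't checked; the input keys `date`, `team1`, and `team2` and both function names are hypothetical. `sport`, `superEvent`, and `competitor` are existing schema.org properties.

```python
def team_name(team):
    """Accept either a plain team name or a dict with a 'name' key."""
    return team["name"] if isinstance(team, dict) else team


def match_to_sports_event(match, competition_name):
    """Map one match record (hypothetical keys 'date', 'team1', 'team2')
    onto a schema.org SportsEvent expressed as a JSON-LD dict."""
    t1, t2 = team_name(match["team1"]), team_name(match["team2"])
    return {
        "@context": "https://schema.org",
        "@type": "SportsEvent",
        "name": f"{t1} vs {t2}",
        "sport": "Soccer",
        "startDate": match["date"],
        # The tournament itself is modelled as the parent event.
        "superEvent": {"@type": "SportsEvent", "name": competition_name},
        # 'competitor' avoids assigning home/away at a neutral-venue tournament.
        "competitor": [
            {"@type": "SportsTeam", "name": t1},
            {"@type": "SportsTeam", "name": t2},
        ],
    }
```

Passing the result to `json.dumps(..., indent=2)` gives a JSON-LD document that could sit alongside other samples.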
Other potential data sets that we could convert:
- https://footballcsv.github.io/
- https://github.com/openfootball/
- https://sports-statistics.com/sports-data/sports-data-sets-for-data-modeling-visualization-predictions-machine-learning/
- https://github.com/openfootball/worldcup/
- https://www.football-data.co.uk/
- https://github.com/streampref/wcimport/tree/master/data (the data referenced in the academic article above)
- Several on Kaggle, e.g. https://www.kaggle.com/datasets/saife245/english-premier-league (data comes from https://football-data.co.uk/)
Thought from today: if we're doing something like a recent World Cup, we could just use Wikidata IDs for all players. (We would need to check, but all players should have Wikidata IDs.)
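A minimal sketch of the "we would need to check" step, assuming a roster is a list of dicts with hypothetical `name` and `wikidata_id` keys: flag any player whose ID is missing or doesn't look like a Wikidata item ID (`Q` followed by a positive integer). This checks format only; confirming that an ID really points at the right player would still need a lookup against Wikidata.

```python
import re

# Wikidata item IDs are "Q" followed by a positive integer, e.g. "Q42".
QID_PATTERN = re.compile(r"^Q[1-9][0-9]*$")


def players_missing_qid(roster):
    """Return names of players whose 'wikidata_id' is absent or malformed.

    `roster` is a list of dicts with hypothetical keys 'name' and 'wikidata_id'.
    """
    missing = []
    for player in roster:
        qid = player.get("wikidata_id")
        if not qid or not QID_PATTERN.match(qid):
            missing.append(player["name"])
    return missing


def wikidata_uri(qid):
    """Build the canonical entity URI for a Wikidata item."""
    return f"http://www.wikidata.org/entity/{qid}"
```

The entity URI is a convenient value for a `sameAs`-style link when the player records are exported.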
Looking at the datasets:
openfootball: has schedules in JSON format, and scores and goal scorers in the TXT version.
footballcsv: no World Cup data as far as I can see. England (Premier League) data seems to end with the 2020/21 season.
football-data.co.uk also doesn't seem to have World Cup data. For the Premier League, they have match-by-match breakdowns, including team-level stats for each match (description of stats columns in the CSV here)
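Because the football-data.co.uk CSVs are match-by-match, a Standings/Positions table could be derived from them. A minimal sketch, assuming the `HomeTeam`, `AwayTeam`, `FTHG`, and `FTAG` (full-time home/away goals) columns and a simplified ordering of points, goal difference, goals scored, then team name; real league rules add further tiebreakers. Rows without a full-time score are skipped, and `league_table` is a hypothetical name.

```python
import csv
import io
from collections import defaultdict


def league_table(csv_text):
    """Build a simplified standings table from football-data.co.uk-style CSV.

    3 points for a win, 1 for a draw. Returns a list of (team, stats) pairs.
    """
    stats = defaultdict(lambda: {"P": 0, "W": 0, "D": 0, "L": 0, "GF": 0, "GA": 0, "Pts": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip fixtures that have no full-time score yet.
        if not row.get("FTHG") or not row.get("FTAG"):
            continue
        home, away = row["HomeTeam"], row["AwayTeam"]
        hg, ag = int(row["FTHG"]), int(row["FTAG"])
        # Update both sides from their own perspective (goals for, goals against).
        for team, gf, ga in ((home, hg, ag), (away, ag, hg)):
            s = stats[team]
            s["P"] += 1
            s["GF"] += gf
            s["GA"] += ga
            if gf > ga:
                s["W"] += 1
                s["Pts"] += 3
            elif gf == ga:
                s["D"] += 1
                s["Pts"] += 1
            else:
                s["L"] += 1
    # Order by points, then goal difference, then goals scored, then name.
    return sorted(
        stats.items(),
        key=lambda kv: (-kv[1]["Pts"], -(kv[1]["GF"] - kv[1]["GA"]), -kv[1]["GF"], kv[0]),
    )
```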
None of the free/open data sets seem to have player lineups...?
These people (https://www.sportmonks.com/football-api/) seem to give away Danish Superliga and Scottish league data for free. Could we use that?
Produce samples for a single sport, league, and team. The samples should include Team Roster, Schedule, Standings/Positions, and Event. Using the same team throughout will let the samples tell a coherent story.
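One way to keep the samples coherent is to give the team a single `@id` and have every sample point back to it. A sketch for the Team Roster and Schedule samples, assuming schema.org JSON-LD; `TEAM_ID`, the function names, and the input dict keys are hypothetical, and the optional Wikidata link follows the idea above of reusing Wikidata IDs for players.

```python
# Hypothetical @id; in practice this might be the team's Wikidata URI.
TEAM_ID = "https://example.org/team/sample-fc"


def roster_sample(team_name, league_name, players):
    """schema.org SportsTeam with its athletes (the Team Roster sample)."""
    athletes = []
    for p in players:
        person = {"@type": "Person", "name": p["name"]}
        if p.get("wikidata_id"):
            # Link the player to their Wikidata entity when an ID is known.
            person["sameAs"] = "http://www.wikidata.org/entity/" + p["wikidata_id"]
        athletes.append(person)
    return {
        "@context": "https://schema.org",
        "@type": "SportsTeam",
        "@id": TEAM_ID,
        "name": team_name,
        "sport": "Soccer",
        "memberOf": {"@type": "SportsOrganization", "name": league_name},
        "athlete": athletes,
    }


def schedule_sample(fixtures):
    """schema.org SportsEvents that refer back to the same team via @id."""
    events = []
    for f in fixtures:
        us = {"@id": TEAM_ID}
        them = {"@type": "SportsTeam", "name": f["opponent"]}
        events.append({
            "@context": "https://schema.org",
            "@type": "SportsEvent",
            "startDate": f["date"],
            "homeTeam": us if f["home"] else them,
            "awayTeam": them if f["home"] else us,
        })
    return events
```

The Standings/Positions and Event samples could reference `TEAM_ID` in the same way, so a consumer can join all four samples on one identifier.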