tlagore / footy

SPIKE: Dataset research #2

Open jfizzy opened 5 years ago

jfizzy commented 5 years ago

Determine applicable open source or other datasets that can be used to scrape relevant info from the web.

AC:

tlagore commented 5 years ago

We should probably also figure out what kind of data we are interested in. The more fine-grained the data, the better (I would love to have stats per player per match, but this might be unrealistic).

https://openfootball.github.io/ has some pretty good historical summaries, though I'm not sure how up to date it is kept. It is also free, so that's a plus.

https://rapidapi.com/sportsop/api/soccer-sports-open-data?utm_campaign=free-api-soccer-data&utm_medium=link&utm_source=quora has some juicy data and a nice API, but pricing ranges from free to $200. We can easily throttle our usage to make sure we stay in the free tier.
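A minimal sketch of that throttling, assuming the free tier is expressed as some number of calls per time window (the actual quota is not known here and would need to be checked on the pricing page). `Throttle`, `max_calls`, and `period` are illustrative names, not part of any API:

```python
import time


class Throttle:
    """Allow at most `max_calls` calls per `period` seconds (sliding window)."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls  # assumed quota, e.g. calls allowed per window
        self.period = period        # window length in seconds
        self.clock = clock          # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.calls = []             # timestamps of calls still inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have fallen out of the window.
            self.calls = [t for t in self.calls if now - t < self.period]
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Window is full: sleep until the oldest call expires.
            self.sleep(self.period - (now - self.calls[0]))
```

Calling `throttle.wait()` right before each request would keep us under whatever limit we configure, at the cost of blocking the caller when the window is full.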

JohnnySimmonds commented 5 years ago

That openfootball one seems to have good old data: at least who scored which goals and which teams won, so very basic stats, which might be a good place to start. The things that I think will be hard to get for old data are how the goals were scored (header, penalty, etc.), assists, and shots. I think we should only worry about going back 5 years at most, as recent history is probably more important for predictions.
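The 5-year cutoff could be applied as a simple filter once matches are loaded. This is only a sketch: the `date` key and the function name `recent_matches` are assumptions about how we might store matches, not anything a dataset provides:

```python
from datetime import date


def recent_matches(matches, today, max_years=5):
    """Keep matches played within the last `max_years` years before `today`."""
    try:
        cutoff = today.replace(year=today.year - max_years)
    except ValueError:
        # `today` is Feb 29 and the cutoff year is not a leap year.
        cutoff = today.replace(year=today.year - max_years, day=28)
    # Each match is assumed to be a dict with a `date` value of type datetime.date.
    return [m for m in matches if m["date"] >= cutoff]
```

Keeping `max_years` as a parameter makes it easy to check later whether a longer or shorter history actually helps the predictions.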

Scraping from

https://www.soccerstats.com/trends.asp?league=england

might be a good idea; they seem to have good team-level stats.

JohnnySimmonds commented 5 years ago

https://www.fantasyfootballscout.co.uk/2019/02/15/fpl-blank-gameweek-27-an-analysis-of-the-key-underlying-stats/

They seem to do a lot of the stats analysis, and we may be able to get data from them. It costs 10 pounds at the moment for the current season (15 pounds normally).

jfizzy commented 5 years ago

https://fbref.com/ is the best in my experience. They have a huge amount of historical data with different applications, and it should be pretty straightforward to scrape (download the files, then parse them). We may want something with more of an API for pulling data, though.
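As a rough sketch of the "download, then parse" step, assuming the downloaded file is CSV: the column names below (`date`, `home`, `away`, `home_goals`, `away_goals`) are hypothetical and would need to be matched to whatever the real export contains.

```python
import csv
import io


def parse_results(csv_text):
    """Parse match rows from CSV text into dicts with integer goal counts."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "date": row["date"],
            "home": row["home"],
            "away": row["away"],
            # Goal counts arrive as strings, so convert them for later aggregation.
            "home_goals": int(row["home_goals"]),
            "away_goals": int(row["away_goals"]),
        })
    return rows


# Placeholder row for illustration only, not real match data.
sample = "date,home,away,home_goals,away_goals\n2019-02-09,Team A,Team B,2,1\n"
results = parse_results(sample)
```

Parsing from text rather than from a path keeps the download and parse steps separate, so the same function works whether the file came from disk or from an HTTP response.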

I have used it for a data science course, and it gave me everything I needed, and more, to aggregate a large dataset.

Also, it's free/open-source data.