jalapic / engsoccerdata

English and European soccer results 1871-2022

updating english data to 2018 and cleaning #52

Closed RobWHickman closed 5 years ago

RobWHickman commented 5 years ago

Mostly thought I'd make a pull request to see how active the package is.

Updates the English data up to the end of last season and does some basic munging of the League Cup data.

Though it's able to merge, I think there's some work I'd want to do first (most obviously, not having the data in /rob_data). I was thinking mainly of standardising columns across the datasets (see pen/pens/hp/vp between the FA Cup and League Cup) and then also updating the biggest European leagues, but was wondering if there's anything pressing I might have missed. I'm also keen to get insights into which data sources are good to use. I pretty much only used 11v11.com, which is OK for the range needed but limited.

I've left in my munging scripts. They're written in very basic, bad-form R, but I think that makes them more readable. I did this mostly while experiments were running, so I haven't gone back and commented them, etc.

I contemplated properly adding functions for scraping data, but I also think it's a bit dangerous and would rather just archive the static data. Though see the questions above about data sources, in case there are any with APIs etc. that could be utilised.

Best,

jalapic commented 5 years ago

thanks Rob for all of this. I've merged it. Any future work would also be greatly appreciated! I don't have as much time as I'd like to keep this up to date.

jalapic commented 5 years ago

Also, w.r.t. data sources: most sites right now seem to draw on the same underlying sources, so things like 11v11 are fine. There were more problems when I was putting together the historical data, but it should mostly be fine now. The main issue to keep track of is keeping team names consistent across seasons and competitions.

jalapic commented 5 years ago

thanks for this - ultimately I will have to delete the 'munge' and 'robdata' folder as I will need to get it into a format that is CRAN acceptable. However, for now it works ok - especially if people want up to date data.

RobWHickman commented 5 years ago

yep! no worries :)

I've got a fork where I'm working on making the datasets into proper .rda files, but I have to do it in between work and other activities... I think I've got the League Cup fully checked now, so just the FA Cup and Premier League are left.

(Also, n.b. the League Cup data on the PR has the FT listed as dates, from when I had to do some manual editing in Excel, but I will fix that too.)

best, rob
