jalapic / engsoccerdata

English and European soccer results 1871-2022

England Tier 1 2016/2017 & 2017/2018 data missing? #56

Closed: Naxulanth closed this issue 4 years ago

Naxulanth commented 5 years ago

Code I'm running:

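library(engsoccerdata)  # provides the `england` dataset
library(dplyr)          # provides %>%, filter() and mutate()

print(england %>% 
  filter(Season %in% 2016:2017,   # 2016/17 and 2017/18 seasons
         tier == 1) %>% 
  mutate(Date = as.Date(Date)))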

Result:

# A tibble: 0 x 12
# ... with 12 variables: Date <date>, Season <dbl>, home <chr>, visitor <chr>,
#   FT <chr>, hgoal <dbl>, vgoal <dbl>, division <dbl>, tier <dbl>,
#   totgoal <dbl>, goaldif <dbl>, result <chr>

Edit: The problem seems to appear in tier 2 as well, so presumably in all tiers of England?
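To check whether the gap affects every tier rather than only tier 1, one option is to cross-tabulate match counts by season and tier, so that missing season/tier combinations show up as explicit zeros. This is a minimal base-R sketch; the helper name `season_tier_counts()` is hypothetical (not part of engsoccerdata), and it assumes a data frame shaped like the package's `england` dataset, with numeric `Season` and `tier` columns.

```r
# Hypothetical helper (not part of engsoccerdata): count matches per
# Season x tier so empty combinations appear as 0 instead of vanishing.
# Assumes `df` has numeric `Season` and `tier` columns, like `england`.
season_tier_counts <- function(df, seasons) {
  tiers <- sort(unique(df$tier))          # tiers present anywhere in the data
  df <- df[df$Season %in% seasons, ]      # keep only the seasons of interest
  table(Season = factor(df$Season, levels = seasons),
        tier   = factor(df$tier, levels = tiers))
}

# Usage with the packaged data (requires engsoccerdata):
# library(engsoccerdata)
# season_tier_counts(england, 2014:2018)
```

Using the full set of tiers and the requested seasons as factor levels is what makes absent combinations visible; a plain `table(df$Season, df$tier)` would simply omit them.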

jalapic commented 5 years ago

I haven't had time to update the 2017 and 2018 seasons yet; I really should. The 2016 data is there, as you can see from this:

print(england %>% 
        filter(Season %in% c(2016),
               tier %in% c(1)) %>% 
        mutate(Date = as.Date(Date)))

This function gets you the 2018 data:

england_current(Season = 2018)

It should work for the 2017 season too, but I just checked and it's messing up the dates (giving the year as 2020).

Unfortunately, I won't have time to fix all of this for a month or so; hopefully I can get round to it then. Sorry until then.

Naxulanth commented 5 years ago

Hello,

Running this code gives me an empty table:

print(england %>% 
        filter(Season %in% c(2016),
               tier %in% c(1)) %>% 
        mutate(Date = as.Date(Date)))

The same happens with 2017, but 2018 works, which is why I suspected something was wrong and opened this issue.

I've also seen commit messages mentioning that it was updated for 17/18, so it felt like an oversight in the code rather than the data simply not being there 😄

Thanks for your time!

jalapic commented 5 years ago

The raw data has been updated to the 2018/2019 season, as I can see here: https://raw.githubusercontent.com/jalapic/engsoccerdata/master/data-raw/england.csv. I'm just working out why the .RData file (the one that supplies the england dataset to the package) doesn't seem to contain the updated data. Hopefully I will fix it ASAP.
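One way to pin down this kind of mismatch is to compare the seasons present in the raw CSV with those in the packaged dataset. A minimal sketch, assuming both are data frames with a `Season` column; `seasons_missing_from()` is a hypothetical helper, not part of engsoccerdata.

```r
# Hypothetical helper (not part of engsoccerdata): seasons that appear in
# `raw` but not in `packaged`, returned in ascending order.
seasons_missing_from <- function(packaged, raw) {
  sort(setdiff(unique(raw$Season), unique(packaged$Season)))
}

# Usage (needs network access and the engsoccerdata package):
# raw <- read.csv("https://raw.githubusercontent.com/jalapic/engsoccerdata/master/data-raw/england.csv")
# library(engsoccerdata)
# seasons_missing_from(england, raw)
```

An empty result would mean the packaged data covers every season in the CSV; any seasons listed point at a stale .RData file.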

daviddalpiaz commented 5 years ago

The raw data seems to still be missing recent seasons, in particular 2016 and 2017. (2018 is present.) After reading in the data-raw/england.csv mentioned above, we can look at the recent seasons:

> tail(unique(england$Season))
[1] 2011 2012 2013 2014 2015 2018
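Rather than eyeballing `tail(unique(...))`, which lists seasons in the order they first appear in the file, the gaps can be computed directly. A minimal base-R sketch; `missing_seasons()` is a hypothetical helper, and the `from` argument lets you restrict the check to recent years (English league football was suspended during both world wars, so a check over the full history would presumably also flag those seasons).

```r
# Hypothetical helper: seasons between `from` and `to` with no matches at all.
# Assumes `df` has a numeric `Season` column.
missing_seasons <- function(df, from = min(df$Season), to = max(df$Season)) {
  setdiff(seq(from, to), unique(df$Season))
}

# Usage after reading data-raw/england.csv into `england` as described above:
# missing_seasons(england, from = 2011)
```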

RobWHickman commented 4 years ago

Raw data fixed now. I've just realised that, for some reason, a load of Premier League matches for 2017 are dated as taking place in 2020. I'm hoping this is just a random OpenOffice error from putting the data together manually; I'll try to fix it tonight with a new PR.
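Misdated rows like these can be caught with a quick consistency check. A minimal sketch, assuming (as the filters earlier in this thread suggest) that `Season` is the starting year of a season, so a 2017 match should be dated in 2017 or 2018; `flag_date_mismatches()` is a hypothetical helper, not part of engsoccerdata.

```r
# Hypothetical helper: return rows whose calendar year is neither Season
# nor Season + 1. Assumes Season is the starting year (2017 = 2017/18)
# and that Date is a "YYYY-MM-DD" string or a Date object.
flag_date_mismatches <- function(df) {
  yr <- as.integer(format(as.Date(df$Date), "%Y"))
  ok <- yr == df$Season | yr == df$Season + 1
  df[!is.na(ok) & !ok, , drop = FALSE]   # rows with NA dates are skipped
}

# Usage on the packaged data (requires engsoccerdata):
# library(engsoccerdata)
# flag_date_mismatches(england[england$Season == 2017, ])
```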