nflverse / nflverse-rosters

builds roster data for nflverse/nflverse-data

Mismatched field format fast_scraper_roster(2020) #4

Closed ajreinhard closed 4 years ago

ajreinhard commented 4 years ago

Maybe there is no easy solution for this without making the roster scraper function significantly more robust, but I've been having issues with the 2020 format for some fields being different than in prior years. The biggest issue is that the Rams abbreviation in the scraper comes through as "LAR" rather than "LA", as it is treated across nflfastR. The other one that matters to me is birth_date, which is formatted as MM/DD/YYYY pre-2020 and YYYY-MM-DD in 2020. height is also in a different format for 2020, and status has a different set of values than in prior years, but those two aren't as useful.

An example of all four below:

library(tidyverse)
library(nflfastR)

fast_scraper_roster(2018:2020) %>% 
  filter(last_name == 'Goff') %>% 
  select(season, full_name, team, status, birth_date, height)
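
Until the scraper itself is consistent, a user-side workaround along these lines might be enough. This is only a rough sketch in base R: the toy data frame stands in for the scraper output (its values are copied from the Goff example, not scraped), and parse_birth_date is a hypothetical helper, not part of nflfastR.

```r
# Toy rows mimicking the mixed formats described above.
# Assumption: illustrative values only, not real scraper output.
roster <- data.frame(
  season     = c(2019, 2020),
  team       = c("LA", "LAR"),
  birth_date = c("10/14/1994", "1994-10-14"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse MM/DD/YYYY or YYYY-MM-DD strings into Date.
# Anything matching neither pattern becomes NA.
parse_birth_date <- function(x) {
  is_us <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{4}$", x)
  out <- rep(as.Date(NA), length(x))
  out[is_us]  <- as.Date(x[is_us],  format = "%m/%d/%Y")
  out[!is_us] <- as.Date(x[!is_us], format = "%Y-%m-%d")
  out
}

# Use the "LA" abbreviation that the rest of nflfastR uses for the Rams.
roster$team[roster$team == "LAR"] <- "LA"

# Bring both date formats to a single Date column.
roster$birth_date <- parse_birth_date(roster$birth_date)

print(roster)
```

height and status are left alone here, since the thread does not spell out their formats.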

mrcaseb commented 4 years ago

The "LAR" thing as well as the date format should be dumb mistakes by me that I should be able to fix.

Will check height as well.

The set of values in status depends on the data source, which changed in 2020, so this is unlikely to be fixed as we can't make the new source backwards compatible.

Thanks for noting this @ajreinhard!

mrcaseb commented 4 years ago

I have transferred this issue to the roster repo to be able to track it better

mrcaseb commented 4 years ago

I hope this won't break anybody's code but we will see if somebody complains.

The set of values in status won't get unified, as the pre- and post-2020 status data are too different.

mrcaseb commented 4 years ago

Whoops, looks like I broke birth_date completely. Will fix.

mrcaseb commented 4 years ago

Fixed birth_date type problem with 2e0eaa5572e2a8384b0206034069b80fc1b1cdf4

library(dplyr)
library(nflfastR)
fast_scraper_roster(2018:2020) %>% 
  filter(last_name == 'Goff') %>% 
  select(season, full_name, team, status, birth_date, height)
#> # A tibble: 3 x 6
#>   season full_name  team  status birth_date height
#>    <dbl> <chr>      <chr> <chr>  <date>     <chr> 
#> 1   2018 Jared Goff LA    ACT    1994-10-14 6-4   
#> 2   2019 Jared Goff LA    ACT    1994-10-14 6-4   
#> 3   2020 Jared Goff LA    Active 1994-10-14 6-4

Created on 2020-11-05 by the reprex package (v0.3.0)