jthomasmock / espnscrapeR

Scrapes Or Collects NFL Data From ESPN
https://jthomasmock.github.io/espnscrapeR/

College QBR Data missing #7

Closed christianlohr9 closed 3 years ago

christianlohr9 commented 3 years ago

Hi Tom,

I wanted to scrape some College QBR to analyse the 2021 prospects, so I was looking for QBR data since 2017. Unfortunately, the data returned by get_college_qbr() seems to be missing for every week except week 1 in all seasons before 2020. I attached the code I used to test weeks 1 and 2. I cross-checked the ESPN website and it seems to be an issue on ESPN's side.

Any chance we can solve this problem?

Best regards, Christian


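# Scrape one season/week combination; return an empty tibble if the call fails.
get_qbr <- function(weeks, years, silent = FALSE){
  qbr_raw <- NULL
  try(qbr_raw <- espnscrapeR::get_college_qbr(season = years, week = weeks), silent = silent)
  if (is.null(qbr_raw)) return(tibble::tibble())
  qbr_raw
}

# Every combination of weeks 1-2 and seasons 2017-2020, passed positionally as (weeks, years).
all_qbr <- purrr::pmap_dfr(purrr::transpose(
  purrr::cross2(1:2, 2017:2020)), get_qbr)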
#> Scraping QBR for week 1 of 2017!
#> Scraping QBR for week 2 of 2017!
#> Error : Can't subset columns that don't exist.
#> x Column `firstName` doesn't exist.
#> Scraping QBR for week 1 of 2018!
#> Scraping QBR for week 2 of 2018!
#> Error : Can't subset columns that don't exist.
#> x Column `firstName` doesn't exist.
#> Scraping QBR for week 1 of 2019!
#> Scraping QBR for week 2 of 2019!
#> Error : Can't subset columns that don't exist.
#> x Column `firstName` doesn't exist.
#> Scraping QBR for week 1 of 2020!
#> Scraping QBR for week 2 of 2020!
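
The wrapper above swallows each error, so the result does not say which season/week pairs failed. A minimal sketch of one way to keep that information, using base R only: `safe_scrape` is a hypothetical helper (not part of espnscrapeR), and `scrape_fun` stands in for any function that takes `season` and `week` arguments.

```r
# Hypothetical helper: try every season/week pair and log failures
# instead of silently dropping them.
safe_scrape <- function(seasons, weeks, scrape_fun) {
  # expand.grid varies the first column fastest, so weeks cycle within each season.
  grid <- expand.grid(week = weeks, season = seasons)

  attempts <- lapply(seq_len(nrow(grid)), function(i) {
    tryCatch(
      list(
        data = scrape_fun(season = grid$season[i], week = grid$week[i]),
        error = NA_character_
      ),
      error = function(e) list(data = NULL, error = conditionMessage(e))
    )
  })

  # One log row per attempt; `error` is NA where the call succeeded.
  grid$error <- vapply(attempts, function(a) a$error, character(1))

  list(
    data = do.call(rbind, lapply(attempts, function(a) a$data)),
    log = grid
  )
}
```

Calling `safe_scrape(2017:2020, 1:2, scrape_fun)` would return the combined rows in `data` and a per-attempt table in `log`, so the failing weeks can be listed with `subset(res$log, !is.na(error))`.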

jthomasmock commented 3 years ago

Interesting!

I just tested the season-level data and that seems to be working OK. Unfortunately, I don't have local data stored for college (only NFL stuff). I hope they fix the data source, but I'll take a peek at the endpoints to see if anything is awry.

library(tidyverse)
library(espnscrapeR)

all_college <- 2010:2020 %>% 
  map_dfr(get_college_qbr)

all_college %>% 
  distinct(season)

# A tibble: 11 x 1
#    season
#     <int>
#  1   2010
#  2   2011
#  3   2012
#  4   2013
#  5   2014
#  6   2015
#  7   2016
#  8   2017
#  9   2018
# 10   2019
# 11   2020

mrcaseb commented 3 years ago

> I don't have local data stored for college (only NFL stuff) unfortunately.

Sorry to barge into the discussion like this, but I have a suggestion: how about setting up a scheduled GitHub Actions workflow that loads the data and saves it in a repo? I could help with the setup, as I am doing this in multiple repos now.
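
As a rough illustration of the "load and save" step such a scheduled job might run, here is a minimal base R sketch. `save_season_snapshot`, the `data` directory, and the file naming are all hypothetical; `fetch_fun` stands in for a function such as `espnscrapeR::get_college_qbr` that takes a season as its first argument.

```r
# Hypothetical snapshot step: fetch one season and write it to a CSV file
# so a copy survives changes to the upstream data source.
save_season_snapshot <- function(season, fetch_fun, out_dir = "data") {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

  qbr <- fetch_fun(season)

  # One file per season, e.g. data/college_qbr_2020.csv (naming is an assumption).
  path <- file.path(out_dir, sprintf("college_qbr_%d.csv", as.integer(season)))
  utils::write.csv(qbr, path, row.names = FALSE)

  invisible(path)
}
```

The scheduled workflow would then only need to run this script and commit any changed files.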

mrcaseb commented 3 years ago

This won't fix the problem in this issue, of course, but it could prevent future problems.

jthomasmock commented 3 years ago

Yeah, I have the start of a data repo but have been slow to complete it due to my own lack of time.

I kept local copies of everything else with the intention of building the data repo, but I seem to have missed the college QBR data.

jthomasmock commented 3 years ago

I just revamped everything to use httr rather than relying on reading in the raw JSON, and it solved a host of issues.

@christianlohr9 - can you confirm this is working with the latest release (0.5.1)? Also, please note the new syntax: get_college_qbr(season = 2020, type = "weekly"). ESPN's API was updated, so the function now makes paginated calls that return all weeks of a season rather than a specific week of interest.

all_qbr <- purrr::pmap_dfr(data.frame(season = 2017:2020, type = "weekly"), espnscrapeR::get_college_qbr)
Scraping QBR for all weeks of 2017!
Scraping QBR for all weeks of 2018!
Scraping QBR for all weeks of 2019!
Scraping QBR for all weeks of 2020!

all_qbr
# A tibble: 6,064 x 35
   season  week week_text week_type player_id player_uid    player_guid     first_name
    <int> <int> <chr>     <chr>     <chr>     <chr>         <chr>           <chr>     
 1   2017     1 Week 1    Regular   3915776   s:20~l:23~a:… cb87cd32ec44b6… Kyle      
 2   2017     1 Week 1    Regular   550373    s:20~l:23~a:… 5bb4376ad089e3… Baker     
 3   2017     1 Week 1    Regular   4036210   s:20~l:23~a:… 72bd418b3cd541… Tyrrell   
 4   2017     1 Week 1    Regular   3116407   s:20~l:23~a:… 779f3cff73ebdc… Mason     
 5   2017     1 Week 1    Regular   3917810   s:20~l:23~a:… d3095f49b0478d… TaQuon    
 6   2017     1 Week 1    Regular   4240689   s:20~l:23~a:… b910f6e3c25865… Jake      
 7   2017     1 Week 1    Regular   3728240   s:20~l:23~a:… 778de00a350eea… Kelly     
 8   2017     1 Week 1    Regular   3916251   s:20~l:23~a:… 79ec907a5c1322… Zach      
 9   2017     1 Week 1    Regular   550448    s:20~l:23~a:… b1b5449071e0bd… Jesse     
10   2017     1 Week 1    Regular   3124092   s:20~l:23~a:… f3c9aea4bff630… John      
# … with 6,054 more rows, and 27 more variables: last_name <chr>, display_name <chr>,
#   short_name <chr>, headshot_href <chr>, team_name <chr>, team_short_name <chr>,
#   slug <chr>, team_id <chr>, team_uid <chr>, age <int>, game_id <chr>,
#   game_date <chr>, player_home_away <chr>, score <chr>, opp_team_id <chr>,
#   opp_team_name <chr>, opp_team_short_name <chr>, qbr_total <dbl>, pts_added <dbl>,
#   qb_plays <dbl>, epa_total <dbl>, pass <dbl>, run <dbl>, exp_sack <dbl>,
#   penalty <dbl>, qbr_raw <dbl>, sack <dbl>
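
Since the weekly call returns every week of a season, a specific week of interest can still be recovered by filtering on the `week` column shown in the output above. A minimal base R sketch, using a small made-up stand-in for `all_qbr` (the values are illustrative, not real QBR data):

```r
# Toy stand-in for `all_qbr`: only three of its columns, with made-up values.
toy_qbr <- data.frame(
  season = c(2017L, 2017L, 2018L, 2018L),
  week = c(1L, 2L, 1L, 3L),
  qbr_total = c(90.1, 75.3, 60.8, 82.4)
)

# Keep only weeks 1 and 2, mirroring the original week-by-week request.
weeks_1_2 <- toy_qbr[toy_qbr$week %in% 1:2, ]
```

The same subsetting expression should apply to the real `all_qbr` tibble, since it has an integer `week` column.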

jthomasmock commented 3 years ago

Closing this for now as I can confirm it's working again, but @christianlohr9 let me know if you run into any problems.