nflverse / nflfastR

A Set of Functions to Efficiently Scrape NFL Play by Play Data
https://www.nflfastr.com/

Old Jaguars games still with plenty of bugs #474

Closed mrcaseb closed 1 month ago

mrcaseb commented 1 month ago

It is a known problem that some of the older Jaguars games have buggy raw JSON data in which the team names in playstats are always the opponent and never JAX or JAC.

We try to fix this with https://github.com/nflverse/nflfastR/blob/4b28a3e4a8ac36b52346da30d85db090cf9d9329/R/helper_scrape_nfl.R#L328-L377

However, there are two problems:

1.) This function doesn't catch all of the *_team variables listed below:

 [1] "home_team"                   "away_team"                   "posteam"                    
 [4] "defteam"                     "timeout_team"                "td_team"                    
 [7] "forced_fumble_player_1_team" "forced_fumble_player_2_team" "solo_tackle_1_team"         
[10] "solo_tackle_2_team"          "assist_tackle_1_team"        "assist_tackle_2_team"       
[13] "assist_tackle_3_team"        "assist_tackle_4_team"        "tackle_with_assist_1_team"  
[16] "tackle_with_assist_2_team"   "fumbled_1_team"              "fumbled_2_team"             
[19] "fumble_recovery_1_team"      "fumble_recovery_2_team"      "return_team"                
[22] "penalty_team"  

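2.) The current code is buggy as well. Take, for example, this play, where DAL punted, the JAX returner muffed the punt and lost the fumble, and DAL recovered. The team columns are mixed up: fumbled_1_team should be JAX and fumble_recovery_1_team should be DAL, but the output shows the reverse.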

  game_id         play_id desc                          posteam fumble_lost fumble_recovery_1_team fumbled_1_team
  <chr>             <dbl> <chr>                         <chr>         <dbl> <chr>                  <chr>         
1 2014_10_DAL_JAX     654 (3:39) (Punt formation) 6-C.… DAL               1 JAX                    DAL           
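
For the first problem, one option is to derive the team columns from the column names instead of keeping a hard-coded list. Below is a minimal Python sketch (an illustration only, not nflfastR's R code), assuming a play is available as a plain dictionary. `team_columns` and `apply_to_team_columns` are hypothetical helpers, and the JAC-to-JAX lambda merely stands in for whatever correction is actually needed. Matching on names that end in `team` also covers posteam and defteam from the list above.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def team_columns(row: Row) -> List[str]:
    """Columns whose name ends in 'team' (this also matches posteam and defteam)."""
    return [name for name in row if name.endswith("team")]


def apply_to_team_columns(row: Row, fix: Callable[[str], str]) -> Row:
    """Return a copy of the row with `fix` applied to every non-missing team value."""
    fixed = dict(row)
    for name in team_columns(row):
        value = fixed[name]
        if isinstance(value, str):  # leave missing values (None) untouched
            fixed[name] = fix(value)
    return fixed


# Toy row; the team columns are a subset of the ones listed above.
row = {
    "posteam": "DAL",
    "fumbled_1_team": "JAC",
    "penalty_team": None,
    "fumble_lost": 1,
}

print(team_columns(row))
print(apply_to_team_columns(row, lambda team: "JAX" if team == "JAC" else team))
```

Because the columns are discovered by suffix, a team column added later would be picked up without touching the helper.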

These problems potentially affect the following 120 games (all of these games have in common that there is no "JAX" or "JAC" in any playstat).

  [1] "2001_01_PIT_JAX" "2001_02_TEN_JAX" "2001_03_CLE_JAX" "2001_06_BUF_JAX" "2001_09_CIN_JAX" "2001_11_BAL_JAX"
  [7] "2001_12_GB_JAX"  "2001_16_KC_JAX"  "2002_01_IND_JAX" "2002_04_NYJ_JAX" "2002_05_PHI_JAX" "2002_08_HOU_JAX"
 [13] "2002_10_WAS_JAX" "2002_13_PIT_JAX" "2002_14_CLE_JAX" "2002_16_TEN_JAX" "2003_02_BUF_JAX" "2003_05_SD_JAX" 
 [19] "2003_06_MIA_JAX" "2003_08_TEN_JAX" "2003_10_IND_JAX" "2003_13_TB_JAX"  "2003_14_HOU_JAX" "2003_16_NO_JAX" 
 [25] "2004_02_DEN_JAX" "2004_04_IND_JAX" "2004_06_KC_JAX"  "2004_10_DET_JAX" "2004_11_TEN_JAX" "2004_13_PIT_JAX"
 [31] "2004_14_CHI_JAX" "2004_16_HOU_JAX" "2005_01_SEA_JAX" "2005_04_DEN_JAX" "2005_05_CIN_JAX" "2005_09_HOU_JAX"
 [37] "2005_10_BAL_JAX" "2005_14_IND_JAX" "2005_15_SF_JAX"  "2005_17_TEN_JAX" "2006_01_DAL_JAX" "2006_02_PIT_JAX"
 [43] "2006_05_NYJ_JAX" "2006_09_TEN_JAX" "2006_10_HOU_JAX" "2006_11_NYG_JAX" "2006_14_IND_JAX" "2006_16_NE_JAX" 
 [49] "2007_01_TEN_JAX" "2007_02_ATL_JAX" "2007_06_HOU_JAX" "2007_07_IND_JAX" "2007_11_SD_JAX"  "2007_12_BUF_JAX"
 [55] "2007_14_CAR_JAX" "2007_16_OAK_JAX" "2008_02_BUF_JAX" "2008_04_HOU_JAX" "2008_05_PIT_JAX" "2008_08_CLE_JAX"
 [61] "2008_11_TEN_JAX" "2008_12_MIN_JAX" "2008_15_GB_JAX"  "2008_16_IND_JAX" "2009_02_ARI_JAX" "2009_04_TEN_JAX"
 [67] "2009_06_STL_JAX" "2009_09_KC_JAX"  "2009_11_BUF_JAX" "2009_13_HOU_JAX" "2009_14_MIA_JAX" "2009_15_IND_JAX"
 [73] "2010_01_DEN_JAX" "2010_03_PHI_JAX" "2010_04_IND_JAX" "2010_06_TEN_JAX" "2010_10_HOU_JAX" "2010_11_CLE_JAX"
 [79] "2010_14_OAK_JAX" "2010_16_WAS_JAX" "2011_01_TEN_JAX" "2011_04_NO_JAX"  "2011_05_CIN_JAX" "2011_07_BAL_JAX"
 [85] "2011_12_HOU_JAX" "2011_13_SD_JAX"  "2011_14_TB_JAX"  "2011_17_IND_JAX" "2012_02_HOU_JAX" "2012_04_CIN_JAX"
 [91] "2012_05_CHI_JAX" "2012_09_DET_JAX" "2012_10_IND_JAX" "2012_12_TEN_JAX" "2012_14_NYJ_JAX" "2012_16_NE_JAX" 
 [97] "2013_01_KC_JAX"  "2013_04_IND_JAX" "2013_07_SD_JAX"  "2013_08_SF_JAX"  "2013_11_ARI_JAX" "2013_14_HOU_JAX"
[103] "2013_15_BUF_JAX" "2013_16_TEN_JAX" "2014_03_IND_JAX" "2014_05_PIT_JAX" "2014_07_CLE_JAX" "2014_08_MIA_JAX"
[109] "2014_10_DAL_JAX" "2014_13_NYG_JAX" "2014_14_HOU_JAX" "2014_16_TEN_JAX" "2015_01_CAR_JAX" "2015_02_MIA_JAX"
[115] "2015_06_HOU_JAX" "2015_07_BUF_JAX" "2015_11_TEN_JAX" "2015_12_SD_JAX"  "2015_14_IND_JAX" "2015_15_ATL_JAX"
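
The criterion used to select these games (no "JAX" or "JAC" in any playstat) can be sketched as a small check. This is a Python illustration under assumptions: the playstats of a game are represented as a flat list of dictionaries with a hypothetical `team` key, which is not necessarily how the raw JSON is laid out, and the second game id in the toy data is made up.

```python
from typing import Dict, Iterable, List

JAGUARS_ABBRS = {"JAX", "JAC"}


def lacks_jaguars_playstats(playstats: Iterable[Dict[str, str]]) -> bool:
    """True if no playstat record names the Jaguars (JAX or JAC)."""
    return not any(stat.get("team") in JAGUARS_ABBRS for stat in playstats)


def affected_games(games: Dict[str, List[Dict[str, str]]]) -> List[str]:
    """Return the ids of games whose playstats never mention JAX or JAC."""
    return [gid for gid, stats in games.items() if lacks_jaguars_playstats(stats)]


# Toy data: the first game only ever names the opponent, the second names both teams.
games = {
    "2014_10_DAL_JAX": [{"team": "DAL"}, {"team": "DAL"}],
    "hypothetical_healthy_game": [{"team": "GB"}, {"team": "JAX"}],
}

print(affected_games(games))
```

Note that a game with no playstats at all would also be flagged by this check, so it would need separate handling.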

We should try to update the raw JSON and see if the problems in playstats are fixed. If that's the case, we need to make nflfastR skip these games by adjusting the following if statement: https://github.com/nflverse/nflfastR/blob/4b28a3e4a8ac36b52346da30d85db090cf9d9329/R/helper_scrape_nfl.R#L99-L102
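
One possible way to make the skip automatic, sketched in Python under assumptions: apply the fix only while the playstats of a Jaguars game still lack JAX/JAC, so a game whose raw JSON has been updated passes through untouched. The function names, the `team` key, and the data shape are hypothetical and do not mirror the if statement linked above; `mark_fixed` is a placeholder, not the real repair.

```python
from typing import Callable, Dict, List

Playstats = List[Dict[str, str]]
JAGUARS_ABBRS = {"JAX", "JAC"}


def needs_jaguars_fix(home_team: str, away_team: str, playstats: Playstats) -> bool:
    """True for a Jaguars game whose playstats never name JAX or JAC."""
    is_jaguars_game = bool(JAGUARS_ABBRS & {home_team, away_team})
    names_jaguars = any(stat.get("team") in JAGUARS_ABBRS for stat in playstats)
    return is_jaguars_game and not names_jaguars


def maybe_fix(
    home_team: str,
    away_team: str,
    playstats: Playstats,
    fix: Callable[[Playstats], Playstats],
) -> Playstats:
    """Run `fix` only when the raw data still shows the bug; otherwise pass through."""
    if needs_jaguars_fix(home_team, away_team, playstats):
        return fix(playstats)
    return playstats


def mark_fixed(playstats: Playstats) -> Playstats:
    """Placeholder fix that only marks records as touched."""
    return [{**stat, "fixed": "yes"} for stat in playstats]


old_json = [{"team": "CAR"}, {"team": "CAR"}]      # buggy: opponent only
updated_json = [{"team": "CAR"}, {"team": "JAX"}]  # both teams present

print(maybe_fix("JAX", "CAR", old_json, mark_fixed))
print(maybe_fix("JAX", "CAR", updated_json, mark_fixed))
```

A data-driven condition like this would not need a maintained list of game ids, at the cost of scanning the playstats of every Jaguars game.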

guga31bb commented 1 month ago

> We should try to update the raw json and see if the problems in playstats are fixed

I updated 2015_01_CAR_JAX if you want to test.

andrewtek commented 1 month ago

Comparing the two JSON files:

The initial check-in had the team as CAR for all stats.

The latest check-in has JAX and CAR on their respective stats.

This is encouraging!
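
The same comparison can be scripted. Below is a minimal Python sketch, assuming each version of the file has been reduced to a list of playstat dictionaries with a hypothetical `team` key (the real JSON layout may differ); the two lists are toy stand-ins, not the actual contents of 2015_01_CAR_JAX.

```python
from collections import Counter
from typing import Dict, Iterable


def team_counts(playstats: Iterable[Dict[str, str]]) -> Counter:
    """Count playstat records per team abbreviation."""
    return Counter(stat["team"] for stat in playstats)


# Toy stand-ins for the initial and the latest check-in of one game.
initial = [{"team": "CAR"}, {"team": "CAR"}, {"team": "CAR"}]
latest = [{"team": "CAR"}, {"team": "JAX"}, {"team": "CAR"}]

print("initial:", dict(team_counts(initial)))
print("latest: ", dict(team_counts(latest)))

# Teams that appear only in the latest version
print("new teams:", sorted(set(team_counts(latest)) - set(team_counts(initial))))
```

If the updated file is healthy, JAX should show up among the new teams and the opponent's count should drop accordingly.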