nflverse / nflfastR

A Set of Functions to Efficiently Scrape NFL Play by Play Data
https://www.nflfastr.com/

Safety Points go to DAL not JAX in 2014_10_DAL_JAX #209

Closed ConnerEvans closed 3 years ago

ConnerEvans commented 3 years ago

I have attached screenshots of the output from the code below and of the game as it is shown on ESPN. The safety shown in line 3 of the output adds two points to DAL, even though the text description and other sources say that JAX got the safety.

pbp %>% filter(game_id == '2014_10_DAL_JAX', between(play_id, 3250, 3350)) %>% select(posteam, desc, ep, epa, total_home_score, total_away_score) %>% View
[Screenshots: output of the code above and the ESPN page for the game]

ConnerEvans commented 3 years ago

The link for the ESPN page is https://www.espn.com/nfl/game?gameId=400554358

mrcaseb commented 3 years ago

This is weird, as we had fixed all score problems in #154. This must have been introduced with the latest posteam changes.

TheMathNinja commented 3 years ago

Whoa, does this mean that team designations had been fixed at one point for fumble recoverer, interceptor, sacker, etc. in the JAX games from 2011-2015? Or is that a separate issue? I have been treating this issue as intractable in my code for the last year.

ConnerEvans commented 3 years ago

Last one from what I can tell:

pbp %>% filter(game_id == '2013_01_KC_JAX', between(play_id, 230, 330)) %>% select(home_team, away_team, posteam, desc, ep, epa, total_home_score, total_away_score) %>% View
[Screenshot: output of the code above]

https://www.espn.com/nfl/game?gameId=330908030

[Screenshot]

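guga31bb commented 3 years ago

library(dplyr)
library(purrr)

# Return the games in season `s` whose final play-by-play score differs from
# the final score in the home_score / away_score columns ("lee" below)
get_season <- function(s) {

  nflfastR::load_pbp(s) %>%
    group_by(game_id) %>%
    summarise(
      home_pbp = max(total_home_score),
      home_lee = max(home_score),
      away_pbp = max(total_away_score),
      away_lee = max(away_score)
    ) %>%
    # keep only games where at least one side's score disagrees
    filter(
      !(home_pbp == home_lee & away_pbp == away_lee)
    ) %>%
    ungroup()

}

# Collect the mismatched games across all seasons
bad_games <- map_df(1999:2020, get_season)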

> bad_games
# A tibble: 15 x 5
   game_id         home_pbp home_lee away_pbp away_lee
   <chr>              <dbl>    <int>    <dbl>    <int>
 1 2000_11_OAK_DEN       24       27       36       24 <-- duplicated plays, can't fix
 2 2001_09_CIN_JAX       28       30       15       13 *
 3 2001_16_KC_JAX        24       26       32       30 *
 4 2002_08_HOU_JAX       17       19       23       21 *
 5 2004_11_TEN_JAX       13       15       20       18 *
 6 2004_14_CHI_JAX       20       22        5        3 *
 7 2006_14_IND_JAX       38       44       23       17 <-- manually fixed
 8 2007_17_JAX_HOU       30       42       40       28 <-- manually fixed
 9 2008_09_JAX_CIN       27       21       13       19 <-- manually fixed
10 2008_12_MIN_JAX       10       12       32       30 *
11 2009_15_IND_JAX       37       31       29       35 <-- manually fixed
12 2010_15_JAX_IND       28       34       30       24
13 2013_01_KC_JAX         0        2       30       28 *
14 2014_10_DAL_JAX       15       17       33       31 *
15 2015_15_KC_BAL        14       14       34       38 <-- Lee's file is wrong here

Looks like a bunch of Jags games are messed up. * = fixed so far (each of these just has a safety wrong).
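
For anyone wanting to separate the safety cases from the rest, here is a minimal base-R sketch. It assumes the pattern visible in the starred rows of the output above: one side is exactly 2 points short in the play-by-play totals and the other side has exactly 2 points too many. The helper name `is_safety_swap` is hypothetical (not part of nflfastR), and the data frame is a hand-copied subset of three rows from the output rather than a fresh query.

```r
# Three rows copied by hand from the bad_games output above
bad_games <- data.frame(
  game_id  = c("2000_11_OAK_DEN", "2013_01_KC_JAX", "2014_10_DAL_JAX"),
  home_pbp = c(24, 0, 15),
  home_lee = c(27L, 2L, 17L),
  away_pbp = c(36, 30, 33),
  away_lee = c(24L, 28L, 31L)
)

# Hypothetical helper: TRUE when one team's pbp total is off by exactly
# 2 points and the other team's total is off by 2 in the opposite direction,
# i.e. the mismatch looks like a single safety credited to the wrong team
is_safety_swap <- function(df) {
  home_diff <- df$home_pbp - df$home_lee
  away_diff <- df$away_pbp - df$away_lee
  abs(home_diff) == 2 & home_diff == -away_diff
}

bad_games$safety_swap <- is_safety_swap(bad_games)

# Game ids whose mismatch matches the misassigned-safety pattern
bad_games$game_id[bad_games$safety_swap]
```

On this subset, the two JAX games are flagged and 2000_11_OAK_DEN (the duplicated-plays game) is not. The check is only a heuristic: a game with two offsetting errors could slip through, so flagged games still need a look at the plays themselves.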