nflverse / nflfastR

A Set of Functions to Efficiently Scrape NFL Play by Play Data
https://www.nflfastr.com/

Penalties on PATs #18

Closed ajreinhard closed 4 years ago

ajreinhard commented 4 years ago

It looks like penalties that occur on extra points are not being treated as extra point plays. I found 29 such plays from the 2019 season (see code below) that resulted in meaningful negative EPA for the offense, even when the penalty was called on the defense.


library(dplyr)

# Load the 2019 play-by-play data from the nflfastR data repo
pbp_df <- readRDS(url('https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_2019.rds'))

# Penalty plays that directly follow a touchdown and directly precede a PAT or two-point attempt
no_drive <- pbp_df %>% 
  arrange(season, week, game_id, qtr, -quarter_seconds_remaining) %>% 
  filter((lead(two_point_attempt)==1 | lead(extra_point_attempt)==1) & lag(touchdown)==1 & penalty==1) %>% 
  select(game_id, play_id, epa, desc)

## Some PATs do not have a drive value, so the ordering is also tried with drive included.
## I found one PAT penalty that was out of order (game_id=='2019_13_NE_HOU' & play_id %in% c(3100, 3123, 3150))
pbp_df %>% 
  arrange(season, week, game_id, qtr, -quarter_seconds_remaining, drive) %>% 
  filter((lead(two_point_attempt)==1 | lead(extra_point_attempt)==1) & lag(touchdown)==1 & penalty==1) %>% 
  select(game_id, play_id, epa, desc) %>% 
  full_join(no_drive) %>% 
  View()

It looks like there may be a similar problem with penalties on kickoffs, but it could just be due to the order being wrong in the example I found: (game_id=='2019_12_IND_HOU' & play_id %in% c(2521,2544))

guga31bb commented 4 years ago

Thanks for the catch, will look into this! I'm guessing we need to fix something in the function that applies EP to the given dataframe.

guga31bb commented 4 years ago

There are two separate issues:

  1. EP / EPA are wrong on PAT / kickoff plays with penalties
  2. Some plays are out of order (eg PAT after kickoff)

I think both have been fixed, and I am re-scraping now to update the data repo. For issue 1, I'm setting EP and EPA to NA on these plays because we have no way to calculate EP for them (for example, we don't have a model for the EP of a kickoff from a different yard line than normal).
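To illustrate what setting EP and EPA to NA could look like, here is a minimal sketch in base R on a made-up data frame. The column names come from the queries in this thread, but the rule used to pick the rows (a penalty on a PAT, two-point, or kickoff play) is an assumption for illustration and not necessarily the condition nflfastR applies.

```r
# Toy play-by-play rows; all values are made up for illustration only.
toy_pbp <- data.frame(
  play_id             = c(1, 2, 3, 4),
  penalty             = c(0, 1, 1, 0),
  extra_point_attempt = c(0, 1, 0, 0),
  two_point_attempt   = c(0, 0, 0, 0),
  kickoff_attempt     = c(0, 0, 1, 1),
  ep                  = c(3.1, 0.9, 0.8, 0.8),
  epa                 = c(0.4, -0.7, 0.2, 0.1)
)

# Assumed rule (hypothetical): a PAT, two-point, or kickoff play with a penalty.
needs_na <- with(
  toy_pbp,
  penalty == 1 &
    (extra_point_attempt == 1 | two_point_attempt == 1 | kickoff_attempt == 1)
)

# Blank out EP and EPA on the flagged rows; other rows are left untouched.
toy_pbp$ep[needs_na]  <- NA_real_
toy_pbp$epa[needs_na] <- NA_real_

print(toy_pbp[, c("play_id", "ep", "epa")])
```

Using NA rather than 0 keeps these plays out of averages computed with `na.rm = TRUE` instead of counting them as neutral plays.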

I'll update and close this once the data repo has been updated.

guga31bb commented 4 years ago

Okay should be fixed in the data repo. Please let us know if there are any more issues!

ajreinhard commented 4 years ago

Looks like there still might be issues with the EP calculation on kickoffs where the kickoff penalty is applied on the next down. I found 17 kickoffs that begin with EP > 2.

# Kickoffs (flagged or described as kick formation), sorted by EP, highest first
pbp_df %>% 
  filter(kickoff_attempt==1 | grepl('kick formation', desc, ignore.case = TRUE)) %>% 
  select(game_id, play_type, play_id, ep, epa, desc) %>% 
  arrange(-ep) %>%
  View()

Additionally, a handful of kickoffs are coming in as play_type = 'qb_kneel'. There seem to be a few plays where the description for a touchback includes the returner kneeling. It should be simple enough to change line 233 of helper_add_nflscrapr_mutations.R so that it excludes any play whose play_type has already been set to 'kickoff'. The line in question is:

qb_kneel = stringr::str_detect(play_description, " kneels ") %>% as.numeric()
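A rough sketch of the suggested change, written in base R on toy data. The play descriptions and the `toy_plays` frame are made up, and the sketch assumes `play_type` has already been assigned before the kneel flag is computed; the actual structure of helper_add_nflscrapr_mutations.R may differ.

```r
# Toy plays; descriptions are invented for illustration only.
toy_plays <- data.frame(
  play_type = c("kickoff", NA, "kickoff"),
  play_description = c(
    "K.Kicker kicks 65 yards to end zone. R.Returner kneels in the end zone, Touchback.",
    "Q.Back kneels to NYG 30 for -1 yards.",
    "K.Kicker kicks 60 yards, returned for 22 yards."
  ),
  stringsAsFactors = FALSE
)

# Flag a kneel only when the description mentions it AND the play has not
# already been classified as a kickoff. %in% treats an NA play_type as
# "not a kickoff" instead of propagating NA.
toy_plays$qb_kneel <- as.numeric(
  grepl(" kneels ", toy_plays$play_description, fixed = TRUE) &
    !(toy_plays$play_type %in% "kickoff")
)

print(toy_plays[, c("play_type", "qb_kneel")])
```

With this guard, the touchback where the returner kneels stays a kickoff, while the genuine kneel-down is still flagged.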

guga31bb commented 4 years ago

Thanks for finding these. The kneel issue has been fixed, as have most of the EP calculations on kickoffs with penalties, but at least one play remains with very weird values caused by plays being out of order, and I'm not sure how to fix it. Currently re-scraping to update the data repo.
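One possible way to surface candidates for out-of-order plays is to sort by play_id within a game and look for rows where the game clock goes up inside the same quarter. The sketch below uses base R and made-up data; it is only a detection heuristic built on the assumption that play_id reflects the recorded sequence, not a fix and not something the package is known to do.

```r
# Toy rows; values are made up for illustration only.
toy <- data.frame(
  game_id = rep("toy_game", 5),
  play_id = c(10, 20, 30, 40, 50),
  qtr     = c(1, 1, 1, 1, 2),
  quarter_seconds_remaining = c(900, 860, 870, 820, 900),
  stringsAsFactors = FALSE
)

# Order by the recorded play sequence within each game.
toy <- toy[order(toy$game_id, toy$play_id), ]
n <- nrow(toy)

# TRUE when a row shares game and quarter with the row before it.
same_group <- c(
  FALSE,
  toy$game_id[-1] == toy$game_id[-n] & toy$qtr[-1] == toy$qtr[-n]
)

# TRUE when the clock increased relative to the previous row.
clock_jump <- c(FALSE, diff(toy$quarter_seconds_remaining) > 0)

# Within a quarter the clock should never go up, so these rows are suspects.
toy$clock_went_up <- same_group & clock_jump

print(toy[toy$clock_went_up, ])
```

Rows flagged this way would still need a manual look, since a flagged row may itself be correct and its neighbor misplaced.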

guga31bb commented 4 years ago

Okay, just checked and these have been fixed in the dev branch. There are still 3 kickoffs with EP > 2, but those are the "correct" EP values (playing in a dome with a timeout advantage can push EP above 2 in unique situations). Will update the data repo after some more bug fixes (still need to check the onside kick thing). Thank you!