nflverse / nflfastR

A Set of Functions to Efficiently Scrape NFL Play by Play Data
https://www.nflfastr.com/

"Abnormal Plays" overwrite `passer`, `receiver`, and `rusher` on plays with penalties #434

Closed mrcaseb closed 11 months ago

mrcaseb commented 11 months ago
pbp <- nflreadr::load_pbp()
d <- pbp |> 
  dplyr::filter(game_id == "2023_04_BAL_CLE", play_id == 792) |> 
  dplyr::select(desc, passer, passer_player_name)

d$desc
#> [1] "(4:18) (Shotgun) 68-M.Dunn reported in as eligible. Direct snap to 88-H.Bryant.  17-D.Thompson-Robinson pass incomplete deep right to 2-A.Cooper.\r\nPENALTY on BAL-21-B.Stephens, Defensive Pass Interference, 37 yards, enforced at CLE 44 - No Play."
d$passer
#> [1] NA
d$passer_player_name
#> [1] NA

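The play ends up with no passer because the abnormal play parser flags it (the description contains "Direct snap to"). The abnormal play adjustment then overwrites `passer` with `passer_player_name`, which is NA here.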

# look for First[period or space]Last[maybe - or ' in last][maybe more letters in last][maybe Jr. or II or IV]
big_parser <- "(?<=)[A-Z][A-z]*+(\\.|\\s)+[A-Z][A-z]*+\\'*\\-*[A-Z]*+[a-z]*+(\\s((Jr.)|(Sr.)|I{2,3})|(IV))?"
# maybe some spaces and letters, and then pass / sack / scramble
pass_finder <- "(?=\\s*[a-z]*+\\s*(( pass)|(sack)|(scramble)))"
# weird play finder
abnormal_play <- "(Lateral)|(lateral)|(pitches to)|(Direct snap to)|(New quarterback for)|(Aborted)|(backwards pass)|(Pass back to)|(Flea-flicker)"

stringr::str_extract(d$desc, glue::glue('{big_parser}{pass_finder}'))
#> [1] "D.Thompson-Robinson"
stringr::str_detect(d$desc, glue::glue('{abnormal_play}'))
#> [1] TRUE

I wonder if we should change the abnormal play adjustment so that it replaces the value with the corresponding `*_player_name` variable only if that variable is not NA.

https://github.com/nflverse/nflfastR/blob/db0214bf79398aa64add52393a119b3347543295/R/helper_additional_functions.R#L88-L91

So something like this for `passer` (and the same idea for `receiver` and `rusher`):

passer = dplyr::case_when(
  stringr::str_detect(.data$desc, glue::glue('{abnormal_play}')) & !is.na(.data$passer_player_name) ~ .data$passer_player_name,
  TRUE ~ .data$passer
)
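
As a minimal sketch of what that guard would do, here is the same `case_when()` applied to a toy tibble. The rows and player names are made up for illustration, and the starting `passer` values stand in for whatever the regex parser produced. This is not the actual nflfastR pipeline, only the proposed condition in isolation.

```r
library(dplyr)
library(stringr)

# Pattern copied from the snippet earlier in this issue
abnormal_play <- "(Lateral)|(lateral)|(pitches to)|(Direct snap to)|(New quarterback for)|(Aborted)|(backwards pass)|(Pass back to)|(Flea-flicker)"

# Hypothetical rows (not real play-by-play data):
# 1. abnormal play with a penalty, so passer_player_name is NA
# 2. abnormal play where passer_player_name is available
# 3. ordinary pass play
toy <- tibble::tibble(
  desc = c(
    "Direct snap to 88-H.Bryant. 17-D.Thompson-Robinson pass incomplete. PENALTY on BAL, No Play.",
    "Direct snap to 22-A.Back. 22-A.Back pass complete to 80-B.End.",
    "(Shotgun) 9-C.Qb pass complete to 80-B.End."
  ),
  # assumed output of the regex-based parser
  passer = c("D.Thompson-Robinson", "C.Qb", "C.Qb"),
  passer_player_name = c(NA_character_, "A.Back", "C.Qb")
)

fixed <- toy |>
  mutate(
    passer = case_when(
      # overwrite only when the play is abnormal AND a replacement exists
      str_detect(desc, abnormal_play) & !is.na(passer_player_name) ~ passer_player_name,
      TRUE ~ passer
    )
  )

fixed$passer
```

With the `!is.na()` guard, row 1 keeps the parsed passer instead of being overwritten with NA, row 2 still gets the `passer_player_name` replacement, and row 3 is untouched.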

Thoughts @guga31bb?