nflverse / nflfastR

A Set of Functions to Efficiently Scrape NFL Play by Play Data
https://www.nflfastr.com/

Play Clock Oddities #250

Closed ajreinhard closed 2 years ago

ajreinhard commented 3 years ago

Sebastian suggested that I open a GitHub issue for play clock stuff, so I'm just going to spill everything I have here so far.

My initial concern was that, in creating this tweet, I found that the pass probability over expected when play_clock == 0 behaved differently from the trend. I looked into it a little more and found that, in most seasons, fewer plays than I expected had zero seconds on the play clock. There are also a handful of games that have nothing but zeros for the play clock, which to me signals that there may be some kind of error or missing data on the NFL's side.

From what I can tell, there could be several eras of play clock data recorded:

I personally only feel comfortable using the play clock from 2015-2020 where there are no zeros and the clock is less than 25 seconds, but I'm not sure if there is anything you two would like to do about this.
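For anyone who wants to check the season-level pattern described above (how often the play clock is recorded as zero in each season), here is a minimal sketch. The helper name `zero_clock_share_by_season` and the toy data frame are made up for illustration. It assumes the input has `season` and `play_clock` columns and that `play_clock` may be stored as text, so it is coerced to numeric first.

```r
library(dplyr)

# Hypothetical helper (not part of nflfastR): share of plays per season
# whose play clock is recorded as zero.
# Assumes `pbp` has `season` and `play_clock` columns; `play_clock` is
# coerced to numeric because it may be stored as text.
zero_clock_share_by_season <- function(pbp) {
  pbp %>%
    mutate(play_clock = suppressWarnings(as.numeric(play_clock))) %>%
    group_by(season) %>%
    summarise(
      plays = n(),
      zero_share = mean(play_clock == 0, na.rm = TRUE),
      .groups = "drop"
    )
}

# Toy data for illustration only (not real NFL plays)
toy_pbp <- data.frame(
  season = c(2014, 2014, 2014, 2014, 2015, 2015),
  play_clock = c("0", "0", "12", "7", "3", "18"),
  stringsAsFactors = FALSE
)

season_summary <- zero_clock_share_by_season(toy_pbp)
print(season_summary)
```

With real data you would probably apply the same pass/rush filter used elsewhere in this issue before calling the helper, so the shares are comparable.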

ID'ing games where there are mostly zeros:

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.6.3
#> Warning: replacing previous import 'vctrs::data_frame' by 'tibble::data_frame'
#> when loading 'dplyr'
#> Warning: package 'ggplot2' was built under R version 3.6.3
#> Warning: package 'tibble' was built under R version 3.6.3
#> Warning: package 'dplyr' was built under R version 3.6.3
library(nflfastR)

pbp_df <- load_pbp(2014:2020)
#> i It is recommended to use parallel processing when trying to load multiple seasons.
#>   Please consider running `future::plan("multisession")`!
#>   Will go on sequentially...

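# Keep plays with a down and a possession team that are either a pass or a rush
pbp_df %>% 
  filter(!is.na(down) & !is.na(posteam) & pass + rush == 1) %>% 
  group_by(game_id) %>% 
  summarise(
    tot_plays = n(),
    # Share of plays in the game with a recorded play clock of zero
    play_clock_zero_pct = mean(play_clock == 0)
  ) %>% 
  # Games with the highest share of zeros first
  arrange(desc(play_clock_zero_pct))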
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 1,871 x 3
#>    game_id         tot_plays play_clock_zero_pct
#>    <chr>               <int>               <dbl>
#>  1 2014_02_SEA_SD        122               1    
#>  2 2014_04_JAX_SD        129               1    
#>  3 2014_05_NYJ_SD        131               1    
#>  4 2014_07_KC_SD         126               1    
#>  5 2014_11_OAK_SD        129               1    
#>  6 2014_12_STL_SD        127               1    
#>  7 2015_04_STL_ARI       124               1    
#>  8 2015_09_NYG_TB        135               1    
#>  9 2016_03_ATL_NO        142               1    
#> 10 2015_10_NO_WAS        119               0.924
#> # ... with 1,861 more rows
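
Building on the per-game summary above, one possible way to set those games aside before using the play clock is sketched below. The helper name `suspect_clock_games`, the 0.9 cutoff, and the toy data are all assumptions for illustration; nothing here is part of nflfastR, and the right cutoff is a judgment call.

```r
library(dplyr)

# Hypothetical helper: return the game_ids whose share of zero play-clock
# values is at or above `cutoff` (0.9 is an arbitrary illustrative default).
suspect_clock_games <- function(pbp, cutoff = 0.9) {
  pbp %>%
    mutate(play_clock = suppressWarnings(as.numeric(play_clock))) %>%
    group_by(game_id) %>%
    summarise(
      zero_share = mean(play_clock == 0, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    filter(zero_share >= cutoff) %>%
    pull(game_id)
}

# Toy data for illustration only (not real NFL games)
toy_pbp <- data.frame(
  game_id = c("g1", "g1", "g1", "g2", "g2", "g2"),
  play_clock = c("0", "0", "0", "0", "14", "9"),
  stringsAsFactors = FALSE
)

bad_games <- suspect_clock_games(toy_pbp)

# Drop the flagged games before any play-clock analysis
clean_pbp <- toy_pbp %>% filter(!game_id %in% bad_games)
```

In the toy data only `g1` is flagged, since every one of its plays has a zero play clock, while `g2` has a single zero among three plays and is kept.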

Distribution of play clocks by season:

library(tidyverse)
library(nflfastR)

pbp_df <- load_pbp(2012:2020)
#> i It is recommended to use parallel processing when trying to load multiple seasons.
#>   Please consider running `future::plan("multisession")`!
#>   Will go on sequentially...

# Keep plays with a down and a possession team that are either a pass or a rush
pbp_df %>% 
  filter(!is.na(down) & !is.na(posteam) & pass + rush == 1) %>% 
  mutate(
    # Coerce to numeric, then lump everything above 40 seconds into one bucket
    play_clock = as.numeric(play_clock),
    play_clock = ifelse(play_clock > 40, '>40', play_clock),
    play_clock = factor(play_clock, c(0:40, '>40'))
  ) %>% 
  # Count plays per season and play-clock value
  group_by(season, play_clock) %>% 
  summarise(n = n(), .groups = 'drop') %>% 
  # Convert counts to within-season frequencies
  group_by(season) %>% 
  mutate(freq = n / sum(n)) %>% 
  ggplot(aes(x = play_clock, y = freq)) +
  facet_wrap(~season, ncol = 1, strip.position = 'left') +
  scale_x_discrete(breaks = c(0, seq(5, 40, 5), '>40')) +
  # geom_col() is equivalent to geom_bar(stat = 'identity')
  geom_col(alpha = 0.7) +
  theme_light() +
  theme(
    strip.placement = 'inside',
    panel.grid.minor.y = element_blank()
  )

guga31bb commented 2 years ago

Can't do anything about this on our end so closing