nflverse / nflfastR

A Set of Functions to Efficiently Scrape NFL Play by Play Data
https://www.nflfastr.com/

calculate_series_conversion_rates(weekly = FALSE) returns `NA` for subsets without a series on offense or defense #416

Closed rickstarblazer closed 1 year ago

rickstarblazer commented 1 year ago

Is there an existing issue for this?

Have you installed the latest development version of the package(s) in question?

What version of the package do you have?

4.5.1

Describe the bug

I have pbp data filtered with filter(yardline_100 <= 20). When I run calculate_series_conversion_rates() on this data set, it sometimes returns NAs for all of a team's defensive conversion rates.

I think this happens when a series contains a turnover that leads to a touchback.

Reprex

library(nflfastR)
library(dplyr)

# Regular-season rush and pass plays that start inside the opponent's 20
redzone_pbp <- load_pbp(2022) %>%
  filter(season_type == "REG") %>%
  filter(!is.na(posteam) & (rush == 1 | pass == 1)) %>%
  filter(yardline_100 <= 20)

# Season-level conversion rates (weekly = FALSE)
rz_conv_rt <- calculate_series_conversion_rates(redzone_pbp, weekly = FALSE) %>%
  select(season, team, off_n, def_n, off_scr, def_scr, off_td, def_td, off_fg, def_fg, off_to, def_to)

Expected Behavior

The defensive conversion fields (def_n, def_scr, def_td) should return valid values.

nflverse_sitrep

# A tibble: 32 × 14
   season team  off_n def_n off_scr def_scr off_td def_td off_fg def_fg off_to  def_to off_rz_score def_rz_score
    <int> <chr> <int> <int>   <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>        <dbl>        <dbl>
 1   2022 ARI      63    99   0.667   0.787  0.412  0.531  0.233  0.177 0.1     0.0353        0.645        0.709
 2   2022 ATL      80    75   0.726   0.690  0.386  0.454  0.209  0.235 0.0648  0.0755        0.595        0.689
 3   2022 BAL      77    NA   0.560  NA      0.342 NA      0.331 NA     0.0888 NA             0.673       NA    
 4   2022 BUF      87    67   0.710   0.603  0.412  0.372  0.126  0.263 0.164   0.134         0.538        0.635
 5   2022 CAR      60    84   0.654   0.596  0.479  0.339  0.310  0.304 0.0359  0.0804        0.789        0.644
 6   2022 CHI      63    86   0.658   0.711  0.491  0.492  0.209  0.212 0.114   0.0765        0.700        0.704
 7   2022 CIN      87    61   0.811   0.612  0.444  0.400  0.140  0.292 0.0486  0.0958        0.584        0.692
 8   2022 CLE      74    82   0.625   0.706  0.393  0.439  0.214  0.194 0.161   0.1           0.607        0.632
 9   2022 DAL      79    NA   0.760  NA      0.509 NA      0.208 NA     0.0318 NA             0.717       NA    
10   2022 DEN      48    60   0.713   0.584  0.53   0.345  0.233  0.364 0.0533  0.0524        0.763        0.709

mrcaseb commented 1 year ago

Thanks for this submission. It's indeed a bug.

If one team doesn't have any play on offense in the pbp data for a game, e.g. the 2022 Broncos in week 13, the opposing defense doesn't have any series data to compute for that week. Summarising with weekly = FALSE then resulted in NAs.
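
As a rough illustration of how one empty week can turn a whole season summary into NA, here is a toy sketch. It is not the package's actual aggregation code; the data frame, its values, and the choice of sum() and mean() are assumptions made for the example.

```r
library(dplyr)

# Hypothetical weekly defensive series data for one team.
# Week 13 has no defensive series, so its values are NA.
weekly <- data.frame(
  team    = c("BAL", "BAL", "BAL"),
  week    = c(12L, 13L, 14L),
  def_n   = c(3L, NA, 2L),
  def_scr = c(0.6, NA, 0.5)
)

# Summarising without na.rm: a single NA week makes the season value NA.
season_naive <- weekly %>%
  group_by(team) %>%
  summarise(def_n = sum(def_n), def_scr = mean(def_scr))

# Summarising with na.rm = TRUE: the empty week is skipped.
season_na_rm <- weekly %>%
  group_by(team) %>%
  summarise(
    def_n   = sum(def_n, na.rm = TRUE),
    def_scr = mean(def_scr, na.rm = TRUE)
  )
```

In this sketch season_naive holds NA in both columns, while season_na_rm keeps the values from the two weeks that do have data.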

This also revealed that we need a full_join of defense data to offense data, because the Denver offense didn't do anything in the red zone in that game.
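
A small sketch of the difference, assuming the offense table was the left-hand side of the join. The two data frames, their column values, and the join keys are hypothetical and only mimic the week 13 situation: Baltimore has red-zone series on offense, Denver has none, so only Denver has a defense row.

```r
library(dplyr)

# Hypothetical per-game series counts.
# BAL had red-zone series on offense; DEN had none, so DEN has no offense row.
offense <- data.frame(team = "BAL", week = 13L, off_n = 4L)
# Defense rows exist only for teams whose opponent had a series: here only DEN.
defense <- data.frame(team = "DEN", week = 13L, def_n = 4L)

# left_join keeps only teams that have an offense row, so DEN's defense is dropped.
joined_left <- left_join(offense, defense, by = c("team", "week"))

# full_join keeps teams that appear in either table.
joined_full <- full_join(offense, defense, by = c("team", "week"))
```

With full_join, Denver's defensive row survives even though its offensive columns are NA. With left_join, it would disappear from the result altogether.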

This will be fixed soon.