nflverse / nflreadr

Efficiently download nflverse data
https://nflreadr.nflverse.com/

[BUG] Inconsistent totals for LAR and non-equal totals for yardage. #245

Closed johnathan-o-h-napier closed 3 months ago

johnathan-o-h-napier commented 3 months ago

Is there an existing issue for this?

Have you installed the latest development version of the package(s) in question?

If this is a data issue, have you tried clearing your nflverse cache?

I have cleared my nflverse cache and the issue persists.

What version of the package do you have?

1.0.3

Describe the bug

I have encountered two separate issues when trying to summarize data for each game.

The first is that the Los Angeles Rams are referred to inconsistently as either LA or LAR in the posteam, defteam, away_team, and home_team columns. Further problems arose when I tried to summarize data for this team: some yard totals came to 0 for the entire game. On closer inspection, three rows were being generated for the same game_id: LAR vs. the other team, the other team vs. LAR, and LAR vs. LAR. posteam_type is away in all of these rows. This has several effects: the home and away scores, and consequently the posteam and defteam scores, are computed incorrectly; some games show 0 yards; and the correct yardage is attributed to a game that shows a team playing itself.
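
One way to surface the mismatch described above is to count the distinct team labels per game_id: a clean game has exactly two, and no row should have posteam equal to defteam. The following is a minimal sketch on a made-up data frame. The `pbp_toy` rows, the `toy_game_*` ids, and the helper name `check_team_labels` are hypothetical and are not nflverse output; only the column names follow the reprex.

```r
library(dplyr)

# Hypothetical toy rows imitating the problem: in the first game the Rams
# appear under two labels ("LA" and "LAR"). The second game is clean.
pbp_toy <- data.frame(
  game_id = c("toy_game_1", "toy_game_1", "toy_game_1", "toy_game_2", "toy_game_2"),
  posteam = c("LA", "SEA", "LAR", "ARI", "WAS"),
  defteam = c("SEA", "LAR", "LA", "WAS", "ARI")
)

# Return only the games that look inconsistent:
# more (or fewer) than two team labels, or a team facing itself.
check_team_labels <- function(pbp) {
  pbp %>%
    group_by(game_id) %>%
    summarize(
      n_labels     = n_distinct(c(posteam, defteam)),
      self_matchup = any(posteam == defteam),
      .groups = "drop"
    ) %>%
    filter(n_labels != 2 | self_matchup)
}

flagged <- check_team_labels(pbp_toy)
print(flagged)
```

Running a check like this on the raw data before recoding any labels shows whether the inconsistency comes from the data or from the summarizing code.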

The second issue is in the yard totals. More than 50% of the time, the rushing yards are off by 1-3 yards. I went back to several games to check the official totals. The discrepancy shows up when passing yards + rushing yards != total yards. I'm not sure why this is occurring.

Reprex

#### Load in Initial Data ####
library(tidyverse)
library(nflverse)
NFL_PBP_DATA <- load_pbp(2023)  # Replace with the desired season
#### Summarize Team Data NFL and Fix ####
NFL_TEAMS_DATA <- NFL_PBP_DATA %>%
  filter(!is.na(posteam)) %>%
  group_by(game_id, game_date, season_type, week, posteam, defteam) %>%
  # Attempted fix: relabel every "LA" as "LAR"
  mutate(away_team = case_when(away_team == "LA" ~ "LAR", TRUE ~ away_team)) %>%
  mutate(home_team = case_when(home_team == "LA" ~ "LAR", TRUE ~ home_team)) %>%
  mutate(posteam = case_when(posteam == "LA" ~ "LAR", TRUE ~ posteam)) %>%
  mutate(defteam = case_when(defteam == "LA" ~ "LAR", TRUE ~ defteam)) %>%
  mutate(play_type = play_type %>% replace_na(" ")) %>%
  summarize(
    YARDS_PASSING_A = sum(yards_gained * (play_type == "pass"), na.rm = TRUE),
    YARDS_RUSHING_A = sum(yards_gained * (play_type == "run"), na.rm = TRUE),
    YARDS_TOTAL_A = sum(yards_gained, na.rm = TRUE),
    YARDS_PER_PASS_A = YARDS_PASSING_A / sum(play_type == "pass", na.rm = TRUE),
    YARDS_PER_RUSH_A = YARDS_RUSHING_A / sum(play_type == "run", na.rm = TRUE),
    POINTS_A = max(away_score * (posteam_type == "away") + home_score * (posteam_type == "home"), na.rm = TRUE),
    POINTS_B = max(away_score * (posteam_type == "home") + home_score * (posteam_type == "away"), na.rm = TRUE)
  ) %>% ungroup() %>% arrange(game_id, posteam, defteam)
# Games where passing + rushing yards do not equal total yards
NFL_TEAMS_DATA %>% filter((YARDS_PASSING_A + YARDS_RUSHING_A) != YARDS_TOTAL_A)
# Rows where a team appears to play itself
NFL_TEAMS_DATA %>% filter(posteam == defteam)
# Rows with zero total yards
NFL_TEAMS_DATA %>% filter(YARDS_TOTAL_A == 0)

Expected Behavior

Passing and rushing yards should sum to total yards. The Rams should be referred to by a single, consistent team abbreviation. No game should show zero yardage, and posteam and defteam should never be equal.

nflverse_sitrep

── System Info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• R version 4.4.0 (2024-04-24 ucrt) • Running under: Windows 11 x64 (build 22631)
── Package Status ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   package installed  cran        dev behind
1   nfl4th     1.0.4 1.0.4 1.0.4.9002    dev
2 nflfastR     4.6.1 4.6.1 4.6.1.9013    dev
3 nflplotR     1.3.1 1.3.1      1.3.1       
4 nflreadr     1.4.1 1.4.1   1.4.1.00       
5 nflseedR     1.2.0 1.2.0 1.2.0.9001    dev
6 nflverse     1.0.3 1.0.3      1.0.3       
── Package Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• No options set for above packages
── Package Dependencies ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• askpass     (1.2.0)    • httr         (1.4.7)    • rstudioapi  (0.16.0)    
• backports   (1.5.0)    • isoband      (0.2.7)    • sass        (0.4.9)     
• base64enc   (0.1-3)    • janitor      (2.2.0)    • scales      (1.3.0)     
• bigD        (0.2.0)    • jquerylib    (0.1.4)    • snakecase   (0.11.1)    
• bitops      (1.0-8)    • jsonlite     (1.8.8)    • stringi     (1.8.4)     
• bslib       (0.8.0)    • juicyjuice   (0.1.0)    • stringr     (1.5.1)     
• cachem      (1.1.0)    • knitr        (1.48)     • sys         (3.4.2)     
• cli         (3.6.3)    • labeling     (0.4.3)    • tibble      (3.2.1)     
• colorspace  (2.1-1)    • lattice      (0.22-6)   • tidyr       (1.3.1)     
• commonmark  (1.9.1)    • lifecycle    (1.0.4)    • tidyselect  (1.2.1)     
• cpp11       (0.4.7)    • listenv      (0.9.1)    • timechange  (0.3.0)     
• crayon      (1.5.3)    • lubridate    (1.9.3)    • tinytex     (0.52)      
• curl        (5.2.1)    • magick       (2.8.4)    • utf8        (1.2.4)     
• data.table  (1.15.4)   • magrittr     (2.0.3)    • V8          (4.4.2)     
• digest      (0.6.36)   • markdown     (1.13)     • vctrs       (0.6.5)     
• dplyr       (1.1.4)    • MASS         (7.3-61)   • viridisLite (0.4.2)     
• evaluate    (0.24.0)   • Matrix       (1.7-0)    • withr       (3.0.1)     
• fansi       (1.0.6)    • memoise      (2.0.1)    • xfun        (0.46)      
• farver      (2.1.2)    • mgcv         (1.9-1)    • xgboost     (1.7.8.1)   
• fastmap     (1.2.0)    • mime         (0.12)     • xml2        (1.3.6)     
• fastrmodels (1.0.2)    • munsell      (0.5.1)    • yaml        (2.3.10)    
• fontawesome (0.5.2)    • nlme         (3.1-165)  • codetools   (0.2-20)    
• fs          (1.6.4)    • openssl      (2.2.0)    • compiler    (4.4.0)     
• furrr       (0.3.1)    • parallelly   (1.38.0)   • graphics    (4.4.0)     
• future      (1.34.0)   • pillar       (1.9.0)    • grDevices   (4.4.0)     
• generics    (0.1.3)    • pkgconfig    (2.0.3)    • grid        (4.4.0)     
• ggpath      (1.0.1)    • progressr    (0.14.0)   • lattice     (0.22-6)    
• ggplot2     (3.5.1)    • proto        (1.0.0)    • MASS        (7.3-60.2)  
• globals     (0.16.3)   • purrr        (1.0.2)    • Matrix      (1.7-0)     
• glue        (1.7.0)    • R6           (2.5.1)    • methods     (4.4.0)     
• gsubfn      (0.7)      • rappdirs     (0.3.3)    • mgcv        (1.9-1)     
• gt          (0.11.0)   • RColorBrewer (1.1-3)    • nlme        (3.1-164)   
• gtable      (0.3.5)    • Rcpp         (1.0.13)   • parallel    (4.4.0)     
• highr       (0.11)     • reactable    (0.4.4)    • splines     (4.4.0)     
• hms         (1.1.3)    • reactR       (0.6.0)    • stats       (4.4.0)     
• htmltools   (0.5.8.1)  • rlang        (1.1.4)    • tools       (4.4.0)     
• htmlwidgets (1.6.4)    • rmarkdown    (2.27)     • utils       (4.4.0)

mrcaseb commented 3 months ago

I triggered a pbp rebuild with a fix in nflfastR yesterday. Could you check again whether the problem still exists? Please make sure to load fresh pbp, either by restarting your session or by running nflreadr::clear_cache().

johnathan-o-h-napier commented 3 months ago

That got it. All team data is now intact. The away/home and pos/def issues for the LA Rams have been resolved.

I also found the source of the rushing-yard discrepancy: I hadn't included quarterback kneels. They officially count as negative yards toward the rushing total. I suppose on some level you are "rushing" backwards.
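
The kneel-down explanation can be illustrated on a made-up set of plays. This is only a sketch: it assumes kneel-downs carry the label `qb_kneel` in `play_type` (which is how they appear in nflfastR play-by-play, but worth confirming against your own data), and the `plays_toy` rows are hypothetical rather than taken from any real game.

```r
library(dplyr)

# Hypothetical plays for one team in one game (not real data).
plays_toy <- data.frame(
  play_type    = c("pass", "run", "run", "pass", "qb_kneel", "qb_kneel"),
  yards_gained = c(12, 4, 7, -3, -1, -1)
)

totals <- plays_toy %>%
  summarize(
    yards_passing      = sum(yards_gained[play_type == "pass"], na.rm = TRUE),
    # Designed runs only, which is what the reprex computed.
    yards_rushing_runs = sum(yards_gained[play_type == "run"], na.rm = TRUE),
    # Designed runs plus kneel-downs (assumed label: "qb_kneel").
    yards_rushing      = sum(yards_gained[play_type %in% c("run", "qb_kneel")], na.rm = TRUE),
    yards_total        = sum(yards_gained, na.rm = TRUE)
  )

print(totals)

# With kneel-downs folded into rushing, passing + rushing matches the total.
totals$yards_passing + totals$yards_rushing == totals$yards_total
```

In this toy data, the runs-only figure overstates rushing by the two yards lost on kneel-downs, which is the same size of gap as the 1-3 yard differences reported above.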

Thanks.