sportsdataverse / cfbfastR

An R package to quickly obtain clean and tidy college football play by play data
https://cfbfastR.sportsdataverse.org

Inconsistent formatting of data returned by `cfbd_game_team_stats()` #75

Closed: john-b-edwards closed this issue 2 years ago

john-b-edwards commented 2 years ago

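When calling `cfbfastR::cfbd_game_team_stats()`, some season/week/season-type combinations return long data while others return wide data. In the reprex below, week 1 of 2004 comes back wide (4 × 78) and week 2 comes back long (2,840 × 7).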

cfbfastR::cfbd_game_team_stats(2004, 1, "regular")
#> ── Team stats data from CollegeFootballData.com ────────────── cfbfastR 1.9.0 ──
#> ℹ Data updated: 2022-07-23 17:17:58 PDT
#> # A tibble: 4 × 78
#>    game_id school confe…¹ home_…² oppon…³ oppon…⁴ points total…⁵ net_p…⁶ compl…⁷
#>      <int> <chr>  <chr>   <chr>   <chr>   <chr>    <int> <chr>   <chr>   <chr>  
#> 1   2.42e8 Virgi… ACC     home    USC     Pac-10      13 294     180     14-29  
#> 2   2.42e8 USC    Pac-10  away    Virgin… ACC         24 373     284     19-29  
#> 3   2.42e8 India… MVFC    away    Miami … Mid-Am…      0 204     137     18-33  
#> 4   2.42e8 Miami… Mid-Am… home    Indian… MVFC        49 454     292     24-37  
#> # … with 68 more variables: passing_tds <chr>, yards_per_pass <chr>,
#> #   passes_intercepted <chr>, interception_yards <chr>, interception_tds <chr>,
#> #   rushing_attempts <chr>, rushing_yards <chr>, rush_tds <chr>,
#> #   yards_per_rush_attempt <chr>, first_downs <chr>, third_down_eff <chr>,
#> #   fourth_down_eff <chr>, punt_returns <chr>, punt_return_yards <chr>,
#> #   punt_return_tds <chr>, kick_return_yards <lgl>, kick_return_tds <lgl>,
#> #   kick_returns <lgl>, kicking_points <chr>, fumbles_recovered <chr>, …
#> # ℹ Use `colnames()` to see all variable names
cfbfastR::cfbd_game_team_stats(2004, 2, "regular")
#> # A tibble: 2,840 × 7
#>           id school conference homeAway points category           stat 
#>        <int> <chr>  <chr>      <chr>     <int> <chr>              <chr>
#>  1 242482426 Duke   ACC        away         12 fumblesRecovered   3    
#>  2 242482426 Duke   ACC        away         12 rushingTDs         1    
#>  3 242482426 Duke   ACC        away         12 passingTDs         0    
#>  4 242482426 Duke   ACC        away         12 kickingPoints      6    
#>  5 242482426 Duke   ACC        away         12 firstDowns         14   
#>  6 242482426 Duke   ACC        away         12 thirdDownEff       2-13 
#>  7 242482426 Duke   ACC        away         12 fourthDownEff      1-2  
#>  8 242482426 Duke   ACC        away         12 totalYards         265  
#>  9 242482426 Duke   ACC        away         12 netPassingYards    115  
#> 10 242482426 Duke   ACC        away         12 completionAttempts 13-22
#> # … with 2,830 more rows
#> # ℹ Use `print(n = ...)` to see more rows
sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.9         pillar_1.8.0       compiler_4.2.1     highr_0.9         
#>  [5] tools_4.2.1        digest_0.6.29      lattice_0.20-45    nlme_3.1-157      
#>  [9] jsonlite_1.8.0     lubridate_1.8.0    evaluate_0.15      lifecycle_1.0.1   
#> [13] tibble_3.1.7       mgcv_1.8-40        pkgconfig_2.0.3    rlang_1.0.4       
#> [17] Matrix_1.4-1       reprex_2.0.1       cli_3.3.0          DBI_1.1.3         
#> [21] rstudioapi_0.13    curl_4.3.2         yaml_2.3.5         xfun_0.31         
#> [25] fastmap_1.1.0      janitor_2.1.0      httr_1.4.3         cfbfastR_1.9.0    
#> [29] withr_2.5.0        dplyr_1.0.9        stringr_1.4.0      knitr_1.39        
#> [33] generics_0.1.3     fs_1.5.2           vctrs_0.4.1        nnet_7.3-17       
#> [37] grid_4.2.1         tidyselect_1.1.2   snakecase_0.11.0   glue_1.6.2        
#> [41] data.table_1.14.2  R6_2.5.1           fansi_1.0.3        rmarkdown_2.14    
#> [45] purrr_0.3.4        tidyr_1.2.0        magrittr_2.0.3     splines_4.2.1     
#> [49] htmltools_0.5.3    ellipsis_0.3.2     assertthat_0.2.1   utf8_1.2.2        
#> [53] stringi_1.7.8      RcppParallel_5.1.5
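
A quick way to tell the two layouts apart is to look for the `category` and `stat` columns, which only appear in the long output above. The helper below is a minimal sketch: `team_stats_layout()` is a hypothetical name, not a cfbfastR function, and the example rows are copied from the week 2 output rather than fetched from the API.

```r
# Hypothetical helper (not part of cfbfastR): classify a result as "long"
# (one row per team/category) or "wide" (one row per team), based on the
# column names seen in the two outputs above.
team_stats_layout <- function(df) {
  if (all(c("category", "stat") %in% names(df))) "long" else "wide"
}

# Two rows copied from the long week 2 output in this report.
long_example <- data.frame(
  id = c(242482426L, 242482426L),
  school = c("Duke", "Duke"),
  conference = c("ACC", "ACC"),
  homeAway = c("away", "away"),
  points = c(12L, 12L),
  category = c("fumblesRecovered", "rushingTDs"),
  stat = c("3", "1"),
  stringsAsFactors = FALSE
)

# A stand-in for the wide layout: no category/stat columns.
wide_example <- data.frame(
  game_id = 242482426L,
  school = "Duke",
  points = 12L,
  stringsAsFactors = FALSE
)

layout_long <- team_stats_layout(long_example)
layout_wide <- team_stats_layout(wide_example)
```

Running this check over several season/week combinations would show which calls fall back to the long layout.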
Kazink36 commented 2 years ago

I'm not able to replicate this issue. It appears something is going wrong at the pivot, but I'm not sure why that would happen for one week and not the other, or why there isn't a message in the console.

https://github.com/sportsdataverse/cfbfastR/blob/main/R/cfbd_games.R#L1368-L1372

cfbfastR::cfbd_game_team_stats(2004, 2, "regular")
#> ── Team stats data from CollegeFootballData.com ────────────── cfbfastR 1.9.0 ──
#> ℹ Data updated: 2022-08-17 15:05:39 MST
#> # A tibble: 120 × 78
#>    game_id school confe…¹ home_…² oppon…³ oppon…⁴ points total…⁵ net_p…⁶ compl…⁷
#>      <int> <chr>  <chr>   <chr>   <chr>   <chr>    <int> <chr>   <chr>   <chr>  
#>  1  2.42e8 Duke   ACC     away    Navy    FBS In…     12 265     115     13-22  
#>  2  2.42e8 Navy   FBS In… home    Duke    ACC         27 430     134     8-9    
#>  3  2.42e8 North… Mid-Am… away    Maryla… ACC         20 337     228     19-37  
#>  4  2.42e8 Maryl… ACC     home    Northe… Mid-Am…     23 367     198     12-22  
#>  5  2.42e8 Willi… Atlant… away    North … ACC         38 442     322     23-41  
#>  6  2.42e8 North… ACC     home    Willia… Atlant…     49 575     236     14-24  
#>  7  2.42e8 Clems… ACC     home    Wake F… ACC         37 371     297     20-41  
#>  8  2.42e8 Wake … ACC     away    Clemson ACC         30 410     182     10-25  
#>  9  2.42e8 Richm… Atlant… away    NC Sta… ACC          0 167     51      10-27  
#> 10  2.42e8 NC St… ACC     home    Richmo… Atlant…     42 403     237     24-32  
#> # … with 110 more rows, 68 more variables: passing_tds <chr>,
#> #   yards_per_pass <chr>, passes_intercepted <chr>, interception_yards <chr>,
#> #   interception_tds <chr>, rushing_attempts <chr>, rushing_yards <chr>,
#> #   rush_tds <chr>, yards_per_rush_attempt <chr>, first_downs <chr>,
#> #   third_down_eff <chr>, fourth_down_eff <chr>, punt_returns <chr>,
#> #   punt_return_yards <chr>, punt_return_tds <chr>, kick_return_yards <lgl>,
#> #   kick_return_tds <lgl>, kick_returns <lgl>, kicking_points <chr>, …
#> # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
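
For anyone who gets the long layout in the meantime, the pivot step mentioned above can be approximated on the user side. This is a sketch under stated assumptions, not the code at the linked lines: `widen_team_stats()` is a hypothetical helper, the input rows are copied from the long output in the original report, and the resulting columns keep the API's camelCase names instead of the cleaned names cfbfastR returns.

```r
library(tidyr)

# Three rows copied from the long output in the original report.
long_stats <- data.frame(
  id = rep(242482426L, 3),
  school = rep("Duke", 3),
  conference = rep("ACC", 3),
  homeAway = rep("away", 3),
  points = rep(12L, 3),
  category = c("fumblesRecovered", "rushingTDs", "totalYards"),
  stat = c("3", "1", "265"),
  stringsAsFactors = FALSE
)

# Hypothetical workaround (not part of cfbfastR): if the result is long,
# spread `category` into columns; if it is already wide, return it unchanged.
widen_team_stats <- function(df) {
  if (!all(c("category", "stat") %in% names(df))) {
    return(df)
  }
  tidyr::pivot_wider(df, names_from = "category", values_from = "stat")
}

wide_stats <- widen_team_stats(long_stats)
```

The stat values stay as character, matching the `<chr>` columns in the wide output above, since entries such as `2-13` cannot be converted to numbers directly.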
john-b-edwards commented 2 years ago

Odd, looks like it's been fixed internally? Either way, seems resolved.

Kazink36 commented 2 years ago

Week 1 of 2022 has the same issue. Reopening to investigate.