nflverse / nflfastR

A Set of Functions to Efficiently Scrape NFL Play by Play Data
https://www.nflfastr.com/

Resolving inconsistencies between columns in `load_player_stats()` #454

Open john-b-edwards opened 8 months ago

john-b-edwards commented 8 months ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

There are some notable inconsistencies in how player biographical and contextual information is represented across the different `stat_type` values in `load_player_stats()`.

nflreadr::load_player_stats(stat_type = "offense") |>
    colnames()
#>  [1] "player_id"                   "player_name"                
#>  [3] "player_display_name"         "position"                   
#>  [5] "position_group"              "headshot_url"               
#>  [7] "recent_team"                 "season"                     
#>  [9] "week"                        "season_type"                
#> [11] "completions"                 "attempts"                   
#> [13] "passing_yards"               "passing_tds"                
#> [15] "interceptions"               "sacks"                      
#> [17] "sack_yards"                  "sack_fumbles"               
#> [19] "sack_fumbles_lost"           "passing_air_yards"          
#> [21] "passing_yards_after_catch"   "passing_first_downs"        
#> [23] "passing_epa"                 "passing_2pt_conversions"    
#> [25] "pacr"                        "dakota"                     
#> [27] "carries"                     "rushing_yards"              
#> [29] "rushing_tds"                 "rushing_fumbles"            
#> [31] "rushing_fumbles_lost"        "rushing_first_downs"        
#> [33] "rushing_epa"                 "rushing_2pt_conversions"    
#> [35] "receptions"                  "targets"                    
#> [37] "receiving_yards"             "receiving_tds"              
#> [39] "receiving_fumbles"           "receiving_fumbles_lost"     
#> [41] "receiving_air_yards"         "receiving_yards_after_catch"
#> [43] "receiving_first_downs"       "receiving_epa"              
#> [45] "receiving_2pt_conversions"   "racr"                       
#> [47] "target_share"                "air_yards_share"            
#> [49] "wopr"                        "special_teams_tds"          
#> [51] "fantasy_points"              "fantasy_points_ppr"         
#> [53] "opponent_team"

nflreadr::load_player_stats(stat_type = "defense") |>
    colnames() 
#>  [1] "season"                        "week"                         
#>  [3] "player_id"                     "player_name"                  
#>  [5] "player_display_name"           "position"                     
#>  [7] "position_group"                "headshot_url"                 
#>  [9] "team"                          "def_tackles"                  
#> [11] "def_tackles_solo"              "def_tackles_with_assist"      
#> [13] "def_tackle_assists"            "def_tackles_for_loss"         
#> [15] "def_tackles_for_loss_yards"    "def_fumbles_forced"           
#> [17] "def_sacks"                     "def_sack_yards"               
#> [19] "def_qb_hits"                   "def_interceptions"            
#> [21] "def_interception_yards"        "def_pass_defended"            
#> [23] "def_tds"                       "def_fumbles"                  
#> [25] "def_fumble_recovery_own"       "def_fumble_recovery_yards_own"
#> [27] "def_fumble_recovery_opp"       "def_fumble_recovery_yards_opp"
#> [29] "def_safety"                    "def_penalty"                  
#> [31] "def_penalty_yards"

nflreadr::load_player_stats(stat_type = "kicking") |>
    colnames()
#>  [1] "season"              "week"                "season_type"        
#>  [4] "team"                "player_name"         "player_id"          
#>  [7] "fg_made"             "fg_missed"           "fg_blocked"         
#> [10] "fg_long"             "fg_att"              "fg_pct"             
#> [13] "pat_made"            "pat_missed"          "pat_blocked"        
#> [16] "pat_att"             "pat_pct"             "fg_made_distance"   
#> [19] "fg_missed_distance"  "fg_blocked_distance" "gwfg_att"           
#> [22] "gwfg_distance"       "gwfg_made"           "gwfg_missed"        
#> [25] "gwfg_blocked"        "fg_made_0_19"        "fg_made_20_29"      
#> [28] "fg_made_30_39"       "fg_made_40_49"       "fg_made_50_59"      
#> [31] "fg_made_60_"         "fg_missed_0_19"      "fg_missed_20_29"    
#> [34] "fg_missed_30_39"     "fg_missed_40_49"     "fg_missed_50_59"    
#> [37] "fg_missed_60_"       "fg_made_list"        "fg_missed_list"     
#> [40] "fg_blocked_list"

For instance, `stat_type = "defense"` lacks the `season_type` column, and `player_display_name` and `position` are present for defense and offense but not for kicking. (`position = "K"` is assumed for kicking, but that is not always the case; see Dare Ogunbowale's kicking exploits for an example.)
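The gaps described above can be checked mechanically rather than by eye. The following is a minimal sketch in base R, assuming only the identifier and context columns copied from the three outputs above (stat columns are left out). The names `id_cols`, `all_ids`, and `missing_by_type` are illustrative and not part of nflreadr.

```r
# Identifier/context columns copied from the colnames() output above
# (stat columns omitted for brevity).
id_cols <- list(
  offense = c("player_id", "player_name", "player_display_name", "position",
              "position_group", "headshot_url", "recent_team", "season",
              "week", "season_type", "opponent_team"),
  defense = c("season", "week", "player_id", "player_name",
              "player_display_name", "position", "position_group",
              "headshot_url", "team"),
  kicking = c("season", "week", "season_type", "team", "player_name",
              "player_id")
)

# Every identifier column that appears in at least one stat type
all_ids <- unique(unlist(id_cols, use.names = FALSE))

# For each stat type, list the identifier columns it lacks
missing_by_type <- lapply(id_cols, function(cols) setdiff(all_ids, cols))
missing_by_type
```

With these inputs, `missing_by_type$defense` includes `season_type` and `missing_by_type$kicking` includes `player_display_name` and `position`. The same comparison also shows that offense uses `recent_team` where the other two use `team`.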

Describe the solution you'd like

I think we should standardize how biographical and contextual information for player stats is represented in these columns.

Describe alternatives you've considered

No response

Additional context

No response

mrcaseb commented 8 months ago

Transferred to nflfastR, as we should resolve this directly in the underlying functions.

mrcaseb commented 8 months ago

After cross-checking, it seems that some of this has already been resolved in nflfastR. I guess we need to trigger the workflow to rebuild all data in nflverse-pbp at some point.

`season_type` is currently missing in `def`, so we need to add it before the rebuild.

pbp <- nflreadr::load_pbp(2023)

off <- nflfastR::calculate_player_stats(pbp, weekly = TRUE)
def <- nflfastR::calculate_player_stats_def(pbp, weekly = TRUE)
kick <- nflfastR::calculate_player_stats_kicking(pbp, weekly = TRUE)

colnames(off)
#>  [1] "player_id"                   "player_name"                
#>  [3] "player_display_name"         "position"                   
#>  [5] "position_group"              "headshot_url"               
#>  [7] "recent_team"                 "season"                     
#>  [9] "week"                        "season_type"                
#> [11] "opponent_team"               "completions"                
#> [13] "attempts"                    "passing_yards"              
#> [15] "passing_tds"                 "interceptions"              
#> [17] "sacks"                       "sack_yards"                 
#> [19] "sack_fumbles"                "sack_fumbles_lost"          
#> [21] "passing_air_yards"           "passing_yards_after_catch"  
#> [23] "passing_first_downs"         "passing_epa"                
#> [25] "passing_2pt_conversions"     "pacr"                       
#> [27] "dakota"                      "carries"                    
#> [29] "rushing_yards"               "rushing_tds"                
#> [31] "rushing_fumbles"             "rushing_fumbles_lost"       
#> [33] "rushing_first_downs"         "rushing_epa"                
#> [35] "rushing_2pt_conversions"     "receptions"                 
#> [37] "targets"                     "receiving_yards"            
#> [39] "receiving_tds"               "receiving_fumbles"          
#> [41] "receiving_fumbles_lost"      "receiving_air_yards"        
#> [43] "receiving_yards_after_catch" "receiving_first_downs"      
#> [45] "receiving_epa"               "receiving_2pt_conversions"  
#> [47] "racr"                        "target_share"               
#> [49] "air_yards_share"             "wopr"                       
#> [51] "special_teams_tds"           "fantasy_points"             
#> [53] "fantasy_points_ppr"
colnames(def)
#>  [1] "season"                        "week"                         
#>  [3] "player_id"                     "player_name"                  
#>  [5] "player_display_name"           "position"                     
#>  [7] "position_group"                "headshot_url"                 
#>  [9] "team"                          "def_tackles"                  
#> [11] "def_tackles_solo"              "def_tackles_with_assist"      
#> [13] "def_tackle_assists"            "def_tackles_for_loss"         
#> [15] "def_tackles_for_loss_yards"    "def_fumbles_forced"           
#> [17] "def_sacks"                     "def_sack_yards"               
#> [19] "def_qb_hits"                   "def_interceptions"            
#> [21] "def_interception_yards"        "def_pass_defended"            
#> [23] "def_tds"                       "def_fumbles"                  
#> [25] "def_fumble_recovery_own"       "def_fumble_recovery_yards_own"
#> [27] "def_fumble_recovery_opp"       "def_fumble_recovery_yards_opp"
#> [29] "def_safety"                    "def_penalty"                  
#> [31] "def_penalty_yards"
colnames(kick)
#>  [1] "season"              "week"                "season_type"        
#>  [4] "player_id"           "team"                "player_name"        
#>  [7] "player_display_name" "position"            "position_group"     
#> [10] "headshot_url"        "fg_made"             "fg_att"             
#> [13] "fg_missed"           "fg_blocked"          "fg_long"            
#> [16] "fg_pct"              "fg_made_0_19"        "fg_made_20_29"      
#> [19] "fg_made_30_39"       "fg_made_40_49"       "fg_made_50_59"      
#> [22] "fg_made_60_"         "fg_missed_0_19"      "fg_missed_20_29"    
#> [25] "fg_missed_30_39"     "fg_missed_40_49"     "fg_missed_50_59"    
#> [28] "fg_missed_60_"       "fg_made_list"        "fg_missed_list"     
#> [31] "fg_blocked_list"     "fg_made_distance"    "fg_missed_distance" 
#> [34] "fg_blocked_distance" "pat_made"            "pat_att"            
#> [37] "pat_missed"          "pat_blocked"         "pat_pct"            
#> [40] "gwfg_att"            "gwfg_distance"       "gwfg_made"          
#> [43] "gwfg_missed"         "gwfg_blocked"

mrcaseb commented 8 months ago

`season_type` has been added to the defense stats. We could define a consistent column order to finish this off.
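One way a consistent column order could be applied is sketched below in base R. This is an assumption-laden illustration, not the approach taken in nflfastR: the ordering in `id_order`, the helper `reorder_id_cols()`, and the toy data frame `kick_demo` (including its made-up `player_id` value) are all hypothetical.

```r
# Hypothetical shared ordering for identifier columns; the real order is
# up to the maintainers.
id_order <- c("season", "week", "season_type", "player_id", "player_name",
              "player_display_name", "position", "position_group",
              "headshot_url", "team")

# Move the identifier columns that exist in `df` to the front, in the
# shared order, and keep the remaining (stat) columns in their current order.
reorder_id_cols <- function(df, order = id_order) {
  front <- intersect(order, names(df))
  rest <- setdiff(names(df), front)
  df[, c(front, rest), drop = FALSE]
}

# Toy data frame standing in for weekly kicking stats (values are made up)
kick_demo <- data.frame(
  fg_made = 2L, team = "KC", player_id = "00-0000001",
  week = 1L, season = 2023L, stringsAsFactors = FALSE
)
names(reorder_id_cols(kick_demo))
```

Because the helper only moves columns that are actually present, it could be applied to offense, defense, and kicking output alike, even while some identifier columns are still missing from one of them.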

john-b-edwards commented 1 month ago

It seems like nflverse/nflreadr#237 falls within this scope.

mrcaseb commented 1 month ago

I started a fresh player stats approach in https://github.com/nflverse/nflfastR/pull/470, which should resolve all of this by computing all stats in one function.