Implemented the calculate_team_stats() function as requested in issue #342. The new function is an adapted version of the existing calculate_player_stats() function in R/aggregate_game_stats.R. Each of its columns is built up from the passed-in pbp dataframe so that the data is consistent with the rest of nflfastR.
Changes Made
Created file aggregate_team_stats.R
Added function calculate_team_stats() to R/aggregate_team_stats.R
Added Roxygen2 comment documentation to aggregate_team_stats.R
Context
This feature addresses the request in issue #342. As requested in that thread, the values are built from the ground up from nflfastR pbp data. One of the main points of the thread is to incorporate drive-specific statistics; these are implemented in this PR and covered in the Function Details section.
Function Details
The new calculate_team_stats() function was built as an adaptation of the existing calculate_player_stats() function, which uses the "dplyr" package to manipulate the passed-in pbp dataframe and aggregate player-specific statistics. The new function performs the same aggregation at the team level. Most of the columns built by calculate_team_stats() are the same as those built by calculate_player_stats(), with a few additions and exceptions, listed below:
Column Changes
No columns built for certain EPA statistics that are more relevant to players than teams (receiving EPA, dakota, ...)
No columns built for statistics that represent a player's proportion of a team's total (target share, air yards share, WOPR)
Columns built for team drive-specific statistics (including but not limited to plays per drive, yards per drive, red zone percentage, and percentage of drives ending with a score)
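To illustrate the kind of drive-level aggregation described above, here is a minimal sketch using dplyr. It is not the code in this PR: the toy data, the team abbreviations, the drive_result column, and the set of results counted as a score are all assumptions made for the example.

```r
library(dplyr)

# Toy play-by-play rows. Column names loosely mirror nflfastR pbp,
# but the values and the drive_result column are made up.
pbp <- data.frame(
  game_id = "2023_01_AAA_BBB",
  posteam = c("AAA", "AAA", "AAA", "BBB", "BBB", "AAA", "AAA"),
  drive = c(1, 1, 1, 2, 2, 3, 3),
  yards_gained = c(5, 12, 8, 3, -2, 40, 10),
  drive_result = c("Touchdown", "Touchdown", "Touchdown",
                   "Punt", "Punt", "Field goal", "Field goal"),
  stringsAsFactors = FALSE
)

# Step 1: collapse plays into one row per drive.
drive_summary <- pbp %>%
  group_by(game_id, posteam, drive) %>%
  summarise(
    plays = n(),
    yards = sum(yards_gained),
    # Assumption: only these two results count as a scoring drive.
    scored = first(drive_result) %in% c("Touchdown", "Field goal"),
    .groups = "drop"
  )

# Step 2: average the per-drive values for each team.
team_drive_stats <- drive_summary %>%
  group_by(posteam) %>%
  summarise(
    drives = n(),
    plays_per_drive = mean(plays),
    yards_per_drive = mean(yards),
    scoring_drive_pct = mean(scored),
    .groups = "drop"
  )

print(team_drive_stats)
```

Aggregating in two steps (plays to drives, then drives to teams) keeps each drive weighted equally, which is one plausible definition of a per-drive average; other sources may define it differently.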
Testing
So far I have only tested manually, using Pro Football Reference (PFR) statistics as ground truth. Setting aside discrepancies caused by my implementation choices (the only notable one is that PFR reports team passing yards as passing_yards - sack_yards, which I did NOT do), there are only a couple of things to note. For 2023 data, all of the numbers I checked appear to be correct except the per-drive statistics (though this may be due to how I calculate them versus how PFR does). In older seasons (such as 1999), slight discrepancies appear. I have reviewed my code several times and am continuing to search for possible bugs.
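The passing-yards difference mentioned above can be reproduced on a small example. This is a sketch under stated assumptions, not the PR's code: the toy rows are invented, and sack yardage is assumed to be the negated yards_gained on plays where sack == 1.

```r
library(dplyr)

# Toy rows: passing_yards is NA on plays without a completion,
# and sack plays carry a negative yards_gained. Values are made up.
pbp <- data.frame(
  posteam = c("AAA", "AAA", "AAA", "AAA", "BBB", "BBB"),
  passing_yards = c(15, NA, 7, NA, 20, NA),
  sack = c(0, 1, 0, 0, 0, 1),
  yards_gained = c(15, -8, 7, 4, 20, -5),
  stringsAsFactors = FALSE
)

passing <- pbp %>%
  group_by(posteam) %>%
  summarise(
    # Gross passing yards: the figure without the sack adjustment.
    gross_passing_yards = sum(passing_yards, na.rm = TRUE),
    # Assumption: yards lost to sacks = -yards_gained on sack plays.
    sack_yards = sum(-yards_gained[sack == 1]),
    .groups = "drop"
  ) %>%
  # Net passing yards: the PFR-style figure described above.
  mutate(net_passing_yards = gross_passing_yards - sack_yards)

print(passing)
```

When checking against PFR, comparing net_passing_yards rather than the gross figure should remove this known difference, so that any remaining mismatch points to something else.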
Notes for Reviewers
Any and all feedback is welcome. I am available for discussion in the nflverse Discord, X, or any other preferred communication method.
My implementation assumes that each team has the same value for 'posteam' throughout the pbp dataframe (i.e. the team abbreviation is consistent). If inconsistent abbreviations need to be supported, code would have to be added to handle that case.
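If inconsistent abbreviations did need handling, one possible approach is to normalize 'posteam' before aggregating. The sketch below is hypothetical: abbr_map is a hand-written lookup invented for this example, and the toy data is made up.

```r
library(dplyr)

# Hypothetical lookup from alternate abbreviations to the ones to keep.
abbr_map <- c(OAK = "LV", SD = "LAC", STL = "LA")

# Toy rows in which one franchise appears under two abbreviations.
pbp <- data.frame(
  posteam = c("OAK", "LV", "SD", "KC"),
  yards_gained = c(10, 5, 7, 3),
  stringsAsFactors = FALSE
)

normalized <- pbp %>%
  # Look up each abbreviation; values missing from the map come back
  # as NA, so coalesce() falls back to the original abbreviation.
  mutate(posteam = coalesce(unname(abbr_map[posteam]), posteam))

team_yards <- normalized %>%
  group_by(posteam) %>%
  summarise(yards = sum(yards_gained), .groups = "drop")

print(team_yards)
```

Normalizing up front keeps the grouping logic unchanged, at the cost of maintaining the lookup table.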