nflverse / nflfastR

A Set of Functions to Efficiently Scrape NFL Play by Play Data
https://www.nflfastr.com/

Added calculate_team_stats() function in R/aggregate_team_stats.R #465

Open mscoop16 opened 4 months ago

mscoop16 commented 4 months ago

Overview

Implemented the calculate_team_stats() function as requested in issue #342. The new function is an adapted version of the existing calculate_player_stats() function in R/aggregate_game_stats.R. Each column is built up from the passed-in pbp data frame, so the data stays consistent with the rest of nflfastR.

Changes Made

Context

This feature addresses the request in issue #342. As asked for in that thread, the values are built from the ground up using nflfastR pbp data. One of the main points in the thread was also to incorporate drive-specific statistics. These are implemented in this PR and described in the Function Details section.
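To make the idea of drive-level numbers concrete for reviewers, here is a minimal Python sketch of one possible definition. It is not the R/dplyr code in this PR. It counts the distinct drive ids each offense has in a game and divides that offense's total yards by the count. The play dicts and their keys (game_id, posteam, drive, yards_gained) are assumed to mirror nflfastR pbp columns, and per_drive_stats() is a hypothetical helper.

```python
from collections import defaultdict


def per_drive_stats(plays):
    """Hypothetical sketch: drive count and yards per drive per (game_id, posteam).

    Each play is assumed to be a dict whose keys mirror nflfastR pbp columns:
    'game_id', 'posteam', 'drive', 'yards_gained'. Plays without an offense
    (posteam is None) are skipped.
    """
    drive_ids = defaultdict(set)   # (game_id, posteam) -> set of drive ids seen
    yards = defaultdict(float)     # (game_id, posteam) -> total yards gained
    for play in plays:
        team = play.get("posteam")
        if team is None:
            continue
        key = (play["game_id"], team)
        drive_ids[key].add(play["drive"])
        # Treat a missing yards_gained value as 0 yards.
        yards[key] += play.get("yards_gained") or 0
    return {
        key: {"drives": len(ids), "yards_per_drive": yards[key] / len(ids)}
        for key, ids in drive_ids.items()
    }


plays = [
    {"game_id": "g1", "posteam": "KC", "drive": 1, "yards_gained": 10},
    {"game_id": "g1", "posteam": "KC", "drive": 1, "yards_gained": 5},
    {"game_id": "g1", "posteam": "KC", "drive": 3, "yards_gained": 15},
    {"game_id": "g1", "posteam": "BUF", "drive": 2, "yards_gained": -2},
    {"game_id": "g1", "posteam": None, "drive": None, "yards_gained": None},
]
print(per_drive_stats(plays))
```

Counting distinct drive ids is only one choice. Decisions such as whether short end-of-half drives should count could make per-drive numbers differ from other sources.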

Function Details

The new calculate_team_stats() function is an adaptation of the existing calculate_player_stats() function, which uses the "dplyr" package to manipulate the passed-in pbp data frame and aggregate player-level statistics. The new function performs the same kind of aggregation at the team level. Most of the columns it builds are the same as those built by calculate_player_stats(), with a few additions and exceptions listed below:
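The core change from player-level to team-level aggregation comes down to grouping play rows by offense instead of by player. Here is a minimal Python sketch of that grouping, not the package's R/dplyr code. It sums a few stat columns per (season, week, posteam). The column names are assumed to mirror nflfastR pbp columns, and team_stats() and SUM_COLUMNS are hypothetical.

```python
from collections import defaultdict

# Hypothetical subset of columns to sum; names assumed to mirror nflfastR pbp columns.
SUM_COLUMNS = ("passing_yards", "rushing_yards", "complete_pass", "interception")


def team_stats(plays, sum_columns=SUM_COLUMNS):
    """Sum each column per (season, week, posteam), analogous to a dplyr
    group_by() + summarise() at the team level. Missing values count as 0."""
    # Each new group starts with a fresh dict of zeros for the summed columns.
    totals = defaultdict(lambda: dict.fromkeys(sum_columns, 0))
    for play in plays:
        team = play.get("posteam")
        if team is None:
            continue  # no offense on this play (e.g. timeouts, end of quarter)
        row = totals[(play["season"], play["week"], team)]
        for col in sum_columns:
            row[col] += play.get(col) or 0
    return dict(totals)


plays = [
    {"season": 2023, "week": 1, "posteam": "DET", "passing_yards": 12,
     "rushing_yards": None, "complete_pass": 1, "interception": 0},
    {"season": 2023, "week": 1, "posteam": "DET", "passing_yards": None,
     "rushing_yards": 7, "complete_pass": 0, "interception": 0},
    {"season": 2023, "week": 1, "posteam": "KC", "passing_yards": 0,
     "rushing_yards": None, "complete_pass": 0, "interception": 1},
]
print(team_stats(plays))
```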

Column Changes

Testing

As of now, I have only done manual testing, using Pro Football Reference (PFR) statistics as ground truth. Setting aside discrepancies that come from my own implementation choices (the only notable one is that PFR reports team passing yards as passing_yards - sack_yards, which I did NOT do), there are only a couple of things to note. For 2023 data, all of the numbers I have checked so far appear to be correct except the per-drive statistics, though this may be due to how I calculate them versus how PFR does. In older seasons (e.g. 1999), slight discrepancies appear. I have reviewed my own code several times and am still looking for possible bugs.
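The PFR convention mentioned above is a simple subtraction: net passing yards = gross passing yards - yards lost on sacks. Here is a minimal Python sketch of that arithmetic for one team's plays. It assumes that the sack flag is 1 on sack plays and that yards_gained holds the (negative) yardage on those plays. The keys are assumed to mirror nflfastR pbp columns, and net_passing_yards() is hypothetical.

```python
def net_passing_yards(plays):
    """Hypothetical sketch of PFR-style team passing yards for one team's plays.

    Assumes keys mirroring nflfastR pbp columns: 'passing_yards' (None on
    non-pass plays), 'sack' (1 on sack plays) and 'yards_gained' (negative
    on a sack).
    """
    gross = 0
    sack_yards_lost = 0
    for play in plays:
        gross += play.get("passing_yards") or 0
        if play.get("sack") == 1:
            # yards_gained is negative on a sack; store the loss as a positive number.
            sack_yards_lost += -(play.get("yards_gained") or 0)
    return {
        "passing_yards": gross,
        "sack_yards": sack_yards_lost,
        "net_passing_yards": gross - sack_yards_lost,
    }


plays = [
    {"passing_yards": 20, "sack": 0, "yards_gained": 20},
    {"passing_yards": None, "sack": 1, "yards_gained": -8},
    {"passing_yards": 5, "sack": 0, "yards_gained": 5},
]
print(net_passing_yards(plays))
```

Keeping gross passing yards and sack yards as separate columns, as this PR does, still lets users derive the PFR-style figure with one subtraction.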

Notes for Reviewers