nflverse / nflfastR

A Set of Functions to Efficiently Scrape NFL Play by Play Data
https://www.nflfastr.com/
Other
414 stars 50 forks source link

Rethink Player Stats Logic #445

Open mrcaseb opened 10 months ago

mrcaseb commented 10 months ago

We see new issues in #444 and already had lots of problems caused by the fact that we summarize play stats into a tidy form.

I think for player stats, we should make a transition to a new concept.

We load raw game data, extract the play stats and row bind them. This will make it pretty easy and straightforward to correctly summarize player stats and team stats.

guga31bb commented 10 months ago

Yep this makes more sense

mrcaseb commented 2 months ago

Thought about this for a while and there are several design decisions to be made.

  1. Function name: I would like to wrap all stats in calculate_stats. This function should output everything that the three currently implemented player stats functions return
  2. Functions args: That's a bit tricky. If we want to keep model based stats like epa, we need the actual pbp data but most stats will be computed using playstats data from this release. If pbp is the function argument, then we have to download playstats and filter down to the game ids in pbp. If we use seasons as argument, we have to download both pbp and playstats for those seasons.
  3. Summary level: currently we either summarise all or weekly. We do not care about multiple seasons and leave it to the user to supply relevant pbp data subsets. Generally, I would like to apply the pattern in the kicking stats function https://github.com/nflverse/nflfastR/blob/d4c5c43f2c3d8bde1caf7dd5d2fd3d5c317ce99c/R/aggregate_game_stats_kicking.R#L31-L35 where we set the grouping variables dynamically with rlang::data_syms()
  4. Allow team stats as well? It's mostly just a different grouping variable

My current thought process is something like this :

calculate_stats <- function(seasons = nflreadr::most_recent_season(),
                            summary_level = c("season", "week"),
                            stat_type = c("player", "team")){

  pbp <- nflreadr::load_pbp(seasons = seasons)

  playstats <- # a load_pbp pendant for playstats from https://github.com/nflverse/nflverse-pbp/releases/download/playstats/play_stats_{season}.rds

  # set grouping variables based off summary_level and stat_type
  # 
  # sumarise epa stats and dakota using pbp
  # 
  # summarise all other stats using playstats. That's a big call to summarise
  # where we create all sorts of stats with the various stat IDs
  # 
  # load player data if stat_type is player to joing player info
  # 
  # join everything 

}
guga31bb commented 2 months ago

This all sounds logical to me!