Publish seasonal aggregates for player stats

nflverse / nflverse-pbp

builds play by play and player stats for nflverse/nflverse-data

Creative Commons Attribution 4.0 International


Publish seasonal aggregates for player stats #93

Status: Closed (closed by alecglen 2 months ago)

alecglen commented 2 months ago

Currently, nflverse-data hosts the player_stats feed containing player records on a per-game basis. If you want that data aggregated across a season, though, you need to go back to nflfastR and use calculate_player_stats().

nfl_data_py provides a semi-equivalent function - import_seasonal_data() - that pulls and aggregates the weekly data. However, keeping the aggregation logic synced between the two functions in different languages is redundant and error-prone. If the result of calculate_player_stats(weekly = FALSE) were published in the feed or in its own separate feed, it'd ensure everyone is getting the same final dataset regardless of package used. It also sets nflreadR up to surface the dataset.
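To make the duplicated-logic concern concrete, here is a minimal sketch of what a weekly-to-seasonal aggregation step looks like. It is not the code used by calculate_player_stats() or import_seasonal_data(); the field names (`player_id`, `season`, `week`, `passing_yards`, `rushing_yards`) and the helper `aggregate_season` are assumptions chosen for illustration, and plain Python is used so the example runs without extra packages.

```python
from collections import defaultdict

# Toy weekly records; field names are illustrative assumptions,
# not the actual player_stats schema.
weekly = [
    {"player_id": "A", "season": 2023, "week": 1, "passing_yards": 250, "rushing_yards": 12},
    {"player_id": "A", "season": 2023, "week": 2, "passing_yards": 310, "rushing_yards": 5},
    {"player_id": "A", "season": 2023, "week": 3, "passing_yards": 0, "rushing_yards": 20},
    {"player_id": "B", "season": 2023, "week": 1, "passing_yards": 0, "rushing_yards": 88},
    {"player_id": "B", "season": 2023, "week": 2, "passing_yards": 0, "rushing_yards": 101},
]

STAT_FIELDS = ("passing_yards", "rushing_yards")


def aggregate_season(rows):
    """Sum counting stats per (player_id, season) and count distinct weeks played."""
    totals = defaultdict(lambda: {field: 0 for field in STAT_FIELDS})
    weeks = defaultdict(set)
    for row in rows:
        key = (row["player_id"], row["season"])
        weeks[key].add(row["week"])
        for field in STAT_FIELDS:
            totals[key][field] += row[field]
    return [
        {"player_id": pid, "season": season, "games": len(weeks[(pid, season)]), **stats}
        for (pid, season), stats in sorted(totals.items())
    ]


seasonal = aggregate_season(weekly)
```

Every choice in a function like this (which weeks count, which fields are summed rather than averaged) has to be mirrored in each client library, which is the maintenance burden a published seasonal feed would remove.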

mrcaseb commented 2 months ago

I started thinking about this and now I remember why we didn't do that in the first place. We have to decide which weeks to include in the season stats, i.e. whether we do regular season only or include playoffs. I don't think we should release separate data for each of these cases:

Regular Season only
Postseason separate
Regular + Postseason

I think we should either do REG only, REG+POST combined, or combine REG, POST, and REG+POST in one dataframe.

alecglen commented 2 months ago

That makes sense; I think that was one of the assumptions the two functions differ on currently as well. I think it makes the most sense to combine them all in one dataframe; then the clients can choose which subset to grab for their use case.
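A rough sketch of the "all in one dataframe" idea follows, assuming a `season_type` label with the values `REG`, `POST`, and `REG+POST`. The field names, the label values, and the helper `combined_season_stats` are assumptions for illustration, not the published schema or the implementation in the pipeline.

```python
from collections import defaultdict

# Toy weekly records with a season_type flag; names and values are assumptions.
weekly = [
    {"player_id": "A", "season": 2023, "season_type": "REG", "rushing_yards": 80},
    {"player_id": "A", "season": 2023, "season_type": "REG", "rushing_yards": 45},
    {"player_id": "A", "season": 2023, "season_type": "POST", "rushing_yards": 110},
    {"player_id": "B", "season": 2023, "season_type": "REG", "rushing_yards": 30},
]


def combined_season_stats(rows):
    """Return one row per (player, season, season_type), where season_type
    is "REG", "POST", or the combined "REG+POST"."""
    totals = defaultdict(int)
    for row in rows:
        base = (row["player_id"], row["season"])
        # Each game counts toward its own season type and toward the combined total.
        for label in (row["season_type"], "REG+POST"):
            totals[base + (label,)] += row["rushing_yards"]
    return [
        {"player_id": pid, "season": season, "season_type": label, "rushing_yards": yards}
        for (pid, season, label), yards in sorted(totals.items())
    ]


combined = combined_season_stats(weekly)

# A client interested only in the regular season filters on the label.
regular_only = [row for row in combined if row["season_type"] == "REG"]
```

One consequence of this layout is that the rows overlap: the `REG+POST` row already contains the `REG` and `POST` totals, so a client should always filter on the label before summing, otherwise games are counted twice.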

mrcaseb commented 2 months ago

> That makes sense; I think that was one of the assumptions the two functions differ on currently as well. I think it makes the most sense to combine them all in one dataframe; then the clients can choose which subset to grab for their use case.

Yep, that is what I have implemented in #94. I'm currently running a test for the 2023 season. If this issue is closed, the test was successful and season-level summaries will be available. However, due to backend changes, I will only compute season-level summaries for the 2016+ seasons. Older seasons will have to wait until we have updated the raw JSON data with fixes for the buggy JAX home games.

mrcaseb commented 2 months ago

@alecglen

The pipeline is ready now. In the player stats release there are the following combined files

There are also separate files for each season, but I assume you can work with the combined ones.