alecglen closed this issue 2 months ago.
Started thinking about this, and now I remember why we didn't do that in the first place: we have to decide which weeks to include in the season stats, i.e., whether to use the regular season only or to include the playoffs. I don't think we should release separate data for each case of
I think we should either publish REG only, publish REG+POST combined, or combine REG, POST, and REG+POST in one dataframe.
That makes sense; I think that was one of the assumptions the two functions differ on currently as well. I think it makes the most sense to combine them all in one dataframe; then the clients can choose which subset to grab for their use case.
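To illustrate the "one dataframe, clients pick the subset" idea, here is a minimal sketch in plain Python. It assumes per-game rows tagged with a `season_type` of `"REG"` or `"POST"`. The field names (`player_id`, `season`, `season_type`, `passing_yards`) and the `season_summaries` helper are hypothetical and not the actual nflverse schema or pipeline code. Each game is added both to its own season type and to a combined `"REG+POST"` bucket, so all three views live in one long table.

```python
from collections import defaultdict

# Hypothetical per-game rows; field names are illustrative only,
# not the real nflverse player_stats schema.
weekly = [
    {"player_id": "A", "season": 2023, "season_type": "REG", "passing_yards": 250},
    {"player_id": "A", "season": 2023, "season_type": "REG", "passing_yards": 300},
    {"player_id": "A", "season": 2023, "season_type": "POST", "passing_yards": 280},
    {"player_id": "B", "season": 2023, "season_type": "REG", "passing_yards": 120},
]


def season_summaries(rows):
    """Return one row per (player, season, season_type), where season_type
    is "REG", "POST", or the combined "REG+POST" (hypothetical helper)."""
    totals = defaultdict(int)
    for r in rows:
        key = (r["player_id"], r["season"])
        # Count the game toward its own season type...
        totals[key + (r["season_type"],)] += r["passing_yards"]
        # ...and toward the combined regular season + playoffs bucket.
        totals[key + ("REG+POST",)] += r["passing_yards"]
    return [
        {"player_id": p, "season": s, "season_type": t, "passing_yards": y}
        for (p, s, t), y in sorted(totals.items())
    ]


combined = season_summaries(weekly)

# A client selects the subset it needs from the single combined table.
reg_only = [r for r in combined if r["season_type"] == "REG"]
```

In this sketch a player with no playoff games (player `B`) simply has no `"POST"` row, rather than a row of zeros.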
Yep, that is what I implemented in #94. I'm currently running a test for the 2023 season. If this issue gets closed, the test was successful and season-level summaries will be available. However, due to backend changes, I will only compute season-level summaries for the 2016+ seasons. Older seasons will have to wait until we have updated raw JSON data with fixes for the buggy JAX home games.
@alecglen
The pipeline is ready now. In the player stats release there are the following combined files
There are also separate files for each season, but I assume you can work with the combined ones above.
Currently, nflverse-data hosts the `player_stats` feed, which contains player records on a per-game basis. If you want that data aggregated across a season, though, you need to go back to nflfastR and use `calculate_player_stats()`. nfl_data_py provides a semi-equivalent function, `import_seasonal_data()`, that pulls and aggregates the weekly data. However, keeping the aggregation logic in sync between two functions in different languages is redundant and error-prone. If the result of `calculate_player_stats(weekly = FALSE)` were published in the feed or in its own separate feed, it would ensure everyone gets the same final dataset regardless of which package they use. It would also set nflreadR up to surface the dataset.
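Here is one concrete way two independent aggregation implementations can drift apart. This is a minimal sketch, not the actual logic of `calculate_player_stats()` or `import_seasonal_data()`. The field names (`completions`, `attempts`, `completion_pct`) and the `aggregate_season` helper are hypothetical. Counting stats are summed per player-season, and a rate stat is then derived from the summed totals. Averaging the weekly rates instead would produce a different number, which is exactly the kind of divergence a single published season-level feed avoids.

```python
from collections import defaultdict

# Hypothetical weekly rows for one player; field names are illustrative only.
weekly = [
    {"player_id": "A", "season": 2023, "completions": 20, "attempts": 25},
    {"player_id": "A", "season": 2023, "completions": 10, "attempts": 30},
]


def aggregate_season(rows):
    """Sum counting stats per (player, season), then derive the rate
    from the season totals (hypothetical helper)."""
    sums = defaultdict(lambda: {"completions": 0, "attempts": 0})
    for r in rows:
        s = sums[(r["player_id"], r["season"])]
        s["completions"] += r["completions"]
        s["attempts"] += r["attempts"]
    out = []
    for (player, season), s in sorted(sums.items()):
        # Guard against division by zero for players with no attempts.
        pct = s["completions"] / s["attempts"] if s["attempts"] else None
        out.append({"player_id": player, "season": season, **s, "completion_pct": pct})
    return out


season = aggregate_season(weekly)

# Naive alternative: average the per-game rates. This gives a different answer.
naive_pct = sum(r["completions"] / r["attempts"] for r in weekly) / len(weekly)
```

Here `season[0]["completion_pct"]` is 30/55, while `naive_pct` is the mean of 20/25 and 10/30. Two packages that made different choices on details like this would publish different "season stats" from the same weekly data.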