Closed isaactpetersen closed 3 weeks ago
Seems like the primary issue here (as with #238) is that the columns in nflreadr::load_player_stats()
are not named consistently with other columns in the nflverse, so we'll focus on that.
We generally advise using gsis_id (or columns that are the gsis_id but may be named differently, like nflreadr::load_player_stats() |> dplyr::pull(player_id)) as the standard for joining on players.
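As a rough sketch of what joining on these IDs looks like, the snippet below uses small made-up data frames in place of the real nflreadr tables. Only player_id and gsis_id come from the discussion here; the birth_date column name and every value are assumptions for illustration.

```r
# Toy stand-ins for nflreadr::load_player_stats() and nflreadr::load_players().
# All IDs and values are invented; birth_date is an assumed column name.
stats <- data.frame(
  player_id = c("00-0000001", "00-0000001", "00-0000002"),
  week = c(1L, 2L, 1L),
  passing_yards = c(250, 310, 180)
)
players <- data.frame(
  gsis_id = c("00-0000001", "00-0000002", "00-0000003"),
  birth_date = as.Date(c("1990-05-01", "1995-11-15", "1998-02-20"))
)

# Left join: keep every stats row and match player_id against gsis_id.
joined <- merge(
  stats, players,
  by.x = "player_id", by.y = "gsis_id",
  all.x = TRUE
)
```

With dplyr, the same join could be written as dplyr::left_join(stats, players, by = c("player_id" = "gsis_id")).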
I don't think we will be renaming columns in the near future, for backwards compatibility with existing databases. Happy to take a PR updating the data dictionaries to improve the documentation around player ID columns if they are confusing.
Is there an existing issue for this?
Is your feature request related to a problem? Please describe.
I'd like to merge/join variables across data sets. For many of the datasets, there is not a common ID variable to link them. This makes it challenging to merge the datasets.
Describe the solution you'd like
It would be nice for each dataset to have the (relevant) ID variables, with the same spelling, to easily link them to every other dataset. For instance, it would be helpful for every dataset that has players to have a common player_id variable (spelled the same way), and for each dataset that has games/weekly data to have a game_id variable.
This suggestion is similar/related to the following issue: https://github.com/nflverse/nflreadr/issues/31
Describe alternatives you've considered
No response
Additional context
As an example, let's say I want to know a player's age for each week of their historical stats (from load_player_stats()). To calculate their age at a given game, I would need to know the player's birthday and the date of the game, and to calculate the difference between those dates. None of the datasets has all three sets of variables (stats, birthdate, game date), so I would need to merge the datasets. For instance, I could merge the player_stats dataset with the players dataset to get the player's birthdate, and I could merge the dataset with the game schedules dataset to get the game date. This is currently challenging due to there not being common ID columns to merge them. For instance, the player stats dataset has a player_id column, but the players dataset has ID variables with different names (esb_id, gsis_id, gsis_it_id, and smart_id). Just based on the looks of it, player_id in the player stats dataset appears equivalent to gsis_id in the players dataset, but I don't see documentation of that. It would be helpful if they had the same name (if they are equivalent). In addition, although the schedules dataset has a game_id variable, the players stats dataset does not, which makes it much more challenging to merge.
Having standard ID variables for players and games across datasets would make merging the datasets much easier. Thanks very much for your work on this great package!
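For reference, the age-at-game example could be sketched roughly as follows with the columns as they stand today. This uses made-up data frames rather than the real nflreadr tables. The column names season, week, recent_team, gameday, home_team, away_team, and birth_date are assumptions about the datasets, and the team codes and values are invented.

```r
# Toy stand-ins for the player stats, players, and schedules datasets.
stats <- data.frame(
  player_id = c("00-0000001", "00-0000001"),
  season = c(2023L, 2023L),
  week = c(1L, 2L),
  recent_team = c("AAA", "AAA")
)
players <- data.frame(
  gsis_id = "00-0000001",
  birth_date = as.Date("1995-09-10")
)
# gameday is built as a Date here; if the real column is stored as text,
# as.Date() would be needed first.
schedules <- data.frame(
  game_id = c("2023_01_AAA_BBB", "2023_02_CCC_AAA"),
  season = c(2023L, 2023L),
  week = c(1L, 2L),
  gameday = as.Date(c("2023-09-10", "2023-09-17")),
  home_team = c("BBB", "AAA"),
  away_team = c("AAA", "CCC")
)

# Reshape schedules to one row per team per game, so that stats rows
# (which lack game_id) can be matched on season, week, and team.
game_cols <- schedules[c("game_id", "season", "week", "gameday")]
team_games <- rbind(
  data.frame(game_cols, team = schedules$home_team),
  data.frame(game_cols, team = schedules$away_team)
)

# Attach game_id and gameday to each stats row.
with_game <- merge(
  stats, team_games,
  by.x = c("season", "week", "recent_team"),
  by.y = c("season", "week", "team"),
  all.x = TRUE
)

# Attach birth_date by matching player_id to gsis_id.
with_birth <- merge(
  with_game, players,
  by.x = "player_id", by.y = "gsis_id",
  all.x = TRUE
)

# Age at each game, in days and in approximate years.
with_birth$age_days <- as.numeric(
  difftime(with_birth$gameday, with_birth$birth_date, units = "days")
)
with_birth$age_years <- with_birth$age_days / 365.25
```

The reshape step is the workaround for the missing game_id in the player stats data: each game appears once for the home team and once for the away team, which gives a season/week/team key to join on.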