nflverse / nflreadr

Efficiently download nflverse data
https://nflreadr.nflverse.com/

Data documentation #28

Open tanho63 opened 3 years ago

tanho63 commented 3 years ago

It would be great if we had consistent data documentation available within the package. At the moment, the following data functions are missing in-package dictionaries:

Missing fields, as per the comments below:

tanho63 commented 2 years ago

Data docs are an ongoing battle! Along with the dictionaries missing above, here are some fields missing from existing dictionaries:

R> load_schedules() |>
+++   dict_check(dictionary_schedules)
    old                | new                         
                       - "alt_game_id" [1]           
[1] "away_coach"       | "away_coach"  [2]           
[2] "away_moneyline"   -                             
[3] "away_qb_id"       -                             
[4] "away_qb_name"     -                             
[5] "away_rest"        -                             
[6] "away_score"       | "away_score"  [3]           
[7] "away_spread_odds" -                             
[8] "away_team"        | "away_team"   [4]           
[9] "div_game"         -                             
... ...                  ...           and 3 more ...

     old                | new                         
[14] "gametime"         | "gametime"   [9]            
[15] "gsis"             | "gsis"       [10]           
[16] "home_coach"       | "home_coach" [11]           
[17] "home_moneyline"   -                             
[18] "home_qb_id"       -                             
[19] "home_qb_name"     -                             
[20] "home_rest"        -                             
[21] "home_score"       | "home_score" [12]           
[22] "home_spread_odds" -                             
[23] "home_team"        | "home_team"  [13]           
 ... ...                  ...          and 10 more ...

     old           | new                         
[34] "season"      | "season"      [18]          
[35] "spread_line" | "spread_line" [19]          
[36] "stadium"     | "stadium"     [20]          
[37] "stadium_id"  -                             
[38] "surface"     | "surface"     [21]          
[39] "temp"        | "temp"        [22]          
[40] "total"       | "total"       [23]          
[41] "total_line"  | "total_line"  [24]          
[42] "under_odds"  -                             
[43] "week"        | "week"        [25]          
 ... ...             ...           and 2 more ...
R> load_snap_counts() |>
+++   dict_check(dictionary_snap_counts)
     old             | new                           
 [1] "defense_pct"   | "defense_pct"   [1]           
 [2] "defense_snaps" | "defense_snaps" [2]           
 [3] "game_id"       | "game_id"       [3]           
 [4] "game_type"     -                               
 [5] "offense_pct"   | "offense_pct"   [4]           
 [6] "offense_snaps" | "offense_snaps" [5]           
 [7] "opponent"      -                               
 [8] "pfr_game_id"   | "pfr_game_id"   [6]           
 [9] "pfr_player_id" | "pfr_player_id" [7]           
[10] "player"        | "player"        [8]           
 ... ...               ...             and 6 more ...

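In this output, old refers to the column names of the actual dataframe, while new refers to the fields in the current dictionary. A name that appears only under old (e.g. "away_moneyline") is a column that still needs a dictionary entry; a name that appears only under new (e.g. "alt_game_id") is a dictionary entry with no matching column in the data. We want to make sure that the dictionary (new) matches the data (old).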

To rerun these checks, see the script at data-raw/dictionary_check.R.
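
For anyone who wants the same comparison as plain character vectors, here is a minimal base-R sketch. It is not the package's dict_check(); dict_diff() is a hypothetical helper, the toy data and dictionary are made up for illustration, and it assumes the dictionary keeps its field names in a column called "field" (adjust field_col if a given dictionary names it differently).

```r
# Hypothetical helper, NOT the dict_check() used in data-raw/dictionary_check.R.
# Assumption: the dictionary stores field names in a column named "field".
dict_diff <- function(df, dictionary, field_col = "field") {
  data_cols <- sort(names(df))
  dict_cols <- sort(as.character(dictionary[[field_col]]))
  list(
    # columns present in the data but absent from the dictionary ("old" only)
    undocumented = setdiff(data_cols, dict_cols),
    # dictionary entries with no matching data column ("new" only)
    stale = setdiff(dict_cols, data_cols)
  )
}

# Toy stand-ins so the example runs without downloading anything
toy_data <- data.frame(
  game_id = "2021_01_DAL_TB",
  away_team = "DAL",
  away_rest = 7L
)
toy_dict <- data.frame(
  field = c("game_id", "away_team", "alt_game_id"),
  description = c("Game identifier", "Away team", "Alternate game identifier")
)

result <- dict_diff(toy_data, toy_dict)
print(result)
```

With real data, the call would presumably look like dict_diff(load_schedules(), dictionary_schedules), subject to the same column-name assumption. Returning two named vectors rather than a printed diff makes it easier to turn the "undocumented" list straight into a to-do list for new dictionary rows.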

mpcen commented 1 year ago

Not listed, but https://github.com/nflverse/nflreadr/pull/192 takes care of playerstats_def.