nflverse / nflverse-data

Automated nflverse data repository
https://www.nflverse.com
Creative Commons Attribution 4.0 International

Incorrect old_game_id in participation #6

Closed by Adeiko 1 year ago

Adeiko commented 1 year ago

Hi!

When looking at the load_participation data, the week 15 DAL vs NYG game doesn't seem to share the same old_game_id as the pbp data (it's row 14 in the lists below): "2021121903" in participation vs "2021121907" in pbp.

nflreadr::load_participation(season=2021) %>% filter(possession_team=="DAL") %>% select(old_game_id) %>% distinct()
── nflverse pbp participation ──
ℹ Data updated: 2022-08-03 02:38:31 CEST
# A tibble: 18 × 1
   old_game_id
   <chr>      
 1 2021090900 
 2 2021091911 
...
13 2021121207 
14 2021121903
15 2021122611
...
nflfastR::load_pbp(season=2021) %>% filter(posteam=="DAL") %>% select(old_game_id) %>% distinct()
── nflverse play by play  ──
ℹ Data updated: 2022-07-29 00:10:55 CEST
# A tibble: 18 × 1
   old_game_id
   <chr>      
 1 2021090900 
 2 2021091911 
...
13 2021121207 
14 2021121907
15 2021122611 
...

The data itself seems to be correct; only the old_game_id is wrong. load_schedules shows the same old_game_id as the pbp.

nflreadr::load_schedules(season=2021) %>% filter(away_team =="DAL",week==15) %>% select (old_game_id)
── nflverse games and schedules  ──
ℹ Data updated: 2022-08-08 18:51:11 CEST
# A tibble: 1 × 1
  old_game_id
  <chr>      
1 2021121907 
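
A minimal sketch of the comparison above, for anyone who wants to check other teams. It uses base R and toy vectors copied from rows 13 to 15 of the DAL output; the variable names are made up for illustration, and in practice the vectors would come from the load_participation and load_schedules (or load_pbp) calls shown above.

```r
# Toy stand-ins copied from rows 13-15 of the DAL output above.
# Assumption: real data would be pulled with nflreadr::load_participation()
# and nflreadr::load_schedules(), then filtered to a single team.
participation_ids <- c("2021121207", "2021121903", "2021122611")
reference_ids     <- c("2021121207", "2021121907", "2021122611")

# IDs that appear in one source but not in the other
only_in_participation <- setdiff(participation_ids, reference_ids)
only_in_reference     <- setdiff(reference_ids, participation_ids)

print(only_in_participation)  # "2021121903"
print(only_in_reference)      # "2021121907"
```

Doing this per team rather than league-wide is deliberate: "2021121903" is a valid pbp old_game_id for a different week 15 game (2021_15_ARI_DET), so a league-wide set difference would not flag it.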

tanho63 commented 1 year ago

oof, looks like a rescheduling shenanigans thing? cc @john-b-edwards

Kyber84 commented 1 year ago

Hi!! These games probably have the same old_game_id problem when joining pbp with participation: 2021_15_ARI_DET, 2021_15_HOU_JAX, 2021_15_NYJ_MIA, 2021_15_CAR_BUF, 2021_15_TEN_PIT, 2021_15_CIN_DEN

numbersinfigures commented 1 year ago

Just in case it'd be helpful to put it out there, I'm pretty sure that along with 2021_15_DAL_NYG and Kyber84's list, 2021_15_ATL_SF and 2021_15_NO_TB also have a different old_game_id in the participation dataset compared to the newest schedule and pbp databases.

The id mismatches understandably seem to affect the "include_pbp=TRUE" option: it omits 1560 plays that are in the non-pbp-linked set, with >90% of those coming from 2021 week 15 games. And for some of the plays in the include_pbp set, there are rows where possession_team is neither the home_team nor the away_team.
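
A rough sketch of that second check, in base R on a toy data frame. The rows are hypothetical stand-ins for what a join on old_game_id might produce: the first pairs a DAL possession with the game that pbp files under "2021121903" (2021_15_ARI_DET, read as away ARI at home DET, assuming the usual season_week_away_home layout of game_id). Real data would come from load_participation with include_pbp=TRUE.

```r
# Hypothetical joined rows: participation columns plus pbp home/away teams.
joined <- data.frame(
  old_game_id     = c("2021121903", "2021121900"),
  possession_team = c("DAL", "GB"),
  home_team       = c("DET", "BAL"),
  away_team       = c("ARI", "GB"),
  stringsAsFactors = FALSE
)

# Flag plays whose possession team is not one of the two teams in the game.
# which() drops rows where the comparison is NA (e.g. missing possession_team).
mismatch <- which(
  joined$possession_team != joined$home_team &
    joined$possession_team != joined$away_team
)
suspect <- joined[mismatch, ]

print(suspect$old_game_id)  # "2021121903"
```

Any old_game_id that shows up in `suspect` is a candidate for a desynced id between the two datasets.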

Also, I was using an older pbp dataset from a couple of months ago, and with that one there are more games from other weeks that are desynced. I guess there was a recent old_game_id update that I didn't catch, and others might need to keep an eye out for it as well.

Thanks for the hard work put into making this all available!

Adeiko commented 1 year ago

For reference, these are the old_game_ids for week 15 of 2021 according to pbp and according to participation. I matched them by checking which team the players in the participation data belong to.

game_id,old_game_id_pbp,old_game_id_participation
2021_15_ARI_DET,2021121903,2021121901
2021_15_ATL_SF,2021121911,2021121906
2021_15_CAR_BUF,2021121901,2021121909
2021_15_CIN_DEN,2021121910,2021121905
2021_15_DAL_NYG,2021121907,2021121903
2021_15_HOU_JAX,2021121905,2021121902
2021_15_NO_TB,2021121913,2021121908
2021_15_NYJ_MIA,2021121906,2021121910
2021_15_TEN_PIT,2021121909,2021121904

2021_15_GB_BAL,2021121900,2021121900
2021_15_KC_LAC,2021121600,2021121600
2021_15_LV_CLE,2021122001,2021122001
2021_15_MIN_CHI,2021122000,2021122000
2021_15_NE_IND,2021121801,2021121801
2021_15_SEA_LA,2021122101,2021122101
2021_15_WAS_PHI,2021122100,2021122100
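
Until the data is fixed upstream, the nine mismatched rows above could be used as a lookup table. This is only a workaround sketch in base R: `fix_old_game_id` is a hypothetical helper, not an nflreadr function, and it assumes every id that is not in the table is already correct.

```r
# The nine mismatched week 15 games, copied from the table above.
id_map <- data.frame(
  game_id = c("2021_15_ARI_DET", "2021_15_ATL_SF", "2021_15_CAR_BUF",
              "2021_15_CIN_DEN", "2021_15_DAL_NYG", "2021_15_HOU_JAX",
              "2021_15_NO_TB", "2021_15_NYJ_MIA", "2021_15_TEN_PIT"),
  old_game_id_pbp = c("2021121903", "2021121911", "2021121901",
                      "2021121910", "2021121907", "2021121905",
                      "2021121913", "2021121906", "2021121909"),
  old_game_id_participation = c("2021121901", "2021121906", "2021121909",
                                "2021121905", "2021121903", "2021121902",
                                "2021121908", "2021121910", "2021121904"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate participation ids to pbp ids in one lookup.
# Ids that are not in the map are returned unchanged.
fix_old_game_id <- function(ids, map) {
  idx <- match(ids, map$old_game_id_participation)
  ifelse(is.na(idx), ids, map$old_game_id_pbp[idx])
}

fixed <- fix_old_game_id(c("2021121903", "2021121901", "2021121900"), id_map)
print(fixed)  # "2021121907" "2021121903" "2021121900"
```

The lookup is done in a single vectorised pass on purpose. Several ids appear in both columns for different games (for example "2021121903"), so replacing them one after another could remap the same row twice.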

tanho63 commented 1 year ago

Should be resolved in nflverse/nflreadr#144