nflverse / nflfastR

A Set of Functions to Efficiently Scrape NFL Play by Play Data
https://www.nflfastr.com/

GSIS ID appears to be incorrect in fast_scraper_roster() #139

Closed: ajreinhard closed this issue 3 years ago

ajreinhard commented 3 years ago

I join nflfastR roster data with PFF data every week, and I ran into a couple of issues that I did not catch last week. After some investigation, I found two problems in the 2020 roster: Tyler Conklin and Ryan Izzo have swapped gsis_id values, and Christian Jones (LB, DET, from Florida State) has been assigned the same gsis_id as Chris Jones (DB, ARI, from Nebraska).

Here is some code that pulls the 2019 and 2020 rosters so the gsis_id assignments can be compared across the two seasons, in case it helps:

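```r
library(nflfastR)
library(tidyverse)

# Pull both seasons so each gsis_id can be compared across years
roster_df <- fast_scraper_roster(2019:2020)

# Show every roster row for the three gsis_ids involved in the reported mix-ups
roster_df %>% 
  filter(gsis_id %in% c('00-0034270','00-0034439','00-0034641')) %>% 
  arrange(gsis_id)
```
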
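To look for this kind of problem across the whole roster rather than for three known IDs, a check along the following lines may help. This is a minimal sketch, not part of nflfastR: `find_id_conflicts()` is a hypothetical helper, and it assumes the roster has `season`, `gsis_id` and `full_name` columns. It flags (1) a gsis_id shared by more than one name within a season, like the Jones case, and (2) a gsis_id whose name differs between seasons, like the Conklin/Izzo swap. Legitimate name changes or spelling differences between seasons will also show up in the second check, so its output needs a manual look.

```r
library(dplyr)

# Hypothetical helper (not an nflfastR function). Assumes columns
# `season`, `gsis_id` and `full_name` exist in the roster data frame.
find_id_conflicts <- function(roster) {
  # One row per season / ID / name combination; rows without an ID are ignored
  roster <- roster %>%
    filter(!is.na(gsis_id)) %>%
    distinct(season, gsis_id, full_name)

  # 1) IDs attached to more than one name within the same season
  shared_ids <- roster %>%
    group_by(season, gsis_id) %>%
    filter(n_distinct(full_name) > 1) %>%
    ungroup() %>%
    arrange(season, gsis_id)

  # 2) IDs attached to different names across seasons (possible swaps;
  #    also catches genuine name changes and the cases from check 1)
  changed_ids <- roster %>%
    group_by(gsis_id) %>%
    filter(n_distinct(full_name) > 1) %>%
    ungroup() %>%
    arrange(gsis_id, season)

  list(shared_ids = shared_ids, changed_ids = changed_ids)
}
```

Usage would look like `find_id_conflicts(roster_df)` with the `roster_df` built from `fast_scraper_roster(2019:2020)`.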
mrcaseb commented 3 years ago

The swapped Izzo and Conklin IDs are a known issue in the underlying data that I hope will get fixed at some point.

https://github.com/mrcaseb/nflfastR-roster/issues/3

The duplicated ID of Christian Jones is new to me. Will have a look, thanks!

mrcaseb commented 3 years ago

The "Chris Jones" / "Christian Jones" duplicate is another bug in the underlying data. I will add it to the referenced issue in the roster repo and close this one so we don't have two issues tracking the same problem.