nflverse / nflverse-data

Automated nflverse data repository
https://www.nflverse.com
Creative Commons Attribution 4.0 International

[BUG] <Roster file has bad ID for Jay Cutler in 2010 and no ID in 2011> #31

Closed greerreNFL closed 1 year ago

greerreNFL commented 1 year ago

Is there an existing issue for this?

Have you installed the latest development version of the package(s) in question?

What version of the package do you have?

N/A (data pulled directly from the release files)

Describe the bug

In the roster file, Jay Cutler's 2010 record uses Rashied Davis's IDs, and Jay Cutler has no record for 2011. Both Jay Cutler and Rashied Davis also have inconsistent draft data: different records say they were drafted by different teams in different rounds.
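A check along these lines could surface the ID mix-up described above without inspecting players one by one. This is only a sketch on made-up toy rows: the column names `full_name` and `gsis_id` and the two ID values come from this report, while the helper names and the rows themselves are hypothetical.

```python
import pandas as pd

# Toy rows for illustration only; they are not the real roster file.
# Only the two gsis_id values are taken from this report.
toy = pd.DataFrame(
    {
        "season": [2009, 2010, 2010, 2012],
        "full_name": ["Jay Cutler", "Jay Cutler", "Rashied Davis", "Jay Cutler"],
        "gsis_id": ["00-0024226", "00-0023429", "00-0023429", "00-0024226"],
    }
)


def ids_shared_by_names(df):
    """Return gsis_id values that appear with more than one full_name."""
    counts = df.groupby("gsis_id")["full_name"].nunique()
    return sorted(counts[counts > 1].index)


def names_with_many_ids(df):
    """Return full_name values that appear with more than one gsis_id."""
    counts = df.groupby("full_name")["gsis_id"].nunique()
    return sorted(counts[counts > 1].index)


shared_ids = ids_shared_by_names(toy)
split_names = names_with_many_ids(toy)
print(shared_ids)
print(split_names)
```

Two different players can legitimately share a name, so the second list is a hint for manual review rather than proof of an error.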

Reprex

import pandas as pd

## load roster files ##
roster_url = 'https://github.com/nflverse/nflverse-data/releases/download/rosters'
rosters = []
for season in range(2006,2018):
    ## pull roster for that season ##
    temp = pd.read_csv(
        '{0}/roster_{1}.csv?raw=true'.format(
            roster_url,
            season
        ),
        low_memory=False
    )
    rosters.append(temp)

## combine rosters ##
r = pd.concat(rosters)

## Jay Cutler has the wrong ID in 2010, is missing in 2011, and has inconsistent draft data ##
r[
    r['full_name'] == 'Jay Cutler'
][[
    'season','team','full_name','gsis_id',
    'espn_id','pff_id','pfr_id','esb_id',
    'entry_year','draft_club','draft_number'
]]

## inspecting the wrong ID shows that it belongs to Rashied Davis ##
r[
    r['gsis_id'] == '00-0023429'
][[
    'season','team','full_name','gsis_id',
    'espn_id','pff_id','pfr_id','esb_id',
    'entry_year','draft_club','draft_number'
]]

## to confirm this is not caused by concatenating the DataFrames, load the 2010 file alone; ##
## the issue is present at the file level ##
r2010 = pd.read_csv(
    '{0}/roster_{1}.csv?raw=true'.format(
        roster_url,
        2010
    )
)
r2010[
    r2010['full_name'] == 'Jay Cutler'
][[
    'season','team','full_name','gsis_id',
    'espn_id','pff_id','pfr_id','esb_id',
    'entry_year','draft_club','draft_number'
]]

Expected Behavior

The roster file should have consistent information for both players and no missing seasons.
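As a rough illustration of what "consistent" and "no missing seasons" could mean in practice, the sketch below flags IDs with a gap between their first and last season and IDs whose draft fields vary across rows. The column names `season`, `gsis_id`, `draft_club`, and `draft_number` are the ones used in the reprex. The helper names, the placeholder IDs (`P1`, `P2`), and the placeholder clubs (`AAA`, `BBB`) are assumptions for illustration, not real data.

```python
import pandas as pd

# Placeholder rows: P1 has a gap in 2011, P2 has conflicting draft fields.
toy = pd.DataFrame(
    {
        "season": [2009, 2010, 2012, 2009, 2010],
        "gsis_id": ["P1", "P1", "P1", "P2", "P2"],
        "draft_club": ["AAA", "AAA", "AAA", "AAA", "BBB"],
        "draft_number": [11, 11, 11, 11, 200],
    }
)


def missing_seasons(df, id_col="gsis_id"):
    """Map each ID to the seasons absent between its first and last season."""
    gaps = {}
    for player_id, seasons in df.groupby(id_col)["season"]:
        present = {int(s) for s in seasons}
        expected = set(range(min(present), max(present) + 1))
        missing = sorted(expected - present)
        if missing:
            gaps[player_id] = missing
    return gaps


def inconsistent_draft(df, id_col="gsis_id", cols=("draft_club", "draft_number")):
    """Return IDs whose draft columns take more than one distinct value."""
    distinct = df.groupby(id_col)[list(cols)].nunique()
    return sorted(distinct[(distinct > 1).any(axis=1)].index)


gaps = missing_seasons(toy)
conflicts = inconsistent_draft(toy)
print(gaps)
print(conflicts)
```

A gap is not always an error, since a player can be out of the league for a season, so flagged IDs would still need a manual look.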

nflverse_sitrep

N/A (reproduced in Python)

Screenshots

No response

Additional context

No response

john-b-edwards commented 1 year ago

This has been resolved:

nflreadr::load_rosters(2010:2011) |>
    dplyr::filter(full_name == "Jay Cutler") |>
    dplyr::select(season, week, full_name, gsis_id)
#> ── nflverse roster data ────────────────────────────────────────────────────────
#> ℹ Data updated: 2023-09-06 10:40:56 PDT
#> # A tibble: 2 × 4
#>   season  week full_name  gsis_id   
#>    <int> <int> <chr>      <chr>     
#> 1   2010    20 Jay Cutler 00-0024226
#> 2   2011    16 Jay Cutler 00-0024226