nflverse / nfl_data_py

Python code for working with NFL play by play data.
MIT License

Missing 2022 Rookie Data in Roster Import #49

Closed stranger9977 closed 1 year ago

stranger9977 commented 1 year ago

Hello,

I've been using the nfl_data_py package for a project and have encountered an issue with the 2022 rookie data. Specifically, when I use the import_rosters() function to import rosters for the 2022 season, I've noticed that some rookies, such as Drake London, are missing from the dataset.

Here's the code I've been using:

!pip install nfl_data_py==0.3.0
import numpy as np
import pandas as pd
import nfl_data_py as nfl

# Import player stats from 2020 to 2022
stats = nfl.import_seasonal_data([2020, 2021, 2022])
stats['season'] = stats['season'] + 1

# Import the roster data for 2021 and 2022
# (each pass overwrites player_data, so only the 2022 roster remains after the loop)
seasons = np.arange(2021, 2023).tolist()
for year in seasons:
    player_data = nfl.import_rosters([year])

# Merge the stats and player data
nflpy = pd.merge(stats, player_data, how='inner', on=['season', 'player_id'], suffixes=(None, '_y'))

# Check for Drake London
drake_london = nflpy.query('name == "Drake London"')
print(drake_london)

When I run this code, the drake_london DataFrame is empty, indicating that Drake London's data is not in the 2022 roster data.
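An empty result from an inner merge only shows that no rows matched on every key; it does not by itself show that the player is absent from the roster. One way to tell the two cases apart is to merge with how='outer' and indicator=True, which labels each row by the side it came from. The sketch below uses small hypothetical frames (made-up IDs and values, and an illustrative receiving_yards column) in place of the nfl.import_seasonal_data and nfl.import_rosters calls:

```python
import pandas as pd

# Hypothetical stand-ins for import_seasonal_data / import_rosters output.
# The stats seasons are already shifted by +1, mirroring the code above.
stats = pd.DataFrame({
    "season": [2022, 2023],
    "player_id": ["P-VET", "P-ROOKIE"],
    "receiving_yards": [900, 800],
})
rosters = pd.DataFrame({
    "season": [2022, 2022],
    "player_id": ["P-VET", "P-ROOKIE"],
    "player_name": ["Veteran Player", "Rookie Player"],
})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
diagnostic = pd.merge(stats, rosters, how="outer", on=["season", "player_id"], indicator=True)

# These are exactly the rows an inner merge would have dropped.
unmatched = diagnostic[diagnostic["_merge"] != "both"]
print(unmatched[["season", "player_id", "_merge"]])
```

Here the rookie shows up twice as unmatched: once as a stats row labeled 2023 and once as a 2022 roster row, which points at the season key rather than missing roster data.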

I've checked the package documentation and haven't found any information about this issue. I'm using the latest version of the package.

Could you please look into this issue? Is there something I'm missing, or is the 2022 rookie data not yet available in the package's data repository?

Thank you for your help.

alecglen commented 1 year ago

Hi @stranger9977, it appears the data is present in the repo as intended.

In:
import pandas as pd
import nfl_data_py as nfl

pd.merge(
    nfl.import_rosters([2022]),
    nfl.import_seasonal_data([2022]),
    on=['season', 'player_id'],
).query('player_name == "Drake London"')

Out: 
    season team position depth_chart_position  jersey_number  status   player_name first_name last_name birth_date  height  weight college  ... games    tgt_sh     ay_sh    yac_sh    wopr_y     ry_sh    rtd_sh    rfd_sh  rtdfd_sh      dom     w8dom    yptmpa    ppr_sh
40    2022  ATL       WR                   WR            5.0  Active  Drake London      Drake    London 2001-07-24    76.0   210.0    None  ...    17  0.281928  0.300521  0.187958  0.663308  0.295866  0.235294  0.324324  0.315152  0.26558  0.283752  2.086747  0.149509

[1 rows x 92 columns]

The problem is your line stats['season'] = stats['season'] + 1. I'm not sure exactly what you intended, but it relabels London's only season of seasonal data (2022) as 2023. Since no 2023 roster data has been published yet, your pd.merge with how='inner' drops London, because none of his stats rows share a season with any roster row.
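Which fix applies depends on what the +1 was for, and the thread does not say. The sketch below shows two possibilities, again with hypothetical frames (made-up IDs and values) rather than real nflverse data: a plain same-season join with no shift, and, assuming the goal was to pair each roster with the previous season's stats, a left merge from the roster side so that rookies (who have no previous season) are kept with missing values instead of being dropped.

```python
import pandas as pd

# Hypothetical 2022 roster and unshifted seasonal stats (made-up values).
rosters_2022 = pd.DataFrame({
    "season": [2022, 2022],
    "player_id": ["P-VET", "P-ROOKIE"],
    "player_name": ["Veteran Player", "Rookie Player"],
})
stats = pd.DataFrame({
    "season": [2021, 2022, 2022],
    "player_id": ["P-VET", "P-VET", "P-ROOKIE"],
    "receiving_yards": [700, 900, 800],
})

# Option 1: same-season join, no shift. Both players match their 2022 stats.
same_season = pd.merge(rosters_2022, stats, how="inner", on=["season", "player_id"])

# Option 2 (assumed intent): pair each roster with the previous season's stats.
# Shift on a copy so the original stats frame keeps its real seasons.
prior = stats.assign(season=stats["season"] + 1)

# A left merge keeps roster rows with no prior-season match (the rookie gets NaN)
# instead of silently dropping them the way how="inner" does.
with_prior = pd.merge(rosters_2022, prior, how="left", on=["season", "player_id"])

print(same_season)
print(with_prior)
```

With how="inner" in the second option, the rookie would disappear again, which is the same behavior seen in the original report.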

Hope this helps! Feel free to ask any follow-up questions.