nflverse / nfl_data_py

Python code for working with NFL play-by-play data.
MIT License

import_seasonal_data() concatenating column values #45

Closed tbryan2 closed 1 year ago

tbryan2 commented 1 year ago

When I run the import_seasonal_data() function, the player names come back repeated and concatenated into a single value:

import nfl_data_py as nfl

nfl_df = nfl.import_seasonal_data([2022])
nfl_df['player_name'].value_counts()

player_name
T.BradyT.BradyT.BradyT.BradyT.BradyT.BradyT.BradyT.BradyT.BradyT.BradyT.BradyT.BradyT.BradyT.BradyT.BradyT.BradyT.Brady    1

Perhaps some kind of summation/concatenation to get to the season level is causing this duplication?
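To gauge how many rows are affected, one option is to flag name values made of back-to-back copies of a shorter string, as in the output above. This is a minimal sketch; `is_repeated_string` and `collapse_repeats` are hypothetical helpers, not part of nfl_data_py, and they will also flag legitimate values that happen to repeat (for example "J.J.").

```python
def is_repeated_string(value: object) -> bool:
    """True if value is a str made of two or more back-to-back copies of a
    shorter string, e.g. "T.BradyT.Brady". Non-strings (such as NaN) give False."""
    if not isinstance(value, str) or len(value) < 2:
        return False
    # A string is a repetition of a shorter block exactly when it occurs inside
    # its own doubling with the first and last characters removed.
    return value in (value + value)[1:-1]


def collapse_repeats(value: object) -> object:
    """Return the shortest block whose repetition rebuilds value; otherwise
    return value unchanged (non-strings included)."""
    if not isinstance(value, str):
        return value
    n = len(value)
    for period in range(1, n // 2 + 1):
        # Only block lengths that divide n evenly can tile the whole string.
        if n % period == 0 and value[:period] * (n // period) == value:
            return value[:period]
    return value


print(is_repeated_string("T.Brady" * 17))  # True
print(collapse_repeats("T.Brady" * 17))    # T.Brady
print(is_repeated_string("T.Brady"))       # False
```

On the frame from the report, something like `nfl_df['player_name'].map(is_repeated_string).sum()` would count the affected rows.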

alecglen commented 1 year ago

Hey @tbryan2! Could you please update your example? I'm not quite following because player_name is not a column in the seasonal_data.

tbryan2 commented 1 year ago

Hey @alecglen, thanks for responding! From the __init__.py file, I can see that import_seasonal_data() is meant to import seasonal player stats:

Functions
---------
import_pbp_data() - import play-by-play data
import_weekly_data() - import weekly player stats
import_seasonal_data() - import seasonal player stats

Here are the columns I get after running this function:

Index(['player_id', 'season', 'season_type', 'player_name',
       'player_display_name', 'position', 'position_group', 'headshot_url',
       'completions', 'attempts', 'passing_yards', 'passing_tds',
       'interceptions', 'sacks', 'sack_yards', 'sack_fumbles',
       'sack_fumbles_lost', 'passing_air_yards', 'passing_yards_after_catch',
       'passing_first_downs', 'passing_epa', 'passing_2pt_conversions', 'pacr',
       'dakota', 'carries', 'rushing_yards', 'rushing_tds', 'rushing_fumbles',
       'rushing_fumbles_lost', 'rushing_first_downs', 'rushing_epa',
       'rushing_2pt_conversions', 'receptions', 'targets', 'receiving_yards',
       'receiving_tds', 'receiving_fumbles', 'receiving_fumbles_lost',
       'receiving_air_yards', 'receiving_yards_after_catch',
       'receiving_first_downs', 'receiving_epa', 'receiving_2pt_conversions',
       'racr', 'target_share', 'air_yards_share', 'wopr_x',
       'special_teams_tds', 'fantasy_points', 'fantasy_points_ppr', 'games',
       'tgt_sh', 'ay_sh', 'yac_sh', 'wopr_y', 'ry_sh', 'rtd_sh', 'rfd_sh',
       'rtdfd_sh', 'dom', 'w8dom', 'yptmpa', 'ppr_sh'],
      dtype='object')

alecglen commented 1 year ago

~Interesting - that's not the columns I got with a fresh install just now. Can you share your Python version and the output of pip freeze?~

I stand corrected; I had a cached pandas package in my environment. Evidently the function behaves differently under pandas 2.0, which will need to be investigated.
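One plausible mechanism, assuming import_seasonal_data() rolls weekly rows up to the season level with a pandas groupby sum (this thread does not confirm that, and the fix in #44 may differ): pandas 2.0 changed the default of `numeric_only` in `DataFrameGroupBy.sum` to False, so object columns such as player_name are summed as well, and summing Python strings concatenates them. The toy frame below is made up for illustration.

```python
import pandas as pd

# Toy weekly rows for one hypothetical player; the values are illustrative only.
weekly = pd.DataFrame(
    {
        "player_id": ["P1", "P1", "P1"],
        "season": [2022, 2022, 2022],
        # Explicit object dtype, matching how pandas 1.x/2.x store plain strings.
        "player_name": pd.Series(["A.Player"] * 3, dtype=object),
        "passing_yards": [100, 200, 300],
    }
)
keys = ["player_id", "season"]

# numeric_only=False (the pandas 2.0 default) also "sums" the object column,
# which for Python strings means concatenation.
concatenated = weekly.groupby(keys).sum(numeric_only=False).reset_index()
print(concatenated.loc[0, "player_name"])  # A.PlayerA.PlayerA.Player

# numeric_only=True sums only the numeric columns and drops player_name ...
totals = weekly.groupby(keys).sum(numeric_only=True).reset_index()
# ... so take one name per group separately and merge it back on the keys.
names = weekly.groupby(keys, as_index=False)["player_name"].first()
seasonal = totals.merge(names, on=keys)
print(seasonal.loc[0, "player_name"], seasonal.loc[0, "passing_yards"])  # A.Player 600
```

Passing `numeric_only` explicitly makes the result independent of the pandas default, which is why the sketch spells it out in both calls.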

@tbryan2, as a temporary workaround, you can revert your environment's pandas version to 1.5.3 to get the expected output. Thank you for the report!
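The workaround amounts to pinning pandas, for example with `pip install "pandas==1.5.3"`. If you adopt the pin, a small standard-library guard can flag environments that still resolve pandas 2.x. This is a sketch only; `pandas_major` and `pandas_needs_pin` are hypothetical helpers, not part of nfl_data_py, and the check is only relevant to nfl_data_py versions that predate the fix.

```python
import warnings
from importlib.metadata import PackageNotFoundError, version


def pandas_major(version_string: str) -> int:
    """Major version number from a string such as "1.5.3" or "2.0.1"."""
    return int(version_string.split(".")[0])


def pandas_needs_pin() -> bool:
    """Warn and return True if the installed pandas is 2.x or newer."""
    try:
        installed = version("pandas")
    except PackageNotFoundError:
        # pandas is not installed, so there is nothing to check.
        return False
    if pandas_major(installed) >= 2:
        warnings.warn(
            f"pandas {installed} is installed; import_seasonal_data() may "
            'concatenate name values. Consider pip install "pandas==1.5.3".',
            stacklevel=2,
        )
        return True
    return False


print(pandas_major("1.5.3"), pandas_major("2.0.1"))  # 1 2
```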

alecglen commented 1 year ago

Fixed in #44.