jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License

ValueError: Usecols do not match columns, columns expected but not found: ['mlb_played_first', 'name_last', 'name_first', 'key_bbref', 'mlb_played_last', 'key_fangraphs', 'key_mlbam', 'key_retro'] #308

Closed TobiasCortese closed 1 year ago

TobiasCortese commented 1 year ago

playerid_lookup() and playerid_reverse_lookup() are both raising the error in the title. Both functions worked as expected yesterday.

Is anyone else experiencing this?

I simply ran the sample code to reproduce the errors, specifically the following:

from pybaseball import playerid_reverse_lookup
# a list of mlbam ids
player_ids = [116539, 116541, 641728, 116540]
# find the names of the players in player_ids, along with their ids from other data sources
data = playerid_reverse_lookup(player_ids, key_type='mlbam')
Gathering player lookup table. This may take a moment.
ValueError: Usecols do not match columns, columns expected but not found: ['mlb_played_first', 'name_last', 'name_first', 'key_bbref', 'mlb_played_last', 'key_fangraphs', 'key_mlbam', 'key_retro']
Command took 0.16 seconds -- by tobiasc@slalom.com at 1/22/2023, 4:34:09 PM on SandboxML 11.1
from pybaseball import playerid_lookup
# find the ids of all players with last name Jones (returns 1,314 rows)
data = playerid_lookup('jones')
Gathering player lookup table. This may take a moment.
ValueError: Usecols do not match columns, columns expected but not found: ['mlb_played_first', 'name_last', 'name_first', 'key_bbref', 'mlb_played_last', 'key_fangraphs', 'key_mlbam', 'key_retro']
Command took 0.25 seconds -- by tobiasc@slalom.com at 1/22/2023, 4:33:43 PM on SandboxML 11.1

ccott235 commented 1 year ago

I am also experiencing the same issue.

DannyPhant8m commented 1 year ago

Agreed, the same issue arises for me. Any ideas on an alternative way to get a player's key_mlbam?

TobiasCortese commented 1 year ago

@DannyPhant8m - I'm pulling key_mlbam from the statcast (batter/pitcher) and baseball reference (mlbID) data

DannyPhant8m commented 1 year ago

> @DannyPhant8m - I'm pulling key_mlbam from the statcast (batter/pitcher) and baseball reference (mlbID) data

The statcast function provides both the batter and pitcher key_mlbam, but the player_name column only gives the pitcher's name. How do you know which batter name corresponds to the batter key_mlbam?

TobiasCortese commented 1 year ago

Getting the same error with chadwick_register. I'm guessing it's all related.

from pybaseball import chadwick_register
# get the register data and save to disk
chadwick_data = chadwick_register(save=True)

moon0331 commented 1 year ago

https://github.com/chadwickbureau/register

It seems the reference file has been split up. people.csv, which chadwick_register() reads, was split into 16 files, people-[0-f].csv.

samlafell commented 1 year ago

> https://github.com/chadwickbureau/register
>
> It seems that reference file is separated. people.csv, referenced by chadwick_register(), was split into 16 files, people-[0-f].csv.

Exactly that. The pybaseball maintainers will likely need some time to update their functions for this change. Until then, you'll need to write your own loader that accounts for the upstream change in chadwickbureau/register.
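
For anyone writing their own loader in the meantime, here is a minimal sketch of reading the 16 split files and keeping only the columns named in the error message. It is not the pybaseball implementation. The base URL below is an assumption about where the split files live in the chadwickbureau/register repo, so check it against the repo first, and the function names (shard_urls, read_shard, load_register) are made up for illustration. It uses only the standard library so it does not depend on the broken lookup code.

```python
import csv
import io
import urllib.request

# ASSUMPTION: raw-file location of the split register; verify against the repo.
BASE_URL = "https://raw.githubusercontent.com/chadwickbureau/register/master/data/"

# The files are named people-0.csv ... people-f.csv (one per hex digit).
SHARD_SUFFIXES = "0123456789abcdef"

# The columns listed in the ValueError from this issue.
WANTED = [
    "name_last", "name_first", "key_mlbam", "key_retro",
    "key_bbref", "key_fangraphs", "mlb_played_first", "mlb_played_last",
]


def shard_urls(base_url=BASE_URL):
    """Return the 16 URLs, one per split file."""
    return [f"{base_url}people-{suffix}.csv" for suffix in SHARD_SUFFIXES]


def read_shard(text_stream, columns=WANTED):
    """Parse one CSV shard and keep only the wanted columns.

    Columns missing from the shard come back as empty strings.
    """
    reader = csv.DictReader(text_stream)
    return [{col: row.get(col, "") for col in columns} for row in reader]


def load_register(urls=None):
    """Download every shard and return all rows as a list of dicts."""
    rows = []
    for url in urls or shard_urls():
        with urllib.request.urlopen(url) as resp:
            text = io.TextIOWrapper(resp, encoding="utf-8", newline="")
            rows.extend(read_shard(text))
    return rows
```

Calling load_register() downloads all 16 files, so it can take a while. All values come back as strings; if you want a DataFrame, pandas.DataFrame(rows) should work on the result.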

TobiasCortese commented 1 year ago

> > @DannyPhant8m - I'm pulling key_mlbam from the statcast (batter/pitcher) and baseball reference (mlbID) data
>
> The statcast function provides both batter and pitcher key_mlbam, but the player_name column only tells you the pitcher. How do you know which batter correlates to the batter key_mlbam?

@DannyPhant8m - I'm using bwar_pitch() to hack a join table
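
A rough sketch of what such a join table could look like, in case it helps. It assumes the WAR table rows carry the MLBAM id in a column called mlb_ID and the player name in name_common, and that the statcast rows have a batter column; treat all three column names as assumptions and check them against your own data. The helper names are hypothetical, and the rows are plain dicts (for a DataFrame, df.to_dict("records") gives the same shape).

```python
def build_id_to_name(records, id_col="mlb_ID", name_col="name_common"):
    """Build an {mlbam_id: name} mapping from WAR-table rows (dicts).

    Rows whose id is missing, NaN, or malformed are skipped.
    Column names are assumptions, not confirmed pybaseball output.
    """
    mapping = {}
    for rec in records:
        raw_id = rec.get(id_col)
        if raw_id in (None, ""):
            continue
        try:
            # ids may arrive as floats such as 116539.0
            mapping[int(float(raw_id))] = rec[name_col]
        except (TypeError, ValueError):
            continue  # NaN or non-numeric id
    return mapping


def add_batter_names(pitches, id_to_name, batter_col="batter"):
    """Return copies of the statcast rows with a batter_name field added.

    Batters not present in the mapping get None.
    """
    return [
        {**pitch, "batter_name": id_to_name.get(int(pitch[batter_col]))}
        for pitch in pitches
    ]
```

The mapping only covers players who appear in the table it was built from, so batters missing from it will come back with batter_name set to None.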

ccott235 commented 1 year ago

I've worked up a temporary fix that is working for me. Simply adjust the _playerlookup.py file in the pybaseball folder as follows:

1. Change url on line 12 to:

[image not preserved]

2. Change lines 30-33 to:

[image not preserved]

tjburch commented 1 year ago

Confirming this works after #309:

>>> data = playerid_reverse_lookup(player_ids, key_type='mlbam')
>>> data
  name_last name_first  key_mlbam key_retro  key_bbref  key_fangraphs  mlb_played_first  mlb_played_last
0     jeter      shawn     116541  jetes001  jetersh01        1006406            1992.0           1992.0
1     jeter      derek     116539  jeted001  jeterde01            826            1995.0           2014.0
2     jeter     johnny     116540  jetej101  jeterjo01        1006405            1969.0           1974.0

The forward lookup throws a bunch of warnings now but I'll open a new issue for that.

DannyPhant8m commented 1 year ago

@tjburch, your above comment still does not work for me at the moment

tjburch commented 1 year ago

There hasn't been a new release yet, so the fix isn't on PyPI. Install the library directly from git rather than from PyPI.
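
One way to do that from Python, as a sketch: pip accepts a git+ URL as an install target, so the snippet below builds that command and runs it with the current interpreter. The repository URL is inferred from the jldbc / pybaseball project name and assumes the project is hosted on GitHub; the helper names are made up for illustration.

```python
import subprocess
import sys

# ASSUMPTION: repository location inferred from the project name jldbc / pybaseball.
REPO_URL = "https://github.com/jldbc/pybaseball.git"


def pip_git_command(repo_url=REPO_URL):
    """Build the pip command that installs straight from the git repo."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", f"git+{repo_url}"]


def install_from_git(repo_url=REPO_URL):
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.run(pip_git_command(repo_url), check=True)
```

Calling install_from_git() performs the install (git must be available on the machine). Restart the interpreter or notebook kernel afterwards so the updated package is picked up.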