jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License

Player ID lookup throws dtype warnings #319

Closed: tjburch closed this issue 1 year ago

tjburch commented 1 year ago

playerid_lookup is emitting a bunch of DtypeWarning messages:

>>> from pybaseball import playerid_lookup
>>> playerid_lookup('jones')
Gathering player lookup table. This may take a moment.
/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py:33: DtypeWarning: Columns (9) have mixed types. Specify dtype option on import or set low_memory=False.
  lambda zip_info: pd.read_csv(io.BytesIO(zip_archive.read(zip_info.filename))),
/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py:33: DtypeWarning: Columns (9) have mixed types. Specify dtype option on import or set low_memory=False.
  lambda zip_info: pd.read_csv(io.BytesIO(zip_archive.read(zip_info.filename))),
/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py:33: DtypeWarning: Columns (9,10) have mixed types. Specify dtype option on import or set low_memory=False.
  lambda zip_info: pd.read_csv(io.BytesIO(zip_archive.read(zip_info.filename))),
/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py:33: DtypeWarning: Columns (9) have mixed types. Specify dtype option on import or set low_memory=False.
  lambda zip_info: pd.read_csv(io.BytesIO(zip_archive.read(zip_info.filename))),
/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py:33: DtypeWarning: Columns (8,10) have mixed types. Specify dtype option on import or set low_memory=False.
  lambda zip_info: pd.read_csv(io.BytesIO(zip_archive.read(zip_info.filename))),
/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py:33: DtypeWarning: Columns (8,9,10) have mixed types. Specify dtype option on import or set low_memory=False.
  lambda zip_info: pd.read_csv(io.BytesIO(zip_archive.read(zip_info.filename))),
/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py:33: DtypeWarning: Columns (9) have mixed types. Specify dtype option on import or set low_memory=False.
  lambda zip_info: pd.read_csv(io.BytesIO(zip_archive.read(zip_info.filename))),
/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py:33: DtypeWarning: Columns (8,9) have mixed types. Specify dtype option on import or set low_memory=False.
  lambda zip_info: pd.read_csv(io.BytesIO(zip_archive.read(zip_info.filename))),
/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py:33: DtypeWarning: Columns (8,10) have mixed types. Specify dtype option on import or set low_memory=False.
  lambda zip_info: pd.read_csv(io.BytesIO(zip_archive.read(zip_info.filename))),
/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py:33: DtypeWarning: Columns (10) have mixed types. Specify dtype option on import or set low_memory=False.
  lambda zip_info: pd.read_csv(io.BytesIO(zip_archive.read(zip_info.filename))),
/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py:33: DtypeWarning: Columns (9,10) have mixed types. Specify dtype option on import or set low_memory=False.
  lambda zip_info: pd.read_csv(io.BytesIO(zip_archive.read(zip_info.filename))),
/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py:33: DtypeWarning: Columns (9,10) have mixed types. Specify dtype option on import or set low_memory=False.
  lambda zip_info: pd.read_csv(io.BytesIO(zip_archive.read(zip_info.filename))),
/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py:33: DtypeWarning: Columns (9) have mixed types. Specify dtype option on import or set low_memory=False.
  lambda zip_info: pd.read_csv(io.BytesIO(zip_archive.read(zip_info.filename))),
/Users/tburch/Documents/github/pybaseball/pybaseball/playerid_lookup.py:33: DtypeWarning: Columns (9,10) have mixed types. Specify dtype option on import or set low_memory=False.
  lambda zip_info: pd.read_csv(io.BytesIO(zip_archive.read(zip_info.filename))),
    name_last name_first  key_mlbam key_retro  key_bbref  key_fangraphs  mlb_played_first  mlb_played_last
0       jones         a.         -1       NaN   jonesa03             -1            1926.0           1926.0
1       jones      david         -1  joned108  jonesda06             -1            1882.0           1882.0
2       jones       jake     116696  jonej106  jonesja03        1006565            1941.0           1948.0
3       jones       doug     116682  joned001  jonesdo01        1006552            1982.0           2000.0
4       jones        red     116709  joner103  jonesre01        1006583            1940.0           1940.0
..        ...        ...        ...       ...        ...            ...               ...              ...
136     jones       jack     116677  jonej107  jonesja02        1006564            1883.0           1883.0
137     jones        NaN         -1       NaN    jones10             -1            1937.0           1937.0
138     jones      casey         -1       NaN  jonesca02             -1            1934.0           1934.0
139     jones      henry     116691  joneh102  joneshe02        1006561            1890.0           1890.0
140     jones      percy     116714  jonep101  jonespe01        1006581            1920.0           1930.0

[141 rows x 8 columns]

It's probably fine to just set low_memory=False as the warning suggests; these aren't big payloads.
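
For reference, here is a minimal sketch of what that change could look like. The helper name read_csvs_from_zip is hypothetical and is not the actual pybaseball function; only the pd.read_csv(io.BytesIO(...)) pattern from the traceback and the low_memory=False argument named in the warning come from the output above.

```python
import io
import zipfile

import pandas as pd


def read_csvs_from_zip(zip_bytes: bytes) -> pd.DataFrame:
    """Hypothetical helper: load every CSV in a zip archive into one DataFrame."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zip_archive:
        frames = [
            # low_memory=False makes pandas parse each file in a single pass,
            # so each column gets one inferred dtype instead of one per chunk.
            pd.read_csv(
                io.BytesIO(zip_archive.read(zip_info.filename)),
                low_memory=False,
            )
            for zip_info in zip_archive.infolist()
            if zip_info.filename.endswith(".csv")
        ]
    return pd.concat(frames, ignore_index=True)
```

The other option the warning mentions is passing an explicit dtype mapping to pd.read_csv, which is stricter but requires knowing the column types up front.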

hengoren commented 1 year ago

I made the low_memory=False patch locally, and that seems to get rid of the dtype warnings in my scripts. I'd be happy to open a pull request, but this is my first time contributing. Are there permissions I need in order to push a branch or open a pull request?

tjburch commented 1 year ago

Thanks @hengoren. See https://github.com/jldbc/pybaseball/blob/master/contributing.md for details. You'll need to make a fork, make your changes there, then open a PR from that fork.

tjburch commented 1 year ago

Covered in commit fae546694c5eb13e3604f3e0fd2faa09351c993d