jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License

Sanitize whitespace in column names from statcast calls #280

Closed tjburch closed 1 year ago

tjburch commented 1 year ago

This addresses #279, in which the first_name column name contains a leading whitespace character. The issue was flagged for statcast_batter_exitvelo_barrels, but it affects most of the statcast calls. I implemented an effectively one-line fix in utils that strips whitespace from the column names returned by the statcast calls, then imported the new function and used it where needed.
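
For readers skimming the thread, a minimal sketch of what such a column-name sanitizer could look like. The helper name strip_column_whitespace and the sample row are made up for illustration; this is not necessarily the function added to utils in this PR.

```python
import pandas as pd


def strip_column_whitespace(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with surrounding whitespace removed from column names.

    Hypothetical helper for illustration; the name is an assumption.
    """
    # Only string labels are stripped; any non-string label is passed through.
    return df.rename(columns=lambda c: c.strip() if isinstance(c, str) else c)


# Placeholder data that mimics the ' first_name' header described in #279.
raw = pd.DataFrame({"last_name": ["Doe"], " first_name": ["Jane"], "player_id": [1]})
clean = strip_column_whitespace(raw)
print(list(clean.columns))  # ['last_name', 'first_name', 'player_id']
```

Returning a renamed copy rather than mutating df.columns in place keeps the caller's frame untouched; either approach would address the issue.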

tjburch commented 1 year ago

Hm. Seems to be failing the test test_statcast_pitcher_exitvelo_barrels (and similar tests) on:

assert len(result.columns) == 19

Seems to be getting 18 columns now.

I tried commenting out my changes to see if I could get back to 19 columns, but I still got 18. I can dig into this more later, but if anyone knows where the 19 came from, that'd be helpful. For reference, these are the columns I see:

['last_name', 'first_name', 'player_id', 'attempts', 'avg_hit_angle',
       'anglesweetspotpercent', 'max_hit_speed', 'avg_hit_speed', 'fbld', 'gb',
       'max_distance', 'avg_distance', 'avg_hr_distance', 'ev95plus',
       'ev95percent', 'barrels', 'brl_percent', 'brl_pa']

tjburch commented 1 year ago

Ok, so it seems that even in the released version (2.2.1) the Statcast calls return only 18 columns:

>>> from pybaseball.statcast_pitcher import statcast_pitcher_exitvelo_barrels
>>> from importlib.metadata import version
>>> version('pybaseball')
'2.2.1'
>>> df = statcast_pitcher_exitvelo_barrels('2020')
>>> len(df.columns)
18
>>> df.columns
Index(['last_name', ' first_name', 'player_id', 'attempts', 'avg_hit_angle',
       'anglesweetspotpercent', 'max_hit_speed', 'avg_hit_speed', 'fbld', 'gb',
       'max_distance', 'avg_distance', 'avg_hr_distance', 'ev95plus',
       'ev95percent', 'barrels', 'brl_percent', 'brl_pa'],
      dtype='object')

So I'm just going to update the tests to expect 18 columns and move on with life.
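
A rough sketch of the kind of check an updated test could make, using the column names pasted above as static stand-ins instead of a live Statcast call. The names RELEASED_COLUMNS, SANITIZED_COLUMNS, and find_padded_columns are hypothetical and are not taken from the pybaseball test suite.

```python
# Column names as returned by 2.2.1 (note the leading space in ' first_name').
RELEASED_COLUMNS = [
    "last_name", " first_name", "player_id", "attempts", "avg_hit_angle",
    "anglesweetspotpercent", "max_hit_speed", "avg_hit_speed", "fbld", "gb",
    "max_distance", "avg_distance", "avg_hr_distance", "ev95plus",
    "ev95percent", "barrels", "brl_percent", "brl_pa",
]

# The same names after whitespace has been stripped.
SANITIZED_COLUMNS = [name.strip() for name in RELEASED_COLUMNS]


def find_padded_columns(columns):
    """Return the column names that carry leading or trailing whitespace."""
    return [name for name in columns if name != name.strip()]


# The count is 18 both before and after sanitizing; only the names differ.
assert len(RELEASED_COLUMNS) == 18
assert len(SANITIZED_COLUMNS) == 18
assert find_padded_columns(RELEASED_COLUMNS) == [" first_name"]
assert find_padded_columns(SANITIZED_COLUMNS) == []
```

Checking for padded names alongside the column count would catch a regression of #279 even if the count stays the same.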