jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License

Convert dtypes issue? #427

Closed EddietheProgrammer closed 3 weeks ago

EddietheProgrammer commented 3 weeks ago

I am loading Baseball Savant data for the Cleveland Guardians, but for some reason I get a ValueError.

import pybaseball
pybaseball.statcast('2024-03-01', '2024-06-07', team='CLE')

Outputs:

ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 125 and the array at index 1 has size 135
     83 # Concatenate all dataframes into final result set
     84 if dataframe_list:
---> 85     final_data = pd.concat(dataframe_list, axis=0).convert_dtypes(convert_string=False)
     86     final_data = final_data.sort_values(
     87         ['game_date', 'game_pk', 'at_bat_number', 'pitch_number'],
     88         ascending=False

I've tried other teams and none of them raise a ValueError. Loading the data for all teams at once also works. I'm not sure whether convert_dtypes is involved: when I looked into it, I found that, for the Guardians specifically, the datetime for their June 6th game was converted to datetime64[ns] instead of datetime64[us]. I'll update if this goes away.
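
A rough way to check a suspicion like this is to compare the per-game frames before they are concatenated. The sketch below is not pybaseball code: `compare_frames` is a hypothetical helper, the two small frames and their values are made up for illustration, and it assumes pandas 2.0 or later (earlier versions coerce every datetime column to nanosecond resolution, so the unit difference would not show up).

```python
import numpy as np
import pandas as pd


def compare_frames(frames):
    """Report columns whose dtype differs between frames, and columns
    that are absent from at least one frame.

    `frames` must be a list of DataFrames (hypothetical helper).
    """
    dtypes_seen = {}   # column name -> set of dtype strings observed
    counts = {}        # column name -> number of frames containing it
    for frame in frames:
        for column, dtype in frame.dtypes.items():
            dtypes_seen.setdefault(column, set()).add(str(dtype))
            counts[column] = counts.get(column, 0) + 1
    mixed = {c: sorted(d) for c, d in dtypes_seen.items() if len(d) > 1}
    missing = sorted(c for c, n in counts.items() if n < len(frames))
    return mixed, missing


# Made-up stand-ins for two per-game frames: same column name,
# different datetime resolution, plus one column only the second frame has.
june_5 = pd.DataFrame({
    "game_date": np.array(["2024-06-05"], dtype="datetime64[us]"),
    "game_pk": [1],
})
june_6 = pd.DataFrame({
    "game_date": np.array(["2024-06-06"], dtype="datetime64[ns]"),
    "game_pk": [2],
    "extra": [0.5],
})

mixed, missing = compare_frames([june_5, june_6])
print("columns with mixed dtypes:", mixed)
print("columns missing from some frames:", missing)
```

The second return value is included because the error message above is about a differing column count (125 versus 135) and not only about dtypes, so it helps to see which columns one frame has that another lacks.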

bdilday commented 3 weeks ago

I tried this command and it worked fine for me. Could you check whether a cached file is causing the error? The cache would be in $HOME/.pybaseball/cache/
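
For anyone who wants to clear that directory from Python, here is a minimal sketch using only the standard library. `clear_cache_dir` is a hypothetical helper, not part of pybaseball, and the default path simply mirrors the location mentioned above; it assumes nothing about the format of the cached files.

```python
import shutil
from pathlib import Path

# Default location taken from the comment above ($HOME/.pybaseball/cache/).
DEFAULT_CACHE_DIR = Path.home() / ".pybaseball" / "cache"


def clear_cache_dir(cache_dir=DEFAULT_CACHE_DIR):
    """Delete every entry directly inside cache_dir.

    Returns the number of top-level entries removed, or 0 if the
    directory does not exist. The directory itself is left in place.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return 0
    removed = 0
    for entry in cache_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # real subdirectory: remove recursively
        else:
            entry.unlink()         # regular file or symlink
        removed += 1
    return removed
```

Calling `clear_cache_dir()` with no arguments targets the default path; passing another directory makes it easy to try the function on a scratch folder first. The next `statcast` call would then have to download the data again.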

EddietheProgrammer commented 3 weeks ago

Yep, clearing the cache worked. Could something have gone wrong with the data type conversion while the split_request function (in utils.py) was breaking up the query, leaving a faulty cache entry?

Thank you.