Open Haman-Karn opened 1 year ago
I have tested the code
stats = statcast(start_dt="2023-06-25")
on both Colab (Python 3.10.12) and my local environment (3.11.2), and it worked fine on both.
Judging by the dataframe_list.append(future.result()) line in the error message, I guess something went wrong in concurrent mode.
Maybe turning off parallel mode will work?
stats = statcast(start_dt="2023-06-25", parallel=False)
I discovered the issue -- there must have been something corrupted in the cache. Disabling the cache fixed the problem. But attempting to purge the cache also results in an error.
Traceback (most recent call last):
  File "c:\Users\nosoa\Documents\glb\getstats.py", line 5, in <module>
    pybaseball.cache.purge()
  File "c:\Users\nosoa\Documents\glb\venv\Lib\site-packages\pybaseball\cache\cache.py", line 31, in purge
    records = [cache_record.CacheRecord(filename) for filename in record_files]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nosoa\Documents\glb\venv\Lib\site-packages\pybaseball\cache\cache.py", line 31, in <listcomp>
    records = [cache_record.CacheRecord(filename) for filename in record_files]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nosoa\Documents\glb\venv\Lib\site-packages\pybaseball\cache\cache_record.py", line 23, in __init__
    self.data = cast(Dict[str, Any], file_utils.load_json(filename))
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nosoa\Documents\glb\venv\Lib\site-packages\pybaseball\cache\file_utils.py", line 28, in load_json
    return cast(JSONData, json.load(json_file))
                          ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nosoa\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
           ^^^^^^^^^^^^^^^^
  File "C:\Users\nosoa\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nosoa\AppData\Local\Programs\Python\Python311\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nosoa\AppData\Local\Programs\Python\Python311\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 37 (char 36)
I found that some cache files were not saved completely, which is why cache.purge() cannot parse them.
In my case, the files whose names start with _small_request all contain only
{"func": "_small_request", "args": [
Because that is not valid JSON, parsing it raises a decode error.
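As a quick check, the truncated content can be fed to Python's json module directly. This is only a sketch using the standard library, not pybaseball's own code path; the string is the file content quoted above.

```python
import json

# Content of one truncated _small_request cache file, as quoted above.
truncated = '{"func": "_small_request", "args": ['

try:
    json.loads(truncated)
    error = None
except json.JSONDecodeError as exc:
    # The parser reaches the end of the text while still expecting a value.
    error = exc
    print(f"{exc.msg} at char {exc.pos}")
```

The reported position should line up with the "char 36" in the traceback, since the text simply ends right after the opening bracket.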
You can find the cache files in /Users/{user_name}/.pybaseball/cache,
or in /root/.pybaseball/cache on Colab.
IMO, for now we can only delete those invalid cache files manually, since they also do not contain an expiry time.
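The manual cleanup could be scripted along these lines. This is a minimal sketch, not part of pybaseball: find_broken_cache_files is a hypothetical helper, it assumes the default cache location mentioned above (.pybaseball/cache under the home directory), and it assumes every .json file in that directory is meant to be a complete JSON document. By default it only reports what it finds; pass delete=True to remove those files.

```python
import json
from pathlib import Path


def find_broken_cache_files(cache_dir: Path, delete: bool = False) -> list[Path]:
    """Return the .json files in cache_dir that fail to parse (hypothetical helper)."""
    broken = []
    for path in sorted(cache_dir.glob("*.json")):
        try:
            with path.open(encoding="utf-8") as json_file:
                json.load(json_file)
        except (json.JSONDecodeError, UnicodeDecodeError):
            # The file is closed again by the time we get here, so it is safe to remove.
            broken.append(path)
            if delete:
                path.unlink()
    return broken


if __name__ == "__main__":
    # Assumed default location; adjust if your cache lives elsewhere.
    default_dir = Path.home() / ".pybaseball" / "cache"
    for path in find_broken_cache_files(default_dir):
        print(path)
```

Running it once without delete=True first is a cheap way to confirm that only the truncated files are listed before anything is removed.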
Should be fixed in #438
While downloading all of the statcast data, I kept getting an error at around 98%. I was eventually able to narrow it down: 2023-06-25 is the first problematic day. Days after this one also cause the error, but I've stopped at 06-25 because this amount of data is good enough for my current purposes.
The code I'm executing is this:
stats = statcast(start_dt="2023-06-25")
Upon execution, my terminal looks like this: