jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License
1.28k stars 335 forks source link

Pandas Error Importing Statcast Data #348

Open jmaschino56 opened 1 year ago

jmaschino56 commented 1 year ago

pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 2

Stems from get_statcast_data_from_csv. Happens on 2022-07-17

BrayanMnz commented 1 year ago

This issue is not clear, could you give a more detailed description on how to get into it?

Thanks

ghost commented 1 year ago

It looks like I am getting the same issue when using the statcast api like so:

Example:

df = pyb.statcast(
        start_dt="2016-10-01",
        end_dt="2016-10-31",
    )

Stacktrace:

File "/var/task/pybaseball/statcast.py", line 113, in statcast
  return _handle_request(start_dt_date, end_dt_date, 1, verbose=verbose,
File "/var/task/pybaseball/statcast.py", line 76, in _handle_request
  dataframe_list.append(future.result())
File "/var/lang/lib/python3.9/concurrent/futures/_base.py", line 439, in result
  return self.__get_result()
File "/var/lang/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
  raise self._exception
File "/var/lang/lib/python3.9/concurrent/futures/thread.py", line 58, in run
  result = self.fn(*self.args, **self.kwargs)
File "/var/task/pybaseball/cache/cache.py", line 58, in _cached
  result = func(*args, **kwargs)
File "/var/task/pybaseball/statcast.py", line 24, in _small_request
  data = statcast_ds.get_statcast_data_from_csv_url(
File "/var/task/pybaseball/cache/cache.py", line 58, in _cached
  result = func(*args, **kwargs)
File "/var/task/pybaseball/datasources/statcast.py", line 23, in get_statcast_data_from_csv_url
  return get_statcast_data_from_csv(
File "/var/task/pybaseball/datasources/statcast.py", line 35, in get_statcast_data_from_csv
  data = pd.read_csv(io.StringIO(csv_content))
File "/var/task/pandas/util/_decorators.py", line 211, in wrapper
  return func(*args, **kwargs)
File "/var/task/pandas/util/_decorators.py", line 331, in wrapper
  return func(*args, **kwargs)
File "/var/task/pandas/io/parsers/readers.py", line 950, in read_csv
  return _read(filepath_or_buffer, kwds)
File "/var/task/pandas/io/parsers/readers.py", line 611, in _read
  return parser.read(nrows)
File "/var/task/pandas/io/parsers/readers.py", line 1778, in read
  ) = self._engine.read(  # type: ignore[attr-defined]
File "/var/task/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
  chunks = self._reader.read_low_memory(nrows)
File "pandas/_libs/parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory
  chunk = self._read_rows(self.buffer_lines, 0)
File "pandas/_libs/parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
  self._tokenize_rows(irows - buffered_lines)
File "pandas/_libs/parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
  raise_parser_error('Error tokenizing data', self.parser)
File "pandas/_libs/parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
  raise ParserError(message)
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 2

EDIT: I think the issue is the statcast api returning an error when trying to access the CSV.

TravisGibbs commented 1 year ago

I can't reproduce this issue, could someone who is hitting it run "pip list" in their env.

Mark-DelGrande commented 1 year ago

Encountering same issue:

Package Version


absl-py 1.4.0 alabaster 0.7.13 albumentations 1.2.1 altair 4.2.2 anyio 3.6.2 appdirs 1.4.4 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 array-record 0.2.0 arviz 0.15.1 astropy 5.2.2 astunparse 1.6.3 attrs 23.1.0 audioread 3.0.0 autograd 1.5 Babel 2.12.1 backcall 0.2.0 beautifulsoup4 4.11.2 bleach 6.0.0 blis 0.7.9 blosc2 2.0.0 bokeh 2.4.3 branca 0.6.0 build 0.10.0 CacheControl 0.12.11 cached-property 1.5.2 cachetools 5.3.0 catalogue 2.0.8 certifi 2022.12.7 cffi 1.15.1 chardet 4.0.0 charset-normalizer 2.0.12 chex 0.1.7 click 8.1.3 cloudpickle 2.2.1 cmake 3.25.2 cmdstanpy 1.1.0 colorcet 3.0.1 colorlover 0.3.0 community 1.0.0b1 confection 0.0.4 cons 0.4.5 contextlib2 0.6.0.post1 contourpy 1.0.7 convertdate 2.4.0 cryptography 40.0.2 cufflinks 0.17.3 cupy-cuda11x 11.0.0 cvxopt 1.3.0 cvxpy 1.3.1 cycler 0.11.0 cymem 2.0.7 Cython 0.29.34 dask 2022.12.1 datascience 0.17.6 db-dtypes 1.1.1 dbus-python 1.2.16 debugpy 1.6.6 decorator 4.4.2 defusedxml 0.7.1 Deprecated 1.2.14 distributed 2022.12.1 dlib 19.24.1 dm-tree 0.1.8 docutils 0.16 dopamine-rl 4.0.6 duckdb 0.7.1 earthengine-api 0.1.350 easydict 1.10 ecos 2.0.12 editdistance 0.6.2 en-core-web-sm 3.5.0 entrypoints 0.4 ephem 4.1.4 et-xmlfile 1.1.0 etils 1.2.0 etuples 0.3.8 exceptiongroup 1.1.1 fastai 2.7.12 fastcore 1.5.29 fastdownload 0.0.7 fastjsonschema 2.16.3 fastprogress 1.0.3 fastrlock 0.8.1 filelock 3.12.0 firebase-admin 5.3.0 Flask 2.2.4 flatbuffers 23.3.3 flax 0.6.9 folium 0.14.0 fonttools 4.39.3 frozendict 2.3.7 fsspec 2023.4.0 future 0.18.3 gast 0.4.0 GDAL 3.3.2 gdown 4.6.6 gensim 4.3.1 geographiclib 2.0 geopy 2.3.0 gin-config 0.5.0 glob2 0.7 google 2.0.3 google-api-core 2.11.0 google-api-python-client 2.84.0 google-auth 2.17.3 google-auth-httplib2 0.1.0 google-auth-oauthlib 1.0.0 google-cloud-bigquery 3.9.0 google-cloud-bigquery-storage 2.19.1 google-cloud-core 2.3.2 google-cloud-datastore 2.15.1 google-cloud-firestore 2.11.0 google-cloud-language 2.9.1 google-cloud-storage 2.8.0 google-cloud-translate 3.11.1 google-colab 1.0.0 google-crc32c 1.5.0 google-pasta 0.2.0 google-resumable-media 2.5.0 googleapis-common-protos 1.59.0 googledrivedownloader 0.4 graphviz 0.20.1 greenlet 2.0.2 grpcio 1.54.0 grpcio-status 1.48.2 gspread 3.4.2 gspread-dataframe 3.0.8 gym 0.25.2 gym-notices 0.0.8 h5netcdf 1.1.0 h5py 3.8.0 holidays 0.25 holoviews 1.15.4 html5lib 1.1 httpimport 1.3.0 httplib2 0.21.0 huggingface-hub 0.15.1 humanize 4.6.0 hyperopt 0.2.7 idna 3.4 imageio 2.25.1 imageio-ffmpeg 0.4.8 imagesize 1.4.1 imbalanced-learn 0.10.1 imgaug 0.4.0 importlib-resources 5.12.0 imutils 0.5.4 inflect 6.0.4 iniconfig 2.0.0 intel-openmp 2023.1.0 ipykernel 5.5.6 ipython 7.34.0 ipython-genutils 0.2.0 ipython-sql 0.4.1 ipywidgets 7.7.1 itsdangerous 2.1.2 jax 0.4.10 jaxlib 0.4.10+cuda11.cudnn86 jieba 0.42.1 Jinja2 3.1.2 joblib 1.2.0 jsonpickle 3.0.1 jsonschema 4.3.3 jupyter-client 6.1.12 jupyter-console 6.1.0 jupyter_core 5.3.0 jupyter-server 1.24.0 jupyterlab-pygments 0.2.2 jupyterlab-widgets 3.0.7 kaggle 1.5.13 keras 2.12.0 kiwisolver 1.4.4 korean-lunar-calendar 0.3.1 langcodes 3.3.0 lazy_loader 0.2 libclang 16.0.0 librosa 0.10.0.post2 lightgbm 3.3.5 lit 16.0.5 llvmlite 0.39.1 locket 1.0.0 logical-unification 0.4.5 LunarCalendar 0.0.9 lxml 4.9.2 Markdown 3.4.3 markdown-it-py 2.2.0 MarkupSafe 2.1.2 matplotlib 3.7.1 matplotlib-inline 0.1.6 matplotlib-venn 0.11.9 mdurl 0.1.2 miniKanren 1.0.3 missingno 0.5.2 mistune 0.8.4 mizani 0.8.1 mkl 2019.0 ml-dtypes 0.1.0 mlxtend 0.14.0 more-itertools 9.1.0 moviepy 1.0.3 mpmath 1.3.0 msgpack 1.0.5 multipledispatch 0.6.0 multitasking 0.0.11 murmurhash 1.0.9 music21 8.1.0 natsort 8.3.1 nbclient 0.7.4 nbconvert 6.5.4 nbformat 5.8.0 nest-asyncio 1.5.6 networkx 3.1 nibabel 3.0.2 nltk 3.8.1 notebook 6.4.8 numba 0.56.4 numexpr 2.8.4 numpy 1.22.4 oauth2client 4.1.3 oauthlib 3.2.2 opencv-contrib-python 4.7.0.72 opencv-python 4.7.0.72 opencv-python-headless 4.7.0.72 openpyxl 3.0.10 opt-einsum 3.3.0 optax 0.1.5 orbax-checkpoint 0.2.1 osqp 0.6.2.post8 packaging 23.1 palettable 3.3.3 pandas 1.5.3 pandas-datareader 0.10.0 pandas-gbq 0.17.9 pandocfilters 1.5.0 panel 0.14.4 param 1.13.0 parso 0.8.3 partd 1.4.0 pathlib 1.0.1 pathy 0.10.1 patsy 0.5.3 pexpect 4.8.0 pickleshare 0.7.5 Pillow 8.4.0 pip 23.1.2 pip-tools 6.13.0 platformdirs 3.3.0 plotly 5.13.1 plotnine 0.10.1 pluggy 1.0.0 polars 0.17.3 pooch 1.6.0 portpicker 1.3.9 prefetch-generator 1.0.3 preshed 3.0.8 prettytable 0.7.2 proglog 0.1.10 progressbar2 4.2.0 prometheus-client 0.16.0 promise 2.3 prompt-toolkit 3.0.38 prophet 1.1.3 proto-plus 1.22.2 protobuf 3.20.3 psutil 5.9.5 psycopg2 2.9.6 ptyprocess 0.7.0 py-cpuinfo 9.0.0 py4j 0.10.9.7 pyarrow 9.0.0 pyasn1 0.5.0 pyasn1-modules 0.3.0 pybaseball 2.2.5 pycocotools 2.0.6 pycparser 2.21 pyct 0.5.0 pydantic 1.10.7 pydata-google-auth 1.7.0 pydot 1.4.2 pydot-ng 2.0.0 pydotplus 2.0.2 PyDrive 1.3.1 pyerfa 2.0.0.3 pygame 2.3.0 PyGithub 1.58.2 Pygments 2.14.0 PyGObject 3.36.0 PyJWT 2.7.0 pymc 5.1.2 PyMeeus 0.5.12 pymystem3 0.2.0 PyNaCl 1.5.0 PyOpenGL 3.1.6 pyparsing 3.0.9 pyproject_hooks 1.0.0 pyrsistent 0.19.3 PySocks 1.7.1 pytensor 2.10.1 pytest 7.2.2 python-apt 0.0.0 python-dateutil 2.8.2 python-louvain 0.16 python-slugify 8.0.1 python-utils 3.5.2 pytz 2022.7.1 pytz-deprecation-shim 0.1.0.post0 pyviz-comms 2.2.1 PyWavelets 1.4.1 PyYAML 6.0 pyzmq 23.2.1 qdldl 0.1.7 qudida 0.0.4 regex 2022.10.31 requests 2.27.1 requests-oauthlib 1.3.1 requests-unixsocket 0.2.0 requirements-parser 0.5.0 rich 13.3.4 rpy2 3.5.5 rsa 4.9 safetensors 0.3.1 scikit-image 0.19.3 scikit-learn 1.2.2 scipy 1.10.1 scs 3.2.3 seaborn 0.12.2 Send2Trash 1.8.0 setuptools 67.7.2 shapely 2.0.1 six 1.16.0 sklearn-pandas 2.2.0 smart-open 6.3.0 sniffio 1.3.0 snowballstemmer 2.2.0 sortedcontainers 2.4.0 soundfile 0.12.1 soupsieve 2.4.1 soxr 0.3.5 spacy 3.5.2 spacy-legacy 3.0.12 spacy-loggers 1.0.4 Sphinx 3.5.4 sphinxcontrib-applehelp 1.0.4 sphinxcontrib-devhelp 1.0.2 sphinxcontrib-htmlhelp 2.0.1 sphinxcontrib-jsmath 1.0.1 sphinxcontrib-qthelp 1.0.3 sphinxcontrib-serializinghtml 1.1.5 SQLAlchemy 2.0.10 sqlparse 0.4.4 srsly 2.4.6 statsmodels 0.13.5 sympy 1.11.1 tables 3.8.0 tabulate 0.8.10 tblib 1.7.0 tenacity 8.2.2 tensorboard 2.12.2 tensorboard-data-server 0.7.0 tensorboard-plugin-wit 1.8.1 tensorflow 2.12.0 tensorflow-datasets 4.9.2 tensorflow-estimator 2.12.0 tensorflow-gcs-config 2.12.0 tensorflow-hub 0.13.0 tensorflow-io-gcs-filesystem 0.32.0 tensorflow-metadata 1.13.1 tensorflow-probability 0.20.1 tensorstore 0.1.36 termcolor 2.3.0 terminado 0.17.1 text-unidecode 1.3 textblob 0.17.1 tf-slim 1.1.0 thinc 8.1.9 threadpoolctl 3.1.0 tifffile 2023.4.12 tinycss2 1.2.1 tokenizers 0.13.3 toml 0.10.2 tomli 2.0.1 toolz 0.12.0 torch 2.0.1+cu118 torchaudio 2.0.2+cu118 torchdata 0.6.1 torchsummary 1.5.1 torchtext 0.15.2 torchvision 0.15.2+cu118 tornado 6.3.1 tqdm 4.65.0 traitlets 5.7.1 transformers 4.30.2 triton 2.0.0 tweepy 4.13.0 typer 0.7.0 types-setuptools 67.8.0.0 typing_extensions 4.5.0 tzdata 2023.3 tzlocal 4.3 uritemplate 4.1.1 urllib3 1.26.15 vega-datasets 0.9.0 wasabi 1.1.1 wcwidth 0.2.6 webcolors 1.13 webencodings 0.5.1 websocket-client 1.5.1 Werkzeug 2.3.0 wheel 0.40.0 widgetsnbextension 3.6.4 wordcloud 1.8.2.2 wrapt 1.14.1 xarray 2022.12.0 xarray-einstats 0.5.1 xgboost 1.7.5 xlrd 2.0.1 yellowbrick 1.5 yfinance 0.2.18 zict 3.0.0 zipp 3.15.0