laughingman7743 / PyAthena

PyAthena is a Python DB API 2.0 (PEP 249) client for Amazon Athena.
MIT License

pyathena hijacks pandas s3fs #476

Closed by graydenshand 1 year ago

graydenshand commented 1 year ago

Importing pyathena's results-to-pandas conversion helper (as_pandas) breaks pandas' to_csv function for s3:// paths, even if the helper is never called.

Minimal reproducible example:

>>> from pyathena.pandas.util import as_pandas
>>> import pandas as pd
>>> df = pd.DataFrame({"foo": [1,2,3]})
>>> df.to_csv("s3://...")
Traceback (most recent call last):
  File "/Users/gshand/code/adhoc/pandas_pyathena/test.py", line 5, in <module>
    df.to_csv("s3://...")
  File "/Users/gshand/code/adhoc/pandas_pyathena/.venv/lib/python3.10/site-packages/pandas/core/generic.py", line 3902, in to_csv
    return DataFrameRenderer(formatter).to_csv(
  File "/Users/gshand/code/adhoc/pandas_pyathena/.venv/lib/python3.10/site-packages/pandas/io/formats/format.py", line 1152, in to_csv
    csv_formatter.save()
  File "/Users/gshand/code/adhoc/pandas_pyathena/.venv/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 247, in save
    with get_handle(
  File "/Users/gshand/code/adhoc/pandas_pyathena/.venv/lib/python3.10/site-packages/pandas/io/common.py", line 142, in __exit__
    self.close()
  File "/Users/gshand/code/adhoc/pandas_pyathena/.venv/lib/python3.10/site-packages/pandas/io/common.py", line 134, in close
    handle.close()
  File "/Users/gshand/code/adhoc/pandas_pyathena/.venv/lib/python3.10/site-packages/pyathena/filesystem/s3.py", line 520, in close
    super(S3File, self).close()
  File "/Users/gshand/code/adhoc/pandas_pyathena/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 1944, in close
    self.flush(force=True)
  File "/Users/gshand/code/adhoc/pandas_pyathena/.venv/lib/python3.10/site-packages/fsspec/spec.py", line 1810, in flush
    self._initiate_upload()
  File "/Users/gshand/code/adhoc/pandas_pyathena/.venv/lib/python3.10/site-packages/pyathena/filesystem/s3.py", line 524, in _initiate_upload
    raise NotImplementedError  # pragma: no cover
NotImplementedError

Removing the as_pandas import allows the script to work as intended.

Expected Behavior: pyathena doesn't hijack pandas' default s3fs library and to_csv continues to work as normal.
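The traceback shows pandas handing the s3:// write to pyathena/filesystem/s3.py, which suggests that fsspec now resolves the "s3" protocol to PyAthena's filesystem class instead of s3fs. The sketch below is one way to check that, and to re-register s3fs if needed. The helper names (`qualified_name`, `is_pyathena_handler`, `current_s3_handler`, `restore_s3fs`) are hypothetical and not part of PyAthena, pandas, or fsspec. It assumes fsspec and s3fs are installed, and the re-registration step is an untested workaround that this thread does not confirm.

```python
"""Check which class fsspec resolves for s3:// and, optionally, put s3fs back."""


def qualified_name(cls: type) -> str:
    # Module-qualified class name, e.g. "builtins.dict".
    return f"{cls.__module__}.{cls.__qualname__}"


def is_pyathena_handler(name: str) -> bool:
    # True when a qualified class name lives inside the pyathena package.
    return name == "pyathena" or name.startswith("pyathena.")


def current_s3_handler() -> str:
    # fsspec is imported lazily so the pure helpers above work without it.
    import fsspec  # third-party; pandas relies on it for s3:// URLs

    return qualified_name(fsspec.get_filesystem_class("s3"))


def restore_s3fs() -> str:
    # Assumption: re-registering the s3fs class object with clobber=True
    # replaces whatever implementation is currently bound to "s3".
    import fsspec
    from s3fs import S3FileSystem

    fsspec.register_implementation("s3", S3FileSystem, clobber=True)
    return current_s3_handler()
```

Calling `current_s3_handler()` before and after `from pyathena.pandas.util import as_pandas` should show whether the import changed the handler. If `is_pyathena_handler(current_s3_handler())` is true, calling `restore_s3fs()` before `df.to_csv("s3://...")` may bring back the s3fs behaviour. Note that PyAthena's own pandas result reading may depend on its filesystem being registered, so this could trade one problem for another.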

Versions:

pandas==2.1.1
pyathena==3.0.8

laughingman7743 commented 1 year ago

Please do not register the same issue. https://github.com/laughingman7743/PyAthena/issues/465

graydenshand commented 1 year ago

Apologies, I did search before posting but didn't find that.