Closed: dmcguire81 closed this issue 4 years ago.
```python
import pyarrow.parquet as pq
from pyarrow.filesystem import S3FSWrapper
from s3fs import S3FileSystem

fs = S3FileSystem()
wrapped_fs = S3FSWrapper(fs)
dataset_url = "s3://some/small/partitioned/dataset"

try:
    print("Trying with wrapper...")
    dataset = pq.ParquetDataset(dataset_url, filesystem=wrapped_fs, validate_schema=False)
    print("succeeded")
except TypeError:
    # Only reached if the wrapped filesystem fails; retry with the bare s3fs object
    print("failed.")
    print("Trying without wrapper...")
    dataset = pq.ParquetDataset(dataset_url, filesystem=fs, validate_schema=False)
    print("succeeded.")
```
With s3fs 0.4.2:

```
Trying with wrapper...
succeeded
```
With s3fs 0.5.0:

```
Trying with wrapper...
./env/lib/python3.7/site-packages/pyarrow/filesystem.py:394: RuntimeWarning: coroutine 'S3FileSystem._ls' was never awaited
  for key in list(self.fs._ls(path, refresh=refresh)):
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
failed.
Trying without wrapper...
succeeded.
```
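The warning hints at the mechanism: line 394 of `pyarrow/filesystem.py` passes the result of `self.fs._ls(...)` straight to `list()`. Under s3fs 0.5.0, `_ls` is a coroutine function, so calling it without `await` only creates a coroutine object. That object is not iterable, so `list()` raises the `TypeError` the script catches, and the discarded coroutine triggers the "never awaited" warning. The sketch below reproduces this failure mode without S3, using hypothetical stand-in classes (`AsyncOnlyFS`, `LegacyWrapper`) rather than the real pyarrow or s3fs code.

```python
import asyncio


class AsyncOnlyFS:
    """Hypothetical stand-in for a filesystem whose private _ls is a coroutine."""

    async def _ls(self, path, refresh=False):
        return [f"{path}/part-0.parquet", f"{path}/part-1.parquet"]


class LegacyWrapper:
    """Hypothetical stand-in for a wrapper written against a synchronous _ls."""

    def __init__(self, fs):
        self.fs = fs

    def ls(self, path, refresh=False):
        # Calling an async def without awaiting it only builds a coroutine object;
        # list() then raises TypeError because coroutines are not iterable.
        return list(self.fs._ls(path, refresh=refresh))


def try_legacy_ls(path):
    """Return the listing, or None if the legacy wrapper raises TypeError."""
    try:
        return LegacyWrapper(AsyncOnlyFS()).ls(path)
    except TypeError:
        return None


def async_aware_ls(path):
    """Drive the coroutine to completion, which does produce the listing."""
    return asyncio.run(AsyncOnlyFS()._ls(path))


print(try_legacy_ls("bucket/dataset"))   # prints None
print(async_aware_ls("bucket/dataset"))  # prints ['bucket/dataset/part-0.parquet', 'bucket/dataset/part-1.parquet']
```

A "coroutine ... was never awaited" `RuntimeWarning` may also appear on stderr for the first call, mirroring the output above. The second call shows that the listing itself works once the coroutine is actually run, which is consistent with the bare `S3FileSystem` succeeding where the synchronous wrapper fails.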
According to the s3fs maintainer, wrapping `s3fs.S3FileSystem` instances with `pyarrow.filesystem.S3FSWrapper` is deprecated, and even harmful (hence the defect shown above). In other words, `s3fs<0.5.0` requires that the wrapper be used, and `s3fs>=0.5.0` requires that it not be used. Since `petastorm` can't do both, it should just choose to support `s3fs>=0.5.0` and drop the wrapper.
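A minimal sketch of what "support `s3fs>=0.5.0` and drop the wrapper" could look like: check the installed s3fs version once, refuse anything older than 0.5.0, and otherwise hand the unwrapped filesystem to pyarrow. The helper names (`parse_version_prefix`, `require_min_s3fs`) and the error message are hypothetical, not petastorm's API, and version parsing is deliberately simplified to the leading numeric components (pre-release suffixes such as `rc1` are ignored) so the snippet has no dependencies.

```python
import re

# Hypothetical constant: first s3fs release that must NOT be wrapped, per this issue.
MIN_S3FS = (0, 5, 0)


def parse_version_prefix(version, width=3):
    """Turn '0.4.2' or '2021.4.0' into a tuple of leading ints, zero-padded to width."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop at a suffix such as 'rc1' or 'dev0'
    parts.extend([0] * (width - len(parts)))
    return tuple(parts)


def require_min_s3fs(version, minimum=MIN_S3FS):
    """Raise RuntimeError if the given s3fs version predates the minimum supported one."""
    if parse_version_prefix(version) < minimum:
        raise RuntimeError(
            f"s3fs {version} is not supported; install s3fs>="
            + ".".join(str(n) for n in minimum)
        )
    return version
```

In use, the check would be fed `s3fs.__version__`, after which the plain `S3FileSystem()` is passed directly as `pq.ParquetDataset(dataset_url, filesystem=fs, ...)`, exactly as in the working half of the reproduction script, with no `S3FSWrapper` involved.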