Open ayushkarnawat opened 8 months ago
It seems that the validate_schema argument was removed in a pyarrow update.
I resolved this issue using petastorm=0.12.1 and pyarrow=10.0.1.
With Reader and make_reader from petastorm, the data loads successfully.
However, a deprecation warning is displayed because of the older version, and it's not pretty.
I hope someone shares a fancy solution that works with the latest version.
Here is my code:
import s3fs
from petastorm import make_reader
from petastorm.reader import Reader
fs = s3fs.S3FileSystem(key="ACCESS_KEY", secret="SECRET_KEY", endpoint_url="ENDPOINT")
reader = make_reader(dataset_url="s3a://YOUR/DATA/PATH", filesystem=fs)
or
reader = Reader(dataset_path="s3a://YOUR/DATA/PATH")
https://github.com/uber/petastorm/issues/758#issuecomment-1785925528
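Since the workaround in the comment above depends on which pyarrow version is installed, a small check can tell you whether an environment is on the affected side. This is only a sketch: the helper names `predates_validate_schema_removal` and `installed_pyarrow_version` are made up for illustration, and the 15.0.0 cutoff is taken from this issue's description rather than verified independently.

```python
from importlib import metadata

# Assumption taken from this issue: validate_schema is gone as of pyarrow 15.0.0.
FIRST_AFFECTED_MAJOR = 15


def predates_validate_schema_removal(pyarrow_version: str) -> bool:
    """Return True if the major version is below the reported removal (15)."""
    major = int(pyarrow_version.split(".")[0])
    return major < FIRST_AFFECTED_MAJOR


def installed_pyarrow_version():
    """Return the installed pyarrow version string, or None if it is missing."""
    try:
        return metadata.version("pyarrow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_pyarrow_version()
    if version is None:
        print("pyarrow is not installed")
    elif predates_validate_schema_removal(version):
        print(f"pyarrow {version} predates the reported removal")
    else:
        print(f"pyarrow {version} is at or past the reported removal")
```

The comparison only looks at the major version, which is enough for a cutoff at a major release boundary such as 15.0.0.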
Description
Parquet files cannot be read and loaded into a ParquetDataset object when used with make_batch_reader. This is due to the deprecated parameter validate_schema=False, which was removed in pyarrow v15.0.0.
Actual behavior
Expected behavior
The dataset is loaded properly into the ParquetDataset object so that it can be consumed downstream.
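One generic way to tolerate a keyword argument that a newer library version no longer accepts, as with validate_schema in the description above, is to filter keyword arguments against the callee's signature before calling it. The sketch below is not petastorm's fix and does not call pyarrow: `call_with_supported_kwargs`, `open_dataset_old`, and `open_dataset_new` are hypothetical names used only for illustration, and it assumes the target callable has an introspectable signature. Note that silently dropping an argument also drops whatever behavior it requested, so whether that is acceptable for validate_schema=False is an open question.

```python
import inspect

# Parameter kinds that can be passed by keyword.
_KEYWORD_KINDS = (
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
    inspect.Parameter.KEYWORD_ONLY,
)


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, dropping keyword arguments that its signature does not declare."""
    params = inspect.signature(func).parameters
    # If func has a **kwargs catch-all, we cannot tell what it supports,
    # so pass everything through unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(*args, **kwargs)
    supported = {
        name: value
        for name, value in kwargs.items()
        if name in params and params[name].kind in _KEYWORD_KINDS
    }
    return func(*args, **supported)


# Hypothetical stand-ins for an older and a newer constructor.
def open_dataset_old(path, filesystem=None, validate_schema=True):
    return {"path": path, "validate_schema": validate_schema}


def open_dataset_new(path, filesystem=None):
    return {"path": path}


old = call_with_supported_kwargs(open_dataset_old, "data/", validate_schema=False)
new = call_with_supported_kwargs(open_dataset_new, "data/", validate_schema=False)
```

Here the older stand-in receives validate_schema=False, while the newer one is called without it instead of raising a TypeError.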