uber / petastorm

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Apache License 2.0

FutureWarning: 'ParquetDataset.partitions' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. #800

Open ton11111 opened 8 months ago

ton11111 commented 8 months ago

When I call make_reader, I get the following warning on every epoch. Will this be fixed in a future release?

Code

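  from petastorm import make_reader
  from petastorm.pytorch import DataLoader

  # The enclosing function name is assumed; the original snippet showed only its body.
  def train_dataloader():
      # Open the Parquet dataset without shuffling rows.
      reader = make_reader(
          dataset_url="file://train.parquet",
          shuffle_rows=False,
      )
      # Wrap the reader in petastorm's PyTorch DataLoader.
      return DataLoader(reader, batch_size=128)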

Warning

/opt/conda/lib/python3.9/site-packages/petastorm/py_dict_reader_worker.py:267: FutureWarning: 'ParquetDataset.partitions' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.partitioning' attribute instead.

Here are my versions: pyarrow 13.0.0, petastorm 0.12.1.

tingstam commented 8 months ago

Same issue here with pyarrow 8.0.0 and petastorm 0.12.1.
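
Until the deprecated attribute is no longer used, one possible stopgap is to filter out just this message with Python's standard `warnings` module. This is a minimal sketch, not an official petastorm fix: the helper name `silence_partitions_warning` is made up for illustration, and the pattern is simply the start of the warning text quoted above. It hides the message only; the underlying use of `ParquetDataset.partitions` is unchanged.

```python
import warnings

# Start of the warning text quoted in this issue, escaped for use as a regex.
PARTITIONS_WARNING = r"'ParquetDataset\.partitions' attribute is deprecated"


def silence_partitions_warning():
    """Ignore only the pyarrow 'ParquetDataset.partitions' FutureWarning.

    Hypothetical helper: other FutureWarnings are still reported.
    """
    warnings.filterwarnings(
        "ignore",
        message=PARTITIONS_WARNING,
        category=FutureWarning,
    )


# Call once, before constructing the reader with make_reader(...).
silence_partitions_warning()
```

The filter is installed in the current process only, so if the reader is configured to use worker processes rather than threads, those workers may still print the warning.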