FutureWarning: 'ParquetDataset.partitions' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version.

uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

Apache License 2.0

When I call make_reader, I keep getting the following warning in each epoch. Will this be fixed in the future?

Code

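  from petastorm import make_reader
  from petastorm.pytorch import DataLoader

  # The function name is illustrative; the original snippet was a function body.
  def train_dataloader():
      # Open the Parquet dataset without shuffling rows.
      reader = make_reader(
          dataset_url="file://train.parquet",
          shuffle_rows=False,
      )
      return DataLoader(reader, batch_size=128)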

Warning

/opt/conda/lib/python3.9/site-packages/petastorm/py_dict_reader_worker.py:267: FutureWarning: 'ParquetDataset.partitions' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.partitioning' attribute instead.

Here are my versions: pyarrow 13.0.0, petastorm 0.12.1.
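Until the deprecated attribute access is changed inside petastorm, one possible stopgap is to filter out just this message with Python's standard `warnings` module. The sketch below is not an official petastorm or pyarrow fix. The helper name `silence_partitions_warning` and the constant `PARTITIONS_WARNING` are hypothetical, and the pattern is copied from the warning text above. It assumes the warning is emitted in the same process that installs the filter.

```python
import warnings

# Regex matched against the start of the warning message (text taken from this issue).
PARTITIONS_WARNING = r"'ParquetDataset\.partitions' attribute is deprecated"


def silence_partitions_warning():
    """Hypothetical helper: ignore only the pyarrow 'partitions' FutureWarning."""
    warnings.filterwarnings(
        "ignore",
        message=PARTITIONS_WARNING,
        category=FutureWarning,
    )
```

Calling `silence_partitions_warning()` once before `make_reader` should hide this message while leaving other warnings visible. Filters installed this way apply only to the current Python process, so a reader whose workers run in separate processes may still print the warning.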

FutureWarning: 'ParquetDataset.partitions' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. #800