uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
1.76k stars, 281 forks

Petastorm breaks with pyarrow 13.0 or newer. The stable version of pyarrow is now 16.0. #805

Open LauritsDixen opened 1 month ago

LauritsDixen commented 1 month ago

Several parts of pyarrow have changed or been deprecated.

Running with pyarrow 12.0 gives FutureWarnings about using objects that have been deprecated since 5.0.0!
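As a rough illustration, a guard like the one below could report whether the installed pyarrow falls in the range this issue describes as breaking. The helper names and the `FIRST_BROKEN_MAJOR` constant are hypothetical; the threshold of 13 is taken from this issue's title, not from petastorm itself.

```python
from importlib import metadata

# Assumption taken from this issue's title: breakage starts at pyarrow 13.0.
FIRST_BROKEN_MAJOR = 13


def major_version(version: str) -> int:
    """Return the leading integer component of a version string such as '16.0.0'."""
    return int(version.split(".")[0])


def is_reported_broken(version: str) -> bool:
    """True if the version is at or above the major version reported as breaking."""
    return major_version(version) >= FIRST_BROKEN_MAJOR


def installed_pyarrow_version():
    """Return the installed pyarrow version string, or None if pyarrow is absent."""
    try:
        return metadata.version("pyarrow")
    except metadata.PackageNotFoundError:
        return None
```

Calling `is_reported_broken(installed_pyarrow_version())` after checking for `None` would flag an environment before petastorm is imported.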

Here is one such warning:

/python3.11/site-packages/petastorm/etl/dataset_metadata.py:288: FutureWarning: ParquetDatasetPiece is deprecated as of pyarrow 5.0.0 and will be removed in a future version

Are there any plans to update?

selitvin commented 1 month ago

I do not actively maintain the project. I would be happy to review a PR if you can put one up.