uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0

FutureWarning when importing SparkDatasetConverter #775

Closed kisel4363 closed 1 year ago

kisel4363 commented 1 year ago

The warning:

/home/hadoop/.local/lib/python3.7/site-packages/petastorm/spark/spark_dataset_converter.py:28: FutureWarning: pyarrow.LocalFileSystem is deprecated as of 2.0.0, please use pyarrow.fs.LocalFileSystem instead.
  from pyarrow import LocalFileSystem

appears when the line: from petastorm.spark.spark_dataset_converter import SparkDatasetConverter is entered in a pyspark shell.

spark-version: 3.2.1-amzn-0
python-version: 3.7.10
petastorm-version: 0.12.0
pyarrow-version: 9.0.0
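Until the import itself is changed upstream, one possible stopgap is to hide only this specific warning around the import with Python's standard warnings module. The sketch below is a workaround under assumptions, not a fix from the petastorm maintainers: the helper names import_quietly and load_converter are hypothetical, and the message pattern is copied from the warning text above.

```python
import warnings

# Start of the warning text reported above; filterwarnings treats this as a
# regular expression matched against the beginning of the warning message.
PYARROW_FS_WARNING = r"pyarrow\.LocalFileSystem is deprecated"


def import_quietly(do_import):
    """Call do_import() while hiding only the pyarrow.LocalFileSystem FutureWarning.

    Hypothetical helper: other warnings, including other FutureWarnings,
    are still reported as usual.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message=PYARROW_FS_WARNING,
            category=FutureWarning,
        )
        return do_import()


def load_converter():
    """Perform the import that triggers the warning (requires petastorm)."""
    from petastorm.spark.spark_dataset_converter import SparkDatasetConverter
    return SparkDatasetConverter
```

In a pyspark shell this would be used as SparkDatasetConverter = import_quietly(load_converter). The deprecated pyarrow.LocalFileSystem is still imported underneath, so this only silences the message and does not address the deprecation itself.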

selitvin commented 1 year ago

This is actually tricky. It appears that, to upgrade the filesystem, we need to support the new pyarrow dataset API, which is not a simple change. See #613.

kisel4363 commented 1 year ago

Thank you very much @selitvin