The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
The warning:
/home/hadoop/.local/lib/python3.7/site-packages/petastorm/spark/spark_dataset_converter.py:28: FutureWarning: pyarrow.LocalFileSystem is deprecated as of 2.0.0, please use pyarrow.fs.LocalFileSystem instead.
from pyarrow import LocalFileSystem
appears when the following line is entered in a pyspark shell:
from petastorm.spark.spark_dataset_converter import SparkDatasetConverter
Environment:
spark-version: 3.2.1-amzn-0
python-version: 3.7.10
petastorm-version: 0.12.0
pyarrow-version: 9.0.0
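
Until the deprecated import is changed in petastorm itself, one possible workaround is to filter out this one message with Python's standard warnings module before running the import. The following is only a sketch: the helper names (silence_pyarrow_fs_warning, count_visible_warnings, fake_import) are made up for illustration, and fake_import merely simulates the warning so the snippet runs without petastorm or pyarrow installed. It hides the message; it does not change which pyarrow API petastorm uses.

```python
import warnings

# Start of the message text, copied from the warning reported above.
# warnings.filterwarnings treats this as a regex matched against the
# beginning of the warning message.
PYARROW_FS_MESSAGE = r"pyarrow\.LocalFileSystem is deprecated"


def silence_pyarrow_fs_warning():
    """Ignore only the FutureWarning about pyarrow.LocalFileSystem."""
    warnings.filterwarnings(
        "ignore",
        message=PYARROW_FS_MESSAGE,
        category=FutureWarning,
    )


def count_visible_warnings(emit):
    """Run emit() and return how many warnings pass the active filters."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # show everything by default
        silence_pyarrow_fs_warning()     # then hide the one known message
        emit()
    return len(caught)


def fake_import():
    # Hypothetical stand-in for the real petastorm import: it emits a
    # warning with the same text and category as the one in this report.
    warnings.warn(
        "pyarrow.LocalFileSystem is deprecated as of 2.0.0, "
        "please use pyarrow.fs.LocalFileSystem instead.",
        FutureWarning,
    )


def unrelated_warning():
    # A different FutureWarning, which should remain visible.
    warnings.warn("some other deprecation", FutureWarning)
```

In a pyspark shell, the idea would be to call silence_pyarrow_fs_warning() once before the line from petastorm.spark.spark_dataset_converter import SparkDatasetConverter. Because the filter matches on both message and category, other FutureWarnings are still shown.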