The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
Guarantee filesystem_factory returned from FilesystemResolver is serializable #370
We were accidentally capturing FilesystemResolver and its attributes in the filesystem_factory returned to the user. This caused failures when the factory was passed to Spark executors.
Making FilesystemResolver unpicklable lets the unit tests (updated in this PR) catch any such accidental pickling.
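The idea behind this fix can be sketched in plain Python. This is a minimal sketch, not Petastorm's code: the simplified FilesystemResolver, the helper _make_filesystem, and the method names leaky_filesystem_factory and filesystem_factory are hypothetical. It uses the standard-library pickle module. PySpark serializes functions shipped to executors with cloudpickle, which likewise has to serialize whatever the function captures.

```python
import pickle
from functools import partial


def _make_filesystem(url):
    # Hypothetical stand-in for building a filesystem client from a URL.
    return {"url": url}


class FilesystemResolver:
    """Hypothetical, simplified resolver (not Petastorm's actual class)."""

    def __init__(self, url):
        self._url = url

    def __getstate__(self):
        # Refuse to be pickled, so any accidental capture fails loudly in tests.
        raise RuntimeError("FilesystemResolver must not be pickled")

    def _create(self):
        return _make_filesystem(self._url)

    def leaky_filesystem_factory(self):
        # BAD: a bound method pickles its `self`, dragging the resolver along.
        return self._create

    def filesystem_factory(self):
        # GOOD: capture only the plain, picklable values the factory needs.
        return partial(_make_filesystem, self._url)


resolver = FilesystemResolver("hdfs://namenode/data")

# The safe factory survives a pickle round trip and still works.
factory = pickle.loads(pickle.dumps(resolver.filesystem_factory()))
print(factory())

# The leaky factory fails because pickling it tries to pickle the resolver.
try:
    pickle.dumps(resolver.leaky_filesystem_factory())
except RuntimeError as err:
    print("caught:", err)
```

Raising from `__getstate__` turns an accidental capture into an immediate error that a unit test can assert on, rather than a failure that only surfaces on the Spark executors.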