The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
No option to pass storage_options in materialize_dataset() #714
It would be a good addition to have an option to pass storage_options to materialize_dataset().
The materialize_dataset() method relies on a filesystem_factory: https://github.com/uber/petastorm/blob/9f301085b551f215322b511545264e86e96e4a1b/petastorm/etl/dataset_metadata.py#L99
That filesystem_factory is created by the FilesystemResolver class, which already has an option to accept storage_options: https://github.com/uber/petastorm/blob/9f301085b551f215322b511545264e86e96e4a1b/petastorm/fs_utils.py#L41
So materialize_dataset() would only need to accept storage_options and forward it to FilesystemResolver.
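To illustrate the request, here is a minimal, self-contained sketch of how the option could be threaded through. It does not use Petastorm's actual code: StubResolver and materialize_dataset_sketch are hypothetical stand-ins, the "filesystem" is a plain dict, and the other arguments the real function takes (such as the Spark session and schema) are left out. Yielding the factory is also only done here so the result can be inspected.

```python
from contextlib import contextmanager


class StubResolver:
    """Hypothetical stand-in for a resolver that accepts storage_options."""

    def __init__(self, dataset_url, storage_options=None):
        self.dataset_url = dataset_url
        # Copy so later changes to the caller's dict do not leak in.
        self.storage_options = dict(storage_options or {})

    def filesystem_factory(self):
        url = self.dataset_url
        options = self.storage_options
        # Return a zero-argument callable so the filesystem is built lazily.
        # A dict stands in for a real filesystem object (assumption).
        return lambda: {"url": url, "options": options}


@contextmanager
def materialize_dataset_sketch(dataset_url, storage_options=None,
                               filesystem_factory=None):
    """Hypothetical sketch: forward storage_options to the resolver."""
    if filesystem_factory is None:
        resolver = StubResolver(dataset_url, storage_options=storage_options)
        filesystem_factory = resolver.filesystem_factory()
    # Yielded only for inspection in this sketch.
    yield filesystem_factory
    # Dataset metadata would be written here, after the with-body finishes.


with materialize_dataset_sketch("s3://bucket/path",
                                storage_options={"anon": True}) as factory:
    fs = factory()
```

With a change along these lines, callers would not have to build a filesystem factory themselves just to supply credentials or other backend settings.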