uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can also be used from pure Python code.
Apache License 2.0

No option to pass storage_options in materialize_dataset() #714

Open manjuransari-zz opened 3 years ago

manjuransari-zz commented 3 years ago

It would be a good addition to have an option to pass storage_options to materialize_dataset().

The materialize_dataset() method relies on a filesystem_factory: https://github.com/uber/petastorm/blob/9f301085b551f215322b511545264e86e96e4a1b/petastorm/etl/dataset_metadata.py#L99

That filesystem_factory is created by the FilesystemResolver class, which already has an option to accept storage_options: https://github.com/uber/petastorm/blob/9f301085b551f215322b511545264e86e96e4a1b/petastorm/fs_utils.py#L41
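
To illustrate the request, here is a rough, self-contained sketch of the forwarding I have in mind. It does not use Petastorm itself: `ResolverStandIn` and `materialize_dataset_sketch` are hypothetical stand-ins for FilesystemResolver and materialize_dataset(), their signatures are simplified assumptions, and the sketch yields the factory only so that the result can be inspected.

```python
from contextlib import contextmanager


class ResolverStandIn:
    """Hypothetical stand-in for a resolver that accepts storage_options."""

    def __init__(self, dataset_url, storage_options=None):
        self.dataset_url = dataset_url
        # Copy so later changes to the caller's dict do not leak in.
        self.storage_options = dict(storage_options or {})

    def filesystem_factory(self):
        # Capture plain values so the factory can be called later,
        # without holding a reference to the resolver itself.
        url, options = self.dataset_url, self.storage_options
        return lambda: {"url": url, "storage_options": options}


@contextmanager
def materialize_dataset_sketch(dataset_url, filesystem_factory=None,
                               storage_options=None):
    """Hypothetical sketch: forward storage_options to the resolver
    when the caller did not supply a filesystem_factory."""
    if filesystem_factory is None:
        resolver = ResolverStandIn(dataset_url,
                                   storage_options=storage_options)
        filesystem_factory = resolver.filesystem_factory()
    # Yielding the factory is for illustration only.
    yield filesystem_factory
```

With something like this, a call such as `materialize_dataset_sketch("s3://example-bucket/dataset", storage_options={"anon": True})` would hand the options to the resolver without the caller having to build a filesystem_factory by hand. The URL and the option shown are placeholders.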