The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
The integration of fsspec into Petastorm was a great step.
fsspec supports extracting storage_options from the input URL. In Petastorm, the URL format is restricted, i.e. extracting storage options from the URL is not possible.
Petastorm internally converts the input URL into the required format (get_dataset_path(parsed_url) in fs_utils.py). It would be a great help to have a mechanism that extracts storage_options from the input URL, similar to fsspec.
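To make the request concrete, here is a minimal sketch of the kind of extraction I have in mind. It uses only the Python standard library, so it is neither fsspec's implementation nor existing Petastorm code. The helper name `split_url_and_storage_options`, the option keys it returns, and the example URL are all hypothetical and only illustrate the idea.

```python
from urllib.parse import parse_qsl, urlparse


def split_url_and_storage_options(url):
    """Hypothetical helper (not part of Petastorm or fsspec).

    Splits a dataset URL into a bare URL (no credentials, no query string)
    and a dict of storage options recovered from the URL.
    """
    parsed = urlparse(url)

    options = {}
    # Credentials and endpoint details embedded in the network location.
    if parsed.username:
        options["username"] = parsed.username
    if parsed.password:
        options["password"] = parsed.password
    if parsed.hostname:
        options["host"] = parsed.hostname
    if parsed.port:
        options["port"] = parsed.port
    # Query parameters are treated as extra options (values stay strings).
    options.update(parse_qsl(parsed.query))

    # Drop the "user:password@" prefix but keep "host:port" as written.
    host_port = parsed.netloc.rpartition("@")[2]
    bare_url = parsed._replace(netloc=host_port, query="").geturl()
    return bare_url, options


if __name__ == "__main__":
    bare_url, storage_options = split_url_and_storage_options(
        "hdfs://user:secret@namenode:8020/data/train?replication=3"
    )
    print(bare_url)
    print(storage_options)
```

With something like this, the bare URL could still go through the existing conversion step, while the extracted options would be passed on to the filesystem constructor. Which keys are meaningful depends on the filesystem backend, so the mapping above is only an assumption.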