uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Apache License 2.0

Unable to extract storage_options from URL #691

Closed manjuransari-zz closed 3 years ago

manjuransari-zz commented 3 years ago

Integration of fsspec in Petastorm was a great step.

fsspec supports extracting storage_options from the input URL. In Petastorm, however, the URL format is restricted, i.e. extracting the storage options from the URL is blocked.

Petastorm internally converts the input URL into the required format (get_dataset_path(parsed_url) in fs_utils.py). It would be a great help if we had a mechanism to extract the storage_options from the input URL, similar to fsspec.
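To illustrate the kind of extraction being requested, here is a minimal sketch using only the Python standard library. It is neither Petastorm's nor fsspec's implementation: the helper name `split_url_and_storage_options`, the option keys, and the example URL are all hypothetical, and treating query parameters as storage options is an assumption made for the example.

```python
from urllib.parse import urlparse, parse_qsl


def split_url_and_storage_options(url):
    """Hypothetical helper (not part of Petastorm or fsspec).

    Splits a dataset URL into (protocol, path, storage_options), where
    storage_options collects the connection details embedded in the URL.
    """
    parsed = urlparse(url)
    options = {}
    # Credentials and endpoint come from the network-location part of the URL.
    if parsed.username:
        options["username"] = parsed.username
    if parsed.password:
        options["password"] = parsed.password
    if parsed.hostname:
        options["host"] = parsed.hostname
    if parsed.port:
        options["port"] = parsed.port
    # Assumption: query parameters are passed through as extra options.
    options.update(parse_qsl(parsed.query))
    return parsed.scheme, parsed.path, options


# Example URL with made-up credentials and host.
protocol, path, storage_options = split_url_and_storage_options(
    "hdfs://alice:secret@namenode:8020/data/train?replication=3"
)
print(protocol, path, storage_options)
```

With something like this, the path part could still be normalized as before, while the extracted options are handed to the filesystem layer instead of being discarded.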

selitvin commented 3 years ago

@manjuransari: could you propose a fix PR for this issue?

manjuransari-zz commented 3 years ago

@selitvin sure, I will try to raise a PR for the fix.

selitvin commented 3 years ago

0.11.2 was released and it includes @manjuransari's fix. Thanks, Manjur!