Describe the bug
I'm trying to save a Dataset using the save_to_disk() function with:
- num_proc > 1
- dataset_path being an S3 bucket path, e.g. "s3://{bucket_name}/{dataset_folder}/"
The hf progress bar shows up but the saving does not seem to start. When using one process only (num_proc=1), everything works fine. When saving the dataset to local disk (as opposed to an S3 bucket) with num_proc > 1, everything works fine.
Thank you for your help! :)
Steps to reproduce the bug
I tried without any storage options:
and with the specific s3fs storage options:
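For illustration, a minimal sketch of what these two calls might look like. The bucket name, folder, credential values, toy dataset, and the helper names save() and reproduce() are all placeholders assumed for this example, not the exact code that was run. The storage options use the key/secret names that s3fs accepts.

```python
# Sketch only: bucket, folder and credentials below are hypothetical placeholders.
BUCKET_NAME = "my-bucket"
DATASET_FOLDER = "my-dataset"
DATASET_PATH = f"s3://{BUCKET_NAME}/{DATASET_FOLDER}/"

# s3fs-style storage options (forwarded to the S3 filesystem by `datasets`).
STORAGE_OPTIONS = {
    "key": "<aws_access_key_id>",
    "secret": "<aws_secret_access_key>",
}


def save(dataset, num_proc, storage_options=None):
    """Call save_to_disk() on the S3 path with the given number of processes."""
    dataset.save_to_disk(
        DATASET_PATH,
        num_proc=num_proc,
        storage_options=storage_options,
    )


def reproduce():
    """Run both variants described above (needs `datasets`, `s3fs` and S3 access)."""
    from datasets import Dataset

    ds = Dataset.from_dict({"x": list(range(1000))})  # toy data, assumed
    save(ds, num_proc=2)                                   # without storage options
    save(ds, num_proc=2, storage_options=STORAGE_OPTIONS)  # with s3fs storage options
```

Calling reproduce() with num_proc=2 would be expected to show the hang described above, while changing it to num_proc=1 would be expected to complete.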
I'm guessing I might be using the storage_options parameter incorrectly, but I didn't find anything online that made it work.
NB: Behavior is the same when trying to save the whole DatasetDict.
Expected behavior
Progress bar fills in and saving is carried out.
Environment info
datasets==2.18.0