cosmo3769 closed this 1 month ago
Is this FSTimeoutError caused by a network issue while downloading from the remote resource (the host the data is being fetched from)?
It seems to happen for all datasets, not just a specific one, and especially for versions 3.0 and later (3.0.0 and 3.0.1 both have this problem).
I had the same error on a different dataset, but after downgrading to datasets==2.21.0, the problem was solved.
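A quick way to check whether an environment falls in the range reported here is to compare the installed version against 3.0. This is only a minimal sketch: the helper name is hypothetical, and treating every release from 3.0 onward as affected is an assumption, since this thread only confirms 3.0.0 and 3.0.1 as failing and 2.21.0 as working.

```python
from importlib import metadata


def is_reported_affected(version: str) -> bool:
    """Return True if the major version is 3 or higher.

    Assumption: all releases from 3.0 onward behave like 3.0.0 and 3.0.1,
    the only versions reported as failing in this thread.
    """
    major = int(version.split(".")[0])
    return major >= 3


if __name__ == "__main__":
    try:
        installed = metadata.version("datasets")
    except metadata.PackageNotFoundError:
        print("datasets is not installed")
    else:
        print(installed, "in reported range:", is_reported_affected(installed))
```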
Same as https://github.com/huggingface/datasets/issues/7164
This dataset is made of a Python script that downloads data from somewhere other than HF, so availability depends on the original host. Ultimately it would be nice to host the files of this dataset on HF.
In datasets <3.0 there were lots of mechanisms that got removed after the decision to make datasets with Python loading scripts legacy for security and maintenance reasons (we only do very basic support now).
@lhoestq Thank you for the clarification! Closing the issue.
I'm getting this too, also at 5 minutes, but for CSTR-Edinburgh/vctk, so it's not just this dataset. It seems to be a timeout that was introduced and needs to be raised. The progress bar was moving along just fine before the timeout, and how much gets downloaded depends on how fast the network is.
You can change the aiohttp timeout from 5min to 1h like this:
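import datasets, aiohttp

# dataset_name is the dataset's repository id, e.g. "CSTR-Edinburgh/vctk"
dataset = datasets.load_dataset(
    dataset_name,
    # total is in seconds: 3600 s = 1 h
    storage_options={'client_kwargs': {'timeout': aiohttp.ClientTimeout(total=3600)}}
)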
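If a fixed 1h value feels arbitrary, the timeout can be estimated from the download size and the network speed. The following is a rough sketch under stated assumptions: both helper names are hypothetical, the safety factor of 2 is an arbitrary choice, and sizes use decimal units (1 GB = 1000 MB). Only aiohttp.ClientTimeout(total=...) and the storage_options layout come from the snippet above.

```python
import math


def estimate_timeout_seconds(size_gb: float, speed_mb_per_s: float,
                             safety_factor: float = 2.0) -> int:
    """Estimate a total timeout (seconds) for a download of size_gb at speed_mb_per_s.

    Hypothetical helper: the safety factor is an arbitrary margin, not a
    value recommended by the datasets or aiohttp projects.
    """
    if size_gb <= 0 or speed_mb_per_s <= 0 or safety_factor <= 0:
        raise ValueError("size, speed and safety factor must be positive")
    size_mb = size_gb * 1000  # decimal units: 1 GB = 1000 MB
    return math.ceil(size_mb / speed_mb_per_s * safety_factor)


def build_storage_options(timeout_seconds: int) -> dict:
    """Build the storage_options dict in the same shape as the snippet above."""
    import aiohttp  # imported lazily so the estimate works without aiohttp installed

    return {"client_kwargs": {"timeout": aiohttp.ClientTimeout(total=timeout_seconds)}}
```

For example, estimate_timeout_seconds(6, 20) gives 600 seconds for a 6 GB download at a sustained 20 MB/s, which is twice the 5-minute value mentioned above. The result of build_storage_options can then be passed as storage_options to load_dataset.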
Describe the bug
When using load_dataset to load HuggingFaceM4/VQAv2, I am getting FSTimeoutError.
Error
It usually fails around 5-6 GB.
Steps to reproduce the bug
To reproduce it, run this in a Colab notebook:
Expected behavior
It should download properly.
Environment info
Using Colab Notebook.