huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Skip if a dataset has issues #6548

Open hadianasliwa opened 10 months ago

hadianasliwa commented 10 months ago

Describe the bug

Hello everyone, I'm using load_dataset() from the Hugging Face datasets library to download a dataset. The download starts, but partway through it fails with the following error: Couldn't reach https://huggingface.co/datasets/wikimedia/wikipedia/resolve/4cb9b0d719291f1a10f96f67d609c5d442980dc9/20231101.ext/train-00000-of-00001.parquet

Failed to resolve 'huggingface.co' ([Errno -3] Temporary failure in name resolution)


Is there a parameter that can be passed to load_dataset() to skip files that can't be downloaded?

Steps to reproduce the bug

Call load_dataset() on the dataset above while the host cannot be resolved. Is there a parameter that makes it skip files that can't be downloaded?

Expected behavior

load_dataset() finishes without error

Environment info

None

lhoestq commented 10 months ago

It looks like a transient DNS issue. It should work fine now if you try again.

There is no parameter in load_dataset to skip failed downloads. In your case it would have skipped every single subsequent download until the DNS issue was resolved anyway.
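
Since retrying is the suggested fix, here is a minimal sketch of wrapping the call in a retry loop with exponential backoff. The helper names retry_with_backoff and load_wikipedia_with_retry are hypothetical and not part of the datasets API. The config name "20231101.ext" is taken from the failing URL above, and catching ConnectionError is an assumption about which exception the failed download raises.

```python
import time


def retry_with_backoff(fn, attempts=5, base_delay=1.0,
                       exceptions=(ConnectionError,), sleep=time.sleep):
    """Call fn() until it succeeds or `attempts` calls have failed.

    Hypothetical helper, not part of `datasets`.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 1, 2, 4, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)


def load_wikipedia_with_retry():
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the config name matches the one in the failing URL.
    return retry_with_backoff(
        lambda: load_dataset("wikimedia/wikipedia", "20231101.ext")
    )
```

This does not skip anything: it only waits out a transient failure such as the DNS error above and re-raises the last error if the problem persists.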