huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Datasetbuilder Local Download FileNotFoundError #7001

Open purefall opened 3 days ago

purefall commented 3 days ago

Describe the bug

I was trying to download a dataset and save it as Parquet, following the Hugging Face tutorial. However, during execution I get a FileNotFoundError.

I debugged the code, and there seems to be a bug: it first creates a .incomplete folder, and then, before that folder's contents are moved, the code deletes the directory. As a result I get:

FileNotFoundError: [Errno 2] No such file or directory: '~/data/Parquet/.incomplete '
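The failure can be sketched with the standard library alone. This is a hypothetical re-creation, not the actual `datasets` source. It assumes, as the path in the error message suggests, that the temporary directory name is built by appending ".incomplete" to the output path string, and that an existing output directory is removed before the temporary one is moved into place. `move_from_incomplete` and `try_prepare` are illustrative names.

```python
import os
import shutil
import tempfile


def move_from_incomplete(dirname: str) -> None:
    """Hypothetical re-creation of the pattern described in the report:
    write into dirname + ".incomplete", remove any existing dirname,
    then move the temporary directory into place."""
    tmp_dir = dirname + ".incomplete"  # plain string concatenation
    os.makedirs(tmp_dir, exist_ok=True)
    # ... the prepared Parquet files would be written into tmp_dir here ...
    if os.path.isdir(dirname):
        # If tmp_dir lives inside dirname, this deletes tmp_dir as well.
        shutil.rmtree(dirname)
    shutil.move(tmp_dir, dirname)


def try_prepare(trailing_slash: bool) -> str:
    """Run the pattern in a fresh temporary directory and report the outcome."""
    with tempfile.TemporaryDirectory() as root:
        dirname = os.path.join(root, "Parquet")
        if trailing_slash:
            dirname += "/"
        os.makedirs(dirname)  # the output directory already exists, as in the report
        try:
            move_from_incomplete(dirname)
        except FileNotFoundError:
            return "FileNotFoundError"
        return "ok"


print("without trailing '/':", try_prepare(False))  # ok
print("with trailing '/':", try_prepare(True))      # FileNotFoundError
```

Without the trailing "/", the temporary directory is a sibling (".../Parquet.incomplete"). With it, the temporary directory becomes a child (".../Parquet/.incomplete"), so removing the output directory also removes the freshly written files and the subsequent move has nothing to move.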

Steps to reproduce the bug

from datasets import load_dataset_builder
from pathlib import Path

parquet_dir = "~/data/Parquet/"  # the trailing "/" triggers the FileNotFoundError
Path(parquet_dir).mkdir(parents=True, exist_ok=True)  # note: pathlib does not expand "~" here
builder = load_dataset_builder(
    "rotten_tomatoes",
)
builder.download_and_prepare(parquet_dir, file_format="parquet")

Expected behavior

The files are downloaded and saved as Parquet.

Environment info

Ubuntu, Python 3.10

datasets 2.19.1
purefall commented 3 days ago

OK, it seems the workaround is to pass the directory string without the trailing "/", which in my case is:

parquet_dir = "~/data/Parquet"

Still, I think this is weird behavior...
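Until the behavior changes, one way to apply the workaround without depending on how the path is typed is to normalize it first. The helper below is a hypothetical sketch using only the standard library; it also expands "~", on the assumption (suggested by the literal "~" in the error message) that neither `pathlib.Path.mkdir` nor the builder expands it.

```python
import os


def normalize_output_dir(path: str) -> str:
    """Hypothetical helper: expand "~" and drop trailing separators so that
    appending ".incomplete" yields a sibling directory, not a child."""
    expanded = os.path.expanduser(path)
    # normpath collapses redundant separators and strips a trailing "/",
    # while leaving the root path "/" intact.
    return os.path.normpath(expanded)


print(normalize_output_dir("~/data/Parquet/"))
```

The result can be passed to the reproduction above, e.g. `builder.download_and_prepare(normalize_output_dir(parquet_dir), file_format="parquet")`. `os.path.normpath` is used instead of a bare `rstrip("/")` because the latter would turn "/" into an empty string.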