huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Loading just one particular split is not possible for imagenet-1k #6793

Open PaulPSta opened 7 months ago

PaulPSta commented 7 months ago

Describe the bug

I'd expect the following code to download only the validation split, but instead all of the data (train, test and validation splits) ends up on my disk.

```python
from datasets import load_dataset

dataset = load_dataset("imagenet-1k", split="validation", trust_remote_code=True)
```

Is this the expected behavior?

Steps to reproduce the bug

  1. Install the required libraries (python, datasets, huggingface_hub)
  2. Log in using the Hugging Face CLI (`huggingface-cli login`)
  3. Run the code in the description
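
To confirm how much data actually landed on disk after running the steps above, the cache directory can be inspected. The following is a minimal sketch using only the standard library. The helper name `dir_sizes` is hypothetical, and the default cache path `~/.cache/huggingface/datasets` is an assumption that may differ on your machine (the `HF_DATASETS_CACHE` environment variable overrides it).

```python
import os
from pathlib import Path


def dir_sizes(cache_dir):
    """Return {entry name: total bytes} for each entry directly under cache_dir."""
    sizes = {}
    for entry in Path(cache_dir).iterdir():
        if entry.is_dir():
            # Sum every regular file below this subdirectory.
            total = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
        else:
            total = entry.stat().st_size
        sizes[entry.name] = total
    return sizes


if __name__ == "__main__":
    # Assumed default location; adjust if your cache lives elsewhere.
    default = Path.home() / ".cache" / "huggingface" / "datasets"
    cache = Path(os.environ.get("HF_DATASETS_CACHE", default))
    if cache.exists():
        # Print the largest entries first.
        for name, size in sorted(dir_sizes(cache).items(), key=lambda kv: -kv[1]):
            print(f"{size / 1e9:8.2f} GB  {name}")
```

Comparing the output before and after the `load_dataset` call gives a rough idea of whether more than one split was fetched.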

Expected behavior

Only the requested split (validation) should be downloaded.
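
For comparison, a possible way to read a single split without fetching everything up front is streaming mode. This is only a sketch, not a confirmed fix for this issue: it assumes that `streaming=True` defers downloads for this dataset, and the wrapper names `stream_split` and `peek` are hypothetical.

```python
def stream_split(name="imagenet-1k", split="validation"):
    """Open one split lazily instead of materializing the dataset on disk."""
    # Imported lazily so the sketch can be read without `datasets` installed.
    from datasets import load_dataset

    # streaming=True returns an iterable dataset that fetches data on demand.
    return load_dataset(name, split=split, streaming=True, trust_remote_code=True)


def peek(n=2):
    """Print the field names of the first n examples of the validation split."""
    for example in stream_split().take(n):
        print(sorted(example.keys()))


# peek()  # requires network access and an authenticated Hugging Face account
```

Streaming trades random access for lower disk usage, so it may not suit every workflow.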

Environment info

- python: 3.12.2
- datasets: 2.18.0
- huggingface_hub: 0.22.2

fxmarty-amd commented 2 months ago

+1