huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Use load_dataset to load imagenet-1K but find an empty dataset #7139

Open fscdc opened 1 month ago

fscdc commented 1 month ago

Describe the bug

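import os

from datasets import load_dataset
from torchvision.transforms import (
    CenterCrop,
    Compose,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)


def get_dataset(data_path, train_folder="train", val_folder="val"):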
    traindir = os.path.join(data_path, train_folder)
    valdir = os.path.join(data_path, val_folder)

    def transform_val_examples(examples):
        transform = Compose([
            Resize(256),
            CenterCrop(224),
            ToTensor(),
        ])
        examples["image"] = [transform(image.convert("RGB")) for image in examples["image"]]
        return examples

    def transform_train_examples(examples):
        transform = Compose([
            RandomResizedCrop(224),
            RandomHorizontalFlip(),
            ToTensor(),
        ])
        examples["image"] = [transform(image.convert("RGB")) for image in examples["image"]]
        return examples

    # @fengsicheng: This approach is very slow for a large dataset like ImageNet-1K (but it avoids network problems by using a local dataset)
    # train_set = load_dataset("imagefolder", data_dir=traindir, num_proc=4)
    # test_set = load_dataset("imagefolder", data_dir=valdir, num_proc=4)

    train_set = load_dataset("imagenet-1K", split="train", trust_remote_code=True)
    test_set = load_dataset("imagenet-1K", split="test", trust_remote_code=True)

    print(train_set["label"])

    train_set.set_transform(transform_train_examples)
    test_set.set_transform(transform_val_examples)

    return train_set, test_set

The code is above, but the output of the print is a list of None values.

Steps to reproduce the bug

  1. Run the code above.
  2. Check the printed output.

Expected behavior

I do not know how to fix this. Can anyone help? This is urgent for me.

Environment info

the-silent-geek commented 2 weeks ago

Imagenet-1k is a gated dataset, which means you'll have to agree to share your contact info to access it. Have you tried this yet? Once you have, you can sign in with your user token (you can find this in your Hugging Face account settings) when prompted by running:

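# In a shell:
huggingface-cli login

# Then in Python (recent versions of datasets use token=True in place of use_auth_token=True):
from datasets import load_dataset
train_set = load_dataset('imagenet-1k', split='train', use_auth_token=True)
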
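
If it helps, here is a minimal sketch of the same idea with a sanity check added, so an empty result fails loudly instead of printing a list of None. The helper names (`resolve_token`, `count_missing_labels`, `load_gated_split`) are made up for illustration and are not part of the datasets library. It assumes a recent datasets release where the argument is called `token`, and that you have either exported `HF_TOKEN` or already logged in with the CLI.

```python
import os


def resolve_token(env_var="HF_TOKEN"):
    """Return the token from the environment, or True to use the cached CLI login."""
    value = os.environ.get(env_var, "").strip()
    return value if value else True


def count_missing_labels(labels):
    """Count entries that are None, which suggests the data did not load properly."""
    return sum(1 for label in labels if label is None)


def load_gated_split(name="imagenet-1k", split="train"):
    """Hypothetical wrapper: load a gated split and fail if any label is None."""
    # Imported lazily so the two helpers above work without datasets installed.
    from datasets import load_dataset

    # Assumption: `token` is accepted here; older releases used `use_auth_token`.
    dataset = load_dataset(name, split=split, token=resolve_token())

    missing = count_missing_labels(dataset["label"])
    if missing:
        raise RuntimeError(
            f"{missing} labels are None in split '{split}'. "
            "Check that you accepted the dataset terms and are logged in."
        )
    return dataset


if __name__ == "__main__":
    # Requires network access and approved access to the gated dataset.
    train_set = load_gated_split(split="train")
    print(train_set)
```

The check reads the whole label column, just like the `print(train_set["label"])` in the original post, so it is only meant as a one-off diagnostic.
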
fscdc commented 1 week ago

Thanks a lot! That helped me.