mlfoundations / datacomp

DataComp: In search of the next generation of multimodal datasets
http://datacomp.ai/

https://huggingface.co/datasets/djghosh/wds_flickr_1k_test_image_text_retrieval_test doesn't exist? #49

Closed mingdachen closed 1 year ago

mingdachen commented 1 year ago

I'm trying to download flickr_1k_test_image_text_retrieval, but I get errors when downloading from https://huggingface.co/datasets/djghosh/wds_flickr_1k_test_image_text_retrieval_test

========== Download 'Flickr' ===========

Repo card metadata block was not found. Setting CardData to empty.
Downloading data files: 100%|██████████| 2/2 [00:00<00:00, 3305.20it/s]
Extracting data files: 100%|██████████| 2/2 [00:00<00:00, 164.31it/s]
Generating test split: 1000 examples [00:00, 4210.39 examples/s]

========== Download 'Flickr' ===========

--2023-08-16 21:27:28--  https://huggingface.co/datasets/djghosh/wds_flickr_1k_test_image_text_retrieval_test/raw/main/classnames.txt
Resolving huggingface.co (huggingface.co)... 99.84.108.70, 99.84.108.129, 99.84.108.87, ...
Connecting to huggingface.co (huggingface.co)|99.84.108.70|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized

Username/Password Authentication Failed.
--2023-08-16 21:27:28--  https://huggingface.co/datasets/djghosh/wds_flickr_1k_test_image_text_retrieval_test/raw/main/zeroshot_classification_templates.txt
Resolving huggingface.co (huggingface.co)... 99.84.108.70, 99.84.108.129, 99.84.108.87, ...
Connecting to huggingface.co (huggingface.co)|99.84.108.70|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized

Username/Password Authentication Failed.
--2023-08-16 21:27:28--  https://huggingface.co/datasets/djghosh/wds_flickr_1k_test_image_text_retrieval_test/raw/main/test/nshards.txt
Resolving huggingface.co (huggingface.co)... 99.84.108.70, 99.84.108.129, 99.84.108.87, ...
Connecting to huggingface.co (huggingface.co)|99.84.108.70|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized

Username/Password Authentication Failed.
Traceback (most recent call last):
  File "download_datasets.py", line 139, in <module>
    sys.exit(main(args))
  File "download_datasets.py", line 15, in main
    download_datasets(args.data_dir)
  File "download_datasets.py", line 79, in download_datasets
    nshards = int(f.read())
ValueError: invalid literal for int() with base 10: ''
djghosh13 commented 1 year ago

Hi, thanks for reporting this! It actually seems like the script erroneously tried to download the Flickr test set twice - the first attempt was done correctly and succeeded. Can you double check that your download_evalsets.py and tasklist.yml are up-to-date? Have you made any changes to either file?
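
For anyone landing here from the same traceback: the `ValueError` is a downstream symptom. The second attempt got `401 Unauthorized` for `nshards.txt`, so the file was left empty and `int('')` failed. Below is a minimal sketch of a more explicit check. The helper name `read_nshards` is hypothetical and not part of the DataComp scripts; it only illustrates reporting the empty file directly.

```python
import tempfile
from pathlib import Path


def read_nshards(path):
    """Parse a shard count from a text file (hypothetical helper, not from the repo)."""
    text = Path(path).read_text().strip()
    if not text:
        # An empty file most likely means the download itself failed (e.g. HTTP 401),
        # so say that instead of letting int('') raise a confusing ValueError.
        raise RuntimeError(f"{path} is empty; the download probably failed")
    return int(text)


# Usage: one well-formed file and one empty file, as a failed download would leave.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "nshards.txt"
    good.write_text("1\n")
    print(read_nshards(good))

    empty = Path(tmp) / "empty_nshards.txt"
    empty.write_text("")
    try:
        read_nshards(empty)
    except RuntimeError as err:
        print(f"caught: {err}")
```

A check like this would not fix the underlying duplicate download, but it would point at the failed request rather than at the integer parsing.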

mingdachen commented 1 year ago

Thanks! It was a mistake on my end.

djghosh13 commented 1 year ago

Great, glad everything worked out for you!