shubhamprshr27 / NeglectedTailsVLM

This repository houses the code for the paper "The Neglected Tails of VLMs".

Only 60 GB of text data? #3

Status: Open. Harzva opened this issue 1 month ago.

Harzva commented 1 month ago

Regarding this command:

    python laion_parser.py --database LAION400M --downstream imagenet_1k --datasource {location_of_hf_download} --max_threads 16

Do I need to download the complete LAION400M dataset, or only the 60 GB of text data, to run it? Can I download just the 60 GB of text data? Thank you very much.

shubhamprshr27 commented 1 month ago

Hi @Harzva,

  1. You will need to download the LAION metadata, which is around 64 GB if I am not wrong. You will have to download it from the official source and run our code on top of it.
  2. I plan to release the captions on Hugging Face soon; you can refer to those once they are available.
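Per the reply above, the parser works on the LAION metadata (the caption text), not on the images, which is why the ~64 GB metadata download is enough. As a rough illustration of the kind of caption-level processing this involves, here is a minimal, hypothetical sketch that counts how many captions mention each downstream class name. It is not the repository's laion_parser.py: the function names are made up, it uses plain case-insensitive whole-word matching, and it assumes the captions have already been read out of the metadata files (for LAION-400M the caption column is assumed to be TEXT).

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Optional


def build_patterns(concepts: List[str]) -> Dict[str, "re.Pattern"]:
    # Hypothetical helper: one case-insensitive, whole-word regex per concept name.
    return {
        c: re.compile(r"\b" + re.escape(c) + r"\b", re.IGNORECASE)
        for c in concepts
    }


def count_concept_mentions(
    captions: Iterable[Optional[str]], concepts: List[str]
) -> Counter:
    """Count how many captions mention each concept at least once.

    Hypothetical sketch: `captions` is assumed to be the caption column
    already loaded from the LAION metadata; no images are needed.
    """
    patterns = build_patterns(concepts)
    # Start every concept at zero so unseen concepts still appear in the result.
    counts: Counter = Counter({c: 0 for c in concepts})
    for caption in captions:
        if not caption:  # metadata rows may have empty or missing captions
            continue
        for concept, pattern in patterns.items():
            if pattern.search(caption):
                counts[concept] += 1
    return counts


if __name__ == "__main__":
    sample_captions = [
        "A golden retriever playing in the park",
        "tabby cat sleeping on a sofa",
        "Golden Retriever puppy",
        None,
    ]
    print(count_concept_mentions(sample_captions, ["golden retriever", "tabby cat", "goldfish"]))
```

Whole-word matching keeps "goldfish" from matching "golden"; a real pipeline would likely also need synonyms and some filtering of false matches, which this sketch does not attempt.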