microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Unable to download half of the images from URL given in TextDiffuser #1649

Open wangz315 opened 1 week ago

wangz315 commented 1 week ago

I followed the instructions given in the README. However, when I run the command

    img2dataset --url_list=url.txt --output_folder=laion_ocr --thread_count=64 --resize_mode=no

and wait for a while, I get:

    worker - success: 0.548 - failed to download: 0.440 - failed to resize: 0.012 - images per sec: 17 - count: 10000
    total - success: 0.553 - failed to download: 0.435 - failed to resize: 0.012 - images per sec: 16 - count: 4460000

It says about 44% of the images failed to download. Is there a way to get those images? Thanks for the help :)
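For a sense of scale, the ratios in the "total" line can be converted into approximate image counts. This is a minimal sketch, assuming each ratio is a fraction of the reported count; the helper name `parse_stats` is hypothetical and not part of img2dataset.

```python
import re

# The "total" line copied from the log above.
log_line = (
    "total - success: 0.553 - failed to download: 0.435 "
    "- failed to resize: 0.012 - images per sec: 16 - count: 4460000"
)


def parse_stats(line):
    """Return {field name: number} for every 'name: value' pair in the line.

    Hypothetical helper for illustration; it only understands the
    'name: value' layout seen in the log above.
    """
    pairs = re.findall(r"([a-z ]+): ([0-9.]+)", line)
    return {name.strip(): float(value) for name, value in pairs}


stats = parse_stats(log_line)
count = int(stats["count"])

# Assumption: each ratio is a fraction of `count`, so ratio * count
# approximates the number of images in that category.
approx_counts = {
    key: round(stats[key] * count)
    for key in ("success", "failed to download", "failed to resize")
}

for key, value in approx_counts.items():
    print(f"{key}: about {value:,} images")
```

Under that assumption, roughly 1.94 million of the 4.46 million URLs processed so far failed to download.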

JingyeChen commented 1 week ago

Thanks for your interest in TextDiffuser. Please refer to this dataset download link:

https://huggingface.co/datasets/JingyeChen22/TextDiffuser-MARIO-10M