rom1504 / img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
MIT License
3.71k stars 338 forks

70% of files are not downloaded due to name resolution error #360

Closed TheSeriousProgrammer closed 12 months ago

TheSeriousProgrammer commented 1 year ago

Hi, first of all, thanks for providing such an amazing tool to the community.

I am trying to download cc12m with the following command:

img2dataset --url_list cc12m_shuffled.tsv --input_format "tsv" \
    --url_col "url" --caption_col "caption" --output_format webdataset --resize_mode keep_ratio \
    --output_folder cc12m --processes_count 11 --thread_count 512 --image_size 512 \
    --enable_wandb True

In the results, i.e. the stats JSON file, I see that a whopping 7000 files (out of 10000) failed with a "Temporary failure in name resolution" error. How can I fix this? I am running the script from my laptop. This is most likely a DNS-related issue: I have not configured a knot or bind9 resolver. Could that be the source of the problem?

rom1504 commented 1 year ago

Yes, this is due to your DNS resolver being too slow. Try out knot resolver.
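To check whether the resolver really is the bottleneck before installing anything, one option is to fire a batch of concurrent lookups and tally the outcomes. This is only a minimal sketch using the Python standard library, not part of img2dataset; the function name `check_dns`, the sample hostnames, and the thread count are assumptions made for illustration.

```python
import socket
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def check_dns(hosts, threads=64, resolve=socket.getaddrinfo):
    """Resolve each host concurrently and tally the outcomes.

    `resolve` defaults to the system resolver; it is a parameter only so
    that a stub can be swapped in for testing (an assumption of this sketch).
    """
    def lookup(host):
        try:
            resolve(host, 443)
            return "ok"
        except socket.gaierror as err:
            # e.g. "[Errno -3] Temporary failure in name resolution"
            return str(err)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return Counter(pool.map(lookup, hosts))


if __name__ == "__main__":
    # Hypothetical sample; in practice, use hostnames taken from the url column.
    sample = ["github.com", "python.org", "wikipedia.org"] * 50
    for outcome, count in check_dns(sample).most_common():
        print(count, outcome)
```

If a large share of lookups fails once `threads` is raised toward the `--thread_count` used for the download, that points to the resolver. Repeated hostnames may be answered from a local cache, so a list of distinct hosts gives a more realistic picture.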

Jaep0805 commented 12 months ago

Updating Python to 3.10 solved the problem for me.