rom1504 / img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
MIT License

download LAION dataset failed #145

Closed LeeRock closed 2 years ago

LeeRock commented 2 years ago

I'm downloading the LAION dataset, following these docs: https://github.com/rom1504/img2dataset/blob/main/dataset_examples/laion400m.md. Here is my error log:

worker - success: 0.049 - failed to download: 0.938 - failed to resize: 0.013 - images per sec: 81 - count: 10000
total - success: 0.048 - failed to download: 0.943 - failed to resize: 0.009 - images per sec: 1795 - count: 450000
worker - success: 0.047 - failed to download: 0.937 - failed to resize: 0.015 - images per sec: 81 - count: 10000
total - success: 0.048 - failed to download: 0.943 - failed to resize: 0.009 - images per sec: 1835 - count: 460000
worker - success: 0.044 - failed to download: 0.943 - failed to resize: 0.013 - images per sec: 79 - count: 10000
total - success: 0.048 - failed to download: 0.943 - failed to resize: 0.009 - images per sec: 1875 - count: 470000
worker - success: 0.050 - failed to download: 0.933 - failed to resize: 0.017 - images per sec: 82 - count: 10000
total - success: 0.048 - failed to download: 0.943 - failed to resize: 0.009 - images per sec: 1915 - count: 480000
worker - success: 0.049 - failed to download: 0.936 - failed to resize: 0.015 - images per sec: 77 - count: 10000
total - success: 0.048 - failed to download: 0.943 - failed to resize: 0.009 - images per sec: 1943 - count: 490000
worker - success: 0.045 - failed to download: 0.940 - failed to resize: 0.015 - images per sec: 81 - count: 10000
total - success: 0.048 - failed to download: 0.943 - failed to resize: 0.009 - images per sec: 1983 - count: 500000
worker - success: 0.045 - failed to download: 0.941 - failed to resize: 0.014 - images per sec: 81 - count: 10000
total - success: 0.048 - failed to download: 0.942 - failed to resize: 0.009 - images per sec: 2022 - count: 510000
worker - success: 0.047 - failed to download: 0.938 - failed to resize: 0.015 - images per sec: 79 - count: 10000
total - success: 0.048 - failed to download: 0.942 - failed to resize: 0.009 - images per sec: 2062 - count: 520000
worker - success: 0.042 - failed to download: 0.943 - failed to resize: 0.015 - images per sec: 80 - count: 10000
total - success: 0.048 - failed to download: 0.942 - failed to resize: 0.010 - images per sec: 2102 - count: 530000
worker - success: 0.044 - failed to download: 0.941 - failed to resize: 0.015 - images per sec: 80 - count: 10000
total - success: 0.048 - failed to download: 0.942 - failed to resize: 0.010 - images per sec: 2141 - count: 540000
worker - success: 0.047 - failed to download: 0.934 - failed to resize: 0.019 - images per sec: 84 - count: 10000
total - success: 0.048 - failed to download: 0.942 - failed to resize: 0.010 - images per sec: 2181 - count: 550000
worker - success: 0.042 - failed to download: 0.943 - failed to resize: 0.016 - images per sec: 81 - count: 10000
total - success: 0.048 - failed to download: 0.942 - failed to resize: 0.010 - images per sec: 2221 - count: 560000
worker - success: 0.047 - failed to download: 0.935 - failed to resize: 0.017 - images per sec: 78 - count: 10000
total - success: 0.048 - failed to download: 0.942 - failed to resize: 0.010 - images per sec: 2260 - count: 570000
worker - success: 0.044 - failed to download: 0.940 - failed to resize: 0.016 - images per sec: 81 - count: 10000
total - success: 0.048 - failed to download: 0.942 - failed to resize: 0.010 - images per sec: 2300 - count: 580000
worker - success: 0.049 - failed to download: 0.937 - failed to resize: 0.015 - images per sec: 82 - count: 10000
total - success: 0.048 - failed to download: 0.942 - failed to resize: 0.010 - images per sec: 2340 - count: 590000
worker - success: 0.049 - failed to download: 0.934 - failed to resize: 0.017 - images per sec: 82 - count: 10000
total - success: 0.048 - failed to download: 0.942 - failed to resize: 0.010 - images per sec: 2379 - count: 600000
worker - success: 0.051 - failed to download: 0.933 - failed to resize: 0.016 - images per sec: 83 - count: 10000
total - success: 0.048 - failed to download: 0.942 - failed to resize: 0.010 - images per sec: 2419 - count: 610000
worker - success: 0.048 - failed to download: 0.935 - failed to resize: 0.017 - images per sec: 82 - count: 10000
total - success: 0.048 - failed to download: 0.942 - failed to resize: 0.011 - images per sec: 2459 - count: 620000
worker - success: 0.046 - failed to download: 0.936 - failed to resize: 0.019 - images per sec: 81 - count: 10000
total - success: 0.048 - failed to download: 0.942 - failed to resize: 0.011 - images per sec: 2498 - count: 630000
64it [05:15, 1.24it/s]
worker - success: 0.047 - failed to download: 0.939 - failed to resize: 0.014 - images per sec: 76 - count: 10000
total - success: 0.048 - failed to download: 0.941 - failed to resize: 0.011 - images per sec: 2506 - count: 640000

rom1504 commented 2 years ago

Did you set up knot resolver? What are the error reasons in wandb or in the stats JSON files in the output folder?
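For reference, a minimal sketch of how the error reasons could be tallied from the stats JSON files. It assumes that each shard writes a file matching `*_stats.json` in the output folder and that each file holds a `status_dict` mapping a status or error message to a count; both the file pattern and the key name are assumptions to check against your own output, and `summarize_failures` is a hypothetical helper, not part of img2dataset.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_failures(output_folder):
    """Sum the per-shard status counts found in *_stats.json files.

    Assumption: every stats file contains a "status_dict" object whose keys
    are status or error messages and whose values are integer counts.
    """
    totals = Counter()
    for stats_path in sorted(Path(output_folder).glob("*_stats.json")):
        with open(stats_path, encoding="utf-8") as f:
            stats = json.load(f)
        # Files without the assumed key contribute nothing to the totals.
        totals.update(stats.get("status_dict", {}))
    return totals
```

Calling `summarize_failures("laion400m-data").most_common(10)` (the folder name is only an example) lists the ten most frequent reasons. If name resolution errors dominate, the DNS resolver is the likely bottleneck; if HTTP errors dominate, the URLs themselves are probably dead.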

LeeRock commented 2 years ago

knot resolver

Sorry, I just installed img2dataset with pip, following README.md. Are there any docs for knot resolver?

LeeRock commented 2 years ago

Did you set up knot resolver? What are the error reasons in wandb or in the stats JSON files in the output folder?

Thanks. I found the information in the section "Setting up a high performance dns resolver". Bind9 installed successfully, and the download success rate went up from 0.04 to 0.20. Could you please give me more advice or guidance?
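One way to check whether DNS is still the limiting factor after installing a resolver is to measure how many of the hostnames in a sample of URLs fail to resolve. The sketch below is only an illustration under stated assumptions: `resolves` and `dns_failure_rate` are hypothetical helper names, the thread count of 64 is arbitrary, and the lookup goes through whatever resolver the system is configured to use.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def resolves(hostname, lookup=socket.getaddrinfo):
    """Return True if the given lookup function can resolve hostname."""
    try:
        lookup(hostname, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def dns_failure_rate(urls, lookup=socket.getaddrinfo, threads=64):
    """Return the fraction of distinct hostnames in urls that fail to resolve."""
    # Entries without a parsable hostname are skipped rather than counted.
    hosts = sorted({urlparse(u).hostname for u in urls} - {None})
    if not hosts:
        return 0.0
    with ThreadPoolExecutor(max_workers=threads) as pool:
        ok = list(pool.map(lambda h: resolves(h, lookup), hosts))
    return 1 - sum(ok) / len(hosts)
```

Running `dns_failure_rate` on a few thousand URLs taken from the input files gives a rough number to compare before and after changing the resolver. A high value suggests the resolver is still dropping queries under load, while a low value points to other causes such as dead links or a proxy in between.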

huangzhendong commented 2 years ago

Hi guys, the same problem has puzzled me for a whole day. Could you post a complete bind9 configuration as an example? Also, there is a proxy between my server and the internet.

rom1504 commented 2 years ago

I advise setting up knot resolver rather than bind: https://github.com/rom1504/img2dataset#setting-up-a-knot-resolver

Bind also works but needs more configuration