rom1504 / img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
MIT License
3.71k stars 338 forks

wip good bad pool #262

Closed rom1504 closed 1 year ago

rom1504 commented 1 year ago

The idea is to start more threads, so we don't wait for stragglers.

It seems to help initially, but then it doesn't.
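
For readers unfamiliar with the pattern, here is a minimal sketch of one way to read "don't wait for stragglers". It is not the code from this PR. `fetch_batch`, `fake_fetch`, `max_workers` and `batch_timeout` are hypothetical names chosen for illustration, and the sleep-based `fake_fetch` stands in for a real download call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fake_fetch(delay):
    """Hypothetical stand-in for a download: sleeps, then returns its input."""
    time.sleep(delay)
    return delay


def fetch_batch(items, fetch, max_workers=32, batch_timeout=1.0):
    """Run fetch(item) on an oversized thread pool and stop waiting at a deadline.

    Returns (results, stragglers): results maps each finished item to its
    return value (or the exception it raised); stragglers lists the items
    that had not finished when batch_timeout expired.
    """
    executor = ThreadPoolExecutor(max_workers=max_workers)
    futures = {executor.submit(fetch, item): item for item in items}

    # Block for at most batch_timeout seconds, then move on.
    done, not_done = wait(futures, timeout=batch_timeout)

    # Do not join slow threads: drop queued work and return right away.
    # cancel_futures requires Python 3.9+. Calls that are already running
    # cannot be interrupted; they finish in the background.
    executor.shutdown(wait=False, cancel_futures=True)

    results = {}
    for future in done:
        item = futures[future]
        try:
            results[item] = future.result()
        except Exception as err:  # keep the failure instead of raising
            results[item] = err

    stragglers = [futures[future] for future in not_done]
    return results, stragglers
```

Calling `fetch_batch([0.01, 0.02, 0.6], fake_fetch, max_workers=4, batch_timeout=0.2)` returns the two fast items as results and reports `0.6` as a straggler. The slow thread still holds its connection until it finishes, which may be one reason extra threads stop helping after a while.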

rom1504 commented 1 year ago

https://github.com/rvencu/crawlingathome-gpu-hcloud/blob/main/cloud%20boot/cloud-init.yaml#L54 helps a little. See also https://stackoverflow.com/questions/410616/increasing-the-maximum-number-of-tcp-ip-connections-in-linux

rom1504 commented 1 year ago

Running sudo sysctl net.ipv4.tcp_fin_timeout=2 might help a bit.

rom1504 commented 1 year ago

Nice, but it did not help. See #339 for next steps.