rom1504 opened 3 years ago (status: Open)
We would need to handle DNS load balancing properly.
I tried https://stackoverflow.com/a/15065711/1658314 without much success.
This approach seems to work very well:
dnsperf -f inet -s 10.80.97.250 -d /tmp/list.txt -l 3600 -S 10 -Q 1000 -q 100 2>&1 | grep -v Timeout | grep -v "maybe timed out"
https://github.com/DNS-OARC/dnsperf/blob/master/README.md
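The `-d /tmp/list.txt` file in the command above is a dnsperf query file, with one `name type` pair per line (for example `example.com A`). As a rough sketch, it could be generated from a list of image URLs like this; `write_dnsperf_queries` is a hypothetical helper name, not something that exists in img2dataset, and the choice of `A` records is an assumption:

```python
from urllib.parse import urlsplit


def write_dnsperf_queries(urls, path, record_type="A"):
    """Write one 'hostname TYPE' line per distinct hostname found in urls.

    Hypothetical helper: the name and the default record type are assumptions.
    Returns the hostnames in the order they were written.
    """
    hosts = {}
    for url in urls:
        try:
            host = urlsplit(url).hostname
        except ValueError:
            # Malformed URL (e.g. a broken IPv6 literal): skip it.
            continue
        if host:
            hosts.setdefault(host, None)  # dict keeps first-seen order
    with open(path, "w", encoding="utf-8") as f:
        for host in hosts:
            f.write(f"{host} {record_type}\n")
    return list(hosts)
```

Deduplicating before writing keeps the query file at one line per domain rather than one per URL.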
I think that is pretty promising, and it may be interesting to try putting it directly in img2dataset (at least the dnsperf part).
It was recently noticed that LAION-400M only contains URLs from 5M domains. The same is probably true for other datasets.
Pre-resolving the domains would greatly reduce the load on the DNS resolver and increase download speed.
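A minimal sketch of the pre-resolving idea, using only the Python standard library rather than dnsperf: collect the distinct hostnames once, resolve each one a single time in a thread pool, and keep a hostname-to-IP map. The function names, the `resolver` parameter, and the worker count are all assumptions for illustration, not img2dataset's API.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def unique_hostnames(urls):
    """Return the distinct hostnames in urls, in first-seen order."""
    hosts = {}
    for url in urls:
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL
        if host:
            hosts.setdefault(host, None)
    return list(hosts)


def resolve_one(host):
    """Resolve a hostname to one IPv4 address, or None on failure."""
    try:
        return socket.gethostbyname(host)
    except (OSError, UnicodeError):
        return None


def pre_resolve(urls, resolver=resolve_one, max_workers=64):
    """Resolve every distinct hostname once; return {hostname: ip}.

    Hostnames that fail to resolve are left out of the result.
    """
    hosts = unique_hostnames(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        ips = list(pool.map(resolver, hosts))
    return {host: ip for host, ip in zip(hosts, ips) if ip is not None}
```

With roughly 5M domains behind 400M URLs, this issues on the order of one lookup per domain instead of one per URL. Note that a single cached IP per host sidesteps DNS load balancing, which is the open question raised at the top of this issue.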