Check whether all IPs returned for data.commoncrawl.org are used

I'm not sure whether Python does a fresh DNS lookup for every `urlopen` call. I noticed that data.commoncrawl.org resolves to multiple IPs, so we could spread the load over multiple CloudFront servers.

;; ANSWER SECTION:
data.commoncrawl.org.   279 IN  CNAME   ds5q9oxwqwsfj.cloudfront.net.
ds5q9oxwqwsfj.cloudfront.net. 39 IN A   54.230.10.119
ds5q9oxwqwsfj.cloudfront.net. 39 IN A   54.230.10.41
ds5q9oxwqwsfj.cloudfront.net. 39 IN A   54.230.10.84
ds5q9oxwqwsfj.cloudfront.net. 39 IN A   54.230.10.28
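
To see what Python itself gets back (as opposed to dig), something like the sketch below might help. It only uses `socket.getaddrinfo` from the standard library; the helper name `resolve_ipv4` and the injectable `getaddrinfo` parameter are my own additions for illustration and testing, not anything from ia-download.

```python
import socket


def resolve_ipv4(host, port=443, getaddrinfo=socket.getaddrinfo):
    """Return the distinct IPv4 addresses the resolver reports for host.

    `getaddrinfo` is injectable (hypothetical parameter) so the function
    can be exercised without network access.
    """
    infos = getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (address, port) tuple.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_ipv4("data.commoncrawl.org")` a few times in a row should show whether the set or order of addresses changes between lookups. What comes back depends on the local resolver and any OS-level cache, so results may differ from the dig output above.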

If yes, maybe we can make Python a bit more efficient by caching the DNS results.

If no, maybe we can hook into the resolver so that each call returns a randomly selected one of the resolved IPs.
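
A rough sketch of what such a hook could look like, assuming we are fine with monkeypatching `socket.getaddrinfo` process-wide. As far as I understand, `urlopen` ends up in `socket.create_connection`, which tries the addresses in the order `getaddrinfo` returns them, so shuffling that list should randomize which IP is tried first. The names `make_shuffling_getaddrinfo` and `install_shuffling_resolver` are made up for this example.

```python
import random
import socket


def make_shuffling_getaddrinfo(inner, rng=random):
    """Wrap a getaddrinfo-like callable so its results come back shuffled."""
    def shuffled_getaddrinfo(*args, **kwargs):
        results = list(inner(*args, **kwargs))
        # Same set of addresses, random order; the first one is what
        # socket.create_connection attempts first.
        rng.shuffle(results)
        return results
    return shuffled_getaddrinfo


def install_shuffling_resolver():
    """Patch socket.getaddrinfo process-wide; return the original for undoing."""
    original = socket.getaddrinfo
    socket.getaddrinfo = make_shuffling_getaddrinfo(original)
    return original
```

This keeps all returned addresses as fallbacks instead of dropping everything but one, so a dead edge server would still be skipped. It affects every lookup in the process, not just those for data.commoncrawl.org, which may or may not be acceptable here.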
