rpardini / docker-registry-proxy

An HTTPS Proxy for Docker providing centralized configuration and caching of any registry (quay.io, DockerHub, k8s.gcr.io)
Apache License 2.0

Handle DockerHub's imminent rate limiting/throttling implementation #54

Open rpardini opened 4 years ago

rpardini commented 4 years ago

See https://www.docker.com/blog/scaling-docker-to-serve-millions-more-developers-network-egress/ This has the potential to be disastrous for, e.g., large k8s clusters in which almost every node will try to pull the same image. Docker says it will rate-limit pulls.

We already have everything in place to actually cache manifest requests. See https://github.com/rpardini/docker-registry-proxy/blob/master/nginx.conf#L288 -- they're cached for 1 second only, and turning this up would IMHO avoid the rate limits completely. Unfortunately, it would also cache mutable tags (e.g. :latest), which could drive people crazy.

So maybe the manifest cache time should be configurable; a rough sketch of what that touches is below.
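Purely for illustration, a minimal sketch of the directives involved; the location block, cache zone name, and upstream variable here are placeholders, not the project's literal nginx.conf:

# Sketch only -- $targetHost and the "cache" zone are assumed names, not the real config.
location ~ ^/v2/(.*)/manifests/ {
    proxy_pass https://$targetHost;   # placeholder upstream
    proxy_cache cache;                # placeholder cache zone
    proxy_cache_valid 200 1s;         # current behavior: manifests cached for 1 second only
    # proxy_cache_valid 200 1h;       # bumping this would dodge the rate limit, but would
                                      # also cache mutable tags such as :latest for an hour
}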

cpuguy83 commented 4 years ago

Manifest requests that are by digest should be cached forever. Mutable references are cached for 1s, and then if Hub is down the stale cache is used.

When docker does a manifest request for a mutable tag, it performs a HEAD request against hub (not counted against the rate limit), which should respond with the digest of the image, then it does a GET against the digest URL, which the nginx config will cache forever.
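Expressed as nginx config, that split looks roughly like the sketch below (upstream and cache-zone names are placeholders; the real nginx.conf differs in detail):

# Sketch: manifests requested by digest are content-addressed and immutable, so they can be
# cached essentially forever; manifests requested by tag keep the short 1s cache and fall
# back to stale content if Hub is unreachable.
location ~ ^/v2/(.*)/manifests/sha256: {
    proxy_pass https://$targetHost;                 # placeholder upstream
    proxy_cache cache;                              # placeholder cache zone
    proxy_cache_valid 200 1y;                       # by-digest: safe to cache long-term
}
location ~ ^/v2/(.*)/manifests/ {
    proxy_pass https://$targetHost;
    proxy_cache cache;
    proxy_cache_valid 200 1s;                       # by-tag: revalidate on almost every pull
    proxy_cache_use_stale error timeout updating;   # serve stale if Hub is down
}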

rpardini commented 4 years ago

Thanks @cpuguy83 for the input here. Indeed, if the HEAD request does not count against the limit, we're already set.

I guess we'll have to wait and see how the DH's implementation works in November.

rpardini commented 4 years ago

DockerHub states that outright (HEAD requests don't count) on https://docs.docker.com/docker-hub/download-rate-limit/

rpardini commented 4 years ago

Hmm, the plot thickens. I've added some more logging to the default logging levels, and that revealed that Docker (the client; I've tested 19.03.x) does not send HEAD requests for manifests, but full GETs (the response to the manifest request comes in the response body, not in headers).

(In case of a second request after the 1s cache time has passed, nginx does a conditional request to revalidate, but that is still a GET request, albeit with If-None-Match. It would be ideal if it did GETs on an empty cache and revalidations using HEAD; if the HEAD revalidation returned 304, use the cached response, and if not, do a new full GET to repopulate the cache. Unfortunately, I cannot fathom a way to coerce nginx to operate that way.)
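For reference, that conditional-GET revalidation is what nginx's proxy_cache_revalidate directive provides; a sketch of the relevant line, with the caveat that I'm assuming this is how the proxy's config is wired and that nginx has no built-in "revalidate via HEAD" mode:

# Sketch: when a cached manifest expires, nginx revalidates it with a conditional GET
# (If-None-Match / If-Modified-Since); a 304 refreshes the entry without a body, but the
# request on the wire is still a GET -- there is no HEAD-based revalidation option.
proxy_cache_revalidate on;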

Just to make sure I was really getting the real deal (after spending a lot of time with proxy_cache_convert_head, which is on by default, but not to blame here), :latest-debug has a new DEBUG_HUB env var, which inserts a mitmproxy between the proxy caching layer and DockerHub itself. That way we can go to port 8082 and watch exactly what is going to DockerHub.

docker pull rpardini/docker-registry-proxy:latest-debug # older builds than the latest had a bug, make sure to pull
docker run --rm --name docker_registry_proxy -it \
       -e DEBUG=true -e DEBUG_HUB=true -p 0.0.0.0:8081:8081 -p 0.0.0.0:8082:8082 \
       -p 0.0.0.0:3128:3128 \
       -v $(pwd)/docker_mirror_cache:/docker_mirror_cache \
       -v $(pwd)/docker_mirror_certs:/ca \
       rpardini/docker-registry-proxy:latest-debug

With this setup we can see exactly what is coming in and out, and manifest requests are GETs in both directions, always.

{"access_time":"30/Oct/2020:00:09:02 +0000","upstream_cache_status":"REVALIDATED","method":"GET","uri":"/v2/library/ubuntu/manifests/latest","status":"200","bytes_sent":"1201","upstream_response_time":"0.464","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}

If this is indeed true, I can't see how to avoid the rate limit except by turning the cache duration up to something like 3600s (and maybe lower values for tags like latest, or tags that have no digits in them, etc.)...

rpardini commented 4 years ago

Some more info: by testing with docker pull ratelimitalways/test:latest we can see that DockerHub returns, appropriately, HTTP/1.1 429 Too Many Requests when throttling. It even has a Retry-After header. I've changed the 403 we had before to 429 on proxy_cache_use_stale; that should cover the "DockerHub is rate-limiting my IP/user, we already have everything in the cache, please continue working" case. Of course, "new" images will still be rate-limited.
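In config terms the change boils down to listing http_429 in proxy_cache_use_stale; a sketch (the surrounding conditions in the real nginx.conf may differ):

# Sketch: keep serving whatever is already cached when DockerHub answers
# 429 Too Many Requests (http_429 is supported by nginx since 1.11.5).
proxy_cache_use_stale error timeout updating http_500 http_502 http_504 http_429;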

cpuguy83 commented 4 years ago

Note there is also ratelimitpreview/test:latest, which uses the real throttling logic instead of always returning 429. I asked if they could add a debug one, something like 2 req/min, to make testing easier, but they said it would be after Monday before they could do that.

I'll take a look at the docker side again... there are so many different code paths in the pull code. containerd is absolutely good here.

rpardini commented 4 years ago

Indeed. Using

while true; do docker pull ratelimitpreview/test:latest; done

to simulate the rate limit, and with http_429 already in place of http_403, we can see that we get rate-limited after a few hundred pulls (even with a warm cache, both on the docker-client side and in the proxy). Notice how rate limiting kicks in at 30/Oct/2020:00:40:08, changing from REVALIDATED to STALE.

{"access_time":"30/Oct/2020:00:40:02 +0000","upstream_cache_status":"REVALIDATED","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.454","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
{"access_time":"30/Oct/2020:00:40:03 +0000","upstream_cache_status":"REVALIDATED","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.463","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
{"access_time":"30/Oct/2020:00:40:05 +0000","upstream_cache_status":"REVALIDATED","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.447","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
{"access_time":"30/Oct/2020:00:40:06 +0000","upstream_cache_status":"REVALIDATED","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.475","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
{"access_time":"30/Oct/2020:00:40:08 +0000","upstream_cache_status":"STALE","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.437","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
{"access_time":"30/Oct/2020:00:40:09 +0000","upstream_cache_status":"STALE","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.437","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
{"access_time":"30/Oct/2020:00:40:10 +0000","upstream_cache_status":"STALE","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.434","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
{"access_time":"30/Oct/2020:00:40:12 +0000","upstream_cache_status":"STALE","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.429","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}

EDIT: I also changed the cache time to 0s instead of 1s, so I don't have to wait hundreds of seconds for the rate limit to kick in.

rpardini commented 4 years ago

I'll take a look at the docker side again... there are so many different code paths in the pull code. containerd is absolutely good here.

I bet most of the folks who will be heavily affected by this (the hordes with imagePullPolicy: Always everywhere, etc.) will be running not-so-current versions of dockerd as well. I dunno about containerd; when I last tested this in a K8s environment, it had both dockerd and containerd running, with K8s 1.16 via kubeadm, which defaulted to Docker at the time, if I'm not mistaken.

I was sincerely hoping that the "HEAD requests are not counted" policy was a documentation mistake, and that they meant "304 responses are not counted" which would make a lot more sense.

All said and done, tomorrow I will implement a simple way to bump the manifest cache time from 1s to DEFAULT_MANIFEST_CACHE_TIME, plus a separate regex/location for tags containing at least one digit, cached for DIGIT_MANIFEST_CACHE_TIME. DEFAULT_MANIFEST_CACHE_TIME will default to 1 hour and DIGIT_MANIFEST_CACHE_TIME to 60 days. That should cover most people, for now.
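Roughly, the plan translates into two regex locations whose cache times the entrypoint would substitute from those env vars; a sketch with assumed defaults, not the final implementation:

# Sketch: tags containing at least one digit (1.19, v2.3-alpine, ...) are treated as
# quasi-immutable; everything else (latest, stable, ...) gets the shorter default time.
location ~ ^/v2/(.*)/manifests/.*[0-9].* {
    # ... proxy_pass / proxy_cache directives as in the existing config ...
    proxy_cache_valid 200 60d;   # DIGIT_MANIFEST_CACHE_TIME
}
location ~ ^/v2/(.*)/manifests/ {
    # ... proxy_pass / proxy_cache directives as in the existing config ...
    proxy_cache_valid 200 1h;    # DEFAULT_MANIFEST_CACHE_TIME
}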

Then I will release 0.6.x in preparation for the influx of panicked users on Monday... 😉

In the future I could investigate hacking nginx to implement the "HEAD while revalidating, GET again if != 304" madness, but I sincerely doubt I will.

cpuguy83 commented 4 years ago

So, I have concluded you are indeed correct.

Pull calls:

https://github.com/moby/moby/blob/bb23f1bf61cb49c5aa3f7df934d79c305afb7c8c/distribution/pull_v2.go#L333

Which calls https://github.com/moby/moby/blob/bb23f1bf61cb49c5aa3f7df934d79c305afb7c8c/vendor/github.com/docker/distribution/registry/client/repository.go#L179-L184

Which does a GET here: https://github.com/moby/moby/blob/bb23f1bf61cb49c5aa3f7df934d79c305afb7c8c/vendor/github.com/docker/distribution/registry/client/repository.go#L470

It should be calling Tags() in the case of a non-digested reference, instead of Manifests() in the first call.

I'm working on a patch upstream, but as you say... this is non-trivial to get everywhere. I would hope the distribution/registry does a HEAD for pull-through mirroring.

cpuguy83 commented 4 years ago

Patch: https://github.com/moby/moby/pull/41607

rpardini commented 4 years ago

Nice work @cpuguy83 on moby; I suppose there's still a long way to go there. Meanwhile I drafted #57 with a manifest-caching implementation and I'm currently testing it; any input is appreciated. Thanks

rpardini commented 4 years ago

Merged and released. Using 0.6.0 with -e ENABLE_MANIFEST_CACHE=true, I can't hit rate limits anymore in a pull loop. The default config caches quite aggressively, but it can be tuned with the individual MANIFEST_CACHE_xxx envs, in up to 3 tiers.

A "request_type" field (e.g. "manifest-default") was added to the logs, indicating which (if any) of the caching tiers a given request fell into.

Give it a spin.

EDIT: I wrote an intro: Avoiding DockerHub Pull Rate Limits with Caching.

cpuguy83 commented 4 years ago

Interesting way of handling caching there! I think it's pretty good.

cpuguy83 commented 3 years ago

The Moby patch is merged; it will be in 20.10 RC1.

rpardini commented 3 years ago

I'll start work on 0.7 (or even 1.x), which will have a bunch of breaking changes: