Manifest requests by digest should be cached forever. Mutable references are cached for 1s, and then if Hub is down the stale cache is used.
When docker does a manifest request for a mutable tag, it performs a HEAD request against Hub (not counted against the rate limit), which should respond with the digest of the image; it then does a GET against the digest URL, which the nginx config caches forever.
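For reference, that tag → digest → content flow can be reproduced by hand against the registry API. A rough sketch, assuming curl and jq are available and using an anonymous pull token (ubuntu:latest is just an example image):

```bash
TOKEN=$(curl -fsSL "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/ubuntu:pull" | jq -r .token)
ACCEPT="application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.docker.distribution.manifest.v2+json"

# HEAD by mutable tag: no body, but the Docker-Content-Digest header comes back.
DIGEST=$(curl -fsSI -H "Authorization: Bearer $TOKEN" -H "Accept: $ACCEPT" \
  "https://registry-1.docker.io/v2/library/ubuntu/manifests/latest" \
  | tr -d '\r' | awk 'tolower($1)=="docker-content-digest:" {print $2}')

# GET by immutable digest: this is the response the proxy can cache forever.
curl -fsSL -H "Authorization: Bearer $TOKEN" -H "Accept: $ACCEPT" \
  "https://registry-1.docker.io/v2/library/ubuntu/manifests/$DIGEST"
```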
Thanks @cpuguy83 for the input here. Indeed, if the HEAD request does not count against the limit, we're already set.
I guess we'll have to wait and see how the DH's implementation works in November.
DockerHub says outright that HEAD requests don't count on https://docs.docker.com/docker-hub/download-rate-limit/
Hmm, the plot thickens. I've added some more logging to the default logging levels, and that revealed that Docker (the client; I've tested 19.03.x) does not send a HEAD request for manifests, but a full GET (the manifest comes back in the response body, not just the headers).
(In case of a second request after 1s has passed, nginx will do a conditional request to revalidate, but that is still a GET, albeit with If-None-Match. It would be ideal if it did GETs on an empty cache and revalidations using HEAD: if the HEAD revalidation returned 304, use the cached copy; if not, do a new full GET to populate the cache again. Unfortunately, I cannot fathom a way to coerce nginx to operate in this way.)
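What that revalidation looks like on the wire can be mimicked by hand; a sketch reusing $TOKEN and $ACCEPT from the snippet above (still a GET, just conditional):

```bash
# Grab the ETag from a normal GET, discarding the body.
ETAG=$(curl -fsS -D - -o /dev/null -H "Authorization: Bearer $TOKEN" -H "Accept: $ACCEPT" \
  "https://registry-1.docker.io/v2/library/ubuntu/manifests/latest" \
  | tr -d '\r' | awk 'tolower($1)=="etag:" {print $2}')

# Repeat the GET with If-None-Match: an unchanged manifest should yield a 304.
curl -s -o /dev/null -w '%{http_code}\n' \
  -H "Authorization: Bearer $TOKEN" -H "Accept: $ACCEPT" \
  -H "If-None-Match: $ETAG" \
  "https://registry-1.docker.io/v2/library/ubuntu/manifests/latest"
# -> 304 if unchanged; a 200 means a full re-download.
```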
Just to make sure I was really getting the real deal (after spending a lot of time with proxy_cache_convert_head, which is on by default, but not to blame here), :latest-debug has a new DEBUG_HUB env var that inserts a mitmproxy between the proxy caching layer and DockerHub itself. That way we can go to port 8082 and watch exactly what is going to DockerHub.
docker pull rpardini/docker-registry-proxy:latest-debug # older builds than the latest had a bug, make sure to pull
docker run --rm --name docker_registry_proxy -it \
-e DEBUG=true -e DEBUG_HUB=true -p 0.0.0.0:8081:8081 -p 0.0.0.0:8082:8082 \
-p 0.0.0.0:3128:3128 \
-v $(pwd)/docker_mirror_cache:/docker_mirror_cache \
-v $(pwd)/docker_mirror_certs:/ca \
rpardini/docker-registry-proxy:latest-debug
With this setup we can see exactly what is coming in and out, and they're both GETs for manifests, always.
{"access_time":"30/Oct/2020:00:09:02 +0000","upstream_cache_status":"REVALIDATED","method":"GET","uri":"/v2/library/ubuntu/manifests/latest","status":"200","bytes_sent":"1201","upstream_response_time":"0.464","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
If this is indeed true, I can't see how to avoid the rate limit except by turning up the cache duration to something like 3600s (and maybe lower values for tags like latest, or tags that have no digits in them, etc.)...
Some more info: by testing with docker pull ratelimitalways/test:latest, we can see that DockerHub returns, appropriately, HTTP/1.1 429 Too Many Requests when throttling. It even has a Retry-After header.
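That 429 is easy to observe outside of docker too; a quick sketch using the same anonymous-token trick as above, assuming the test repo keeps behaving this way:

```bash
TOKEN=$(curl -fsSL "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitalways/test:pull" | jq -r .token)

# Expect "HTTP/1.1 429 Too Many Requests" plus a Retry-After header.
curl -s -D - -o /dev/null -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  "https://registry-1.docker.io/v2/ratelimitalways/test/manifests/latest" \
  | grep -iE '^(HTTP|retry-after)'
```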
I've changed the 403 we had before to 429 on proxy_cache_use_stale; that should cover the "DockerHub is ratelimiting my IP/user, we already have everything in the cache, please continue working" case. Of course, "new" images will still be ratelimited.
Note there is also ratelimitpreview/test:latest, which uses real throttling instead of always throttling.
I asked if they could add a debug one so it's like... 2 req/min or something to make testing easier, but they said it would be after Monday before they could do that.
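Related: DockerHub's documented limit check uses this same repo; a HEAD against it should return the RateLimit-* headers (and, per their docs, HEAD requests supposedly don't count). A sketch, with header names as documented at the time:

```bash
# Check the current limit/remaining headers without (supposedly) burning pulls.
TOKEN=$(curl -fsSL "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
curl -fsSI -H "Authorization: Bearer $TOKEN" \
  "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" \
  | grep -i '^ratelimit'
```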
I'll take a look at the docker side again... there's so many different code paths in the pull code. containerd is absolutely good here.
Indeed. Using while true; do docker pull ratelimitpreview/test:latest; done to simulate the rate limit, and with http_429 already in place of http_403, we can see we get ratelimited after a few hundred pulls (even with a warm cache, both on the docker-client side and in the proxy). Notice how ratelimiting kicks in at 30/Oct/2020:00:40:08, changing from REVALIDATED to STALE.
{"access_time":"30/Oct/2020:00:40:02 +0000","upstream_cache_status":"REVALIDATED","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.454","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
{"access_time":"30/Oct/2020:00:40:03 +0000","upstream_cache_status":"REVALIDATED","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.463","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
{"access_time":"30/Oct/2020:00:40:05 +0000","upstream_cache_status":"REVALIDATED","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.447","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
{"access_time":"30/Oct/2020:00:40:06 +0000","upstream_cache_status":"REVALIDATED","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.475","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
{"access_time":"30/Oct/2020:00:40:08 +0000","upstream_cache_status":"STALE","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.437","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
{"access_time":"30/Oct/2020:00:40:09 +0000","upstream_cache_status":"STALE","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.437","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
{"access_time":"30/Oct/2020:00:40:10 +0000","upstream_cache_status":"STALE","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.434","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
{"access_time":"30/Oct/2020:00:40:12 +0000","upstream_cache_status":"STALE","method":"GET","uri":"/v2/ratelimitpreview/test/manifests/latest","status":"200","bytes_sent":"527","upstream_response_time":"0.429","host":"registry-1.docker.io","proxy_host":"127.0.0.1:445","upstream":"127.0.0.1:445"}
EDIT: I also changed cache time to 0s instead of 1s, so I don't have to wait for hundreds of seconds for the ratelimit to kick in.
> I'll take a look at the docker side again... there's so many different code paths in the pull code. containerd is absolutely good here.
I bet most of the folks who will be heavily affected by this (the hordes with imagePullPolicy: Always everywhere, etc.) will be running not-so-current versions of dockerd as well. I dunno about containerd; when I last tested this in a K8s environment, it had both dockerd and containerd running, on K8s 1.16 via kubeadm, which defaulted to Docker at the time if I'm not mistaken.
I was sincerely hoping that the "HEAD requests are not counted" policy was a documentation mistake, and that they meant "304 responses are not counted" which would make a lot more sense.
All said and done: tomorrow I will implement a simple way to bump caches from 1s to DEFAULT_MANIFEST_CACHE_TIME, plus a separate regex/location for tags containing at least one digit, cached for DIGIT_MANIFEST_CACHE_TIME. DEFAULT_MANIFEST_CACHE_TIME will default to 1 hour, and DIGIT_MANIFEST_CACHE_TIME to 60 days. That should cover most people, for now.
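Just to make the tiering idea concrete, the split would go roughly like this (a shell illustration of the classification only; the real thing will be an nginx location/regex, and the env var names and defaults are the ones proposed above):

```bash
DEFAULT_MANIFEST_CACHE_TIME="1h"   # proposed default for digit-less (likely mutable) tags
DIGIT_MANIFEST_CACHE_TIME="60d"    # proposed default for tags containing at least one digit

for tag in latest stable v1.0.0 1.19-alpine nightly; do
  if [[ "$tag" =~ [0-9] ]]; then
    echo "$tag -> cache for $DIGIT_MANIFEST_CACHE_TIME"
  else
    echo "$tag -> cache for $DEFAULT_MANIFEST_CACHE_TIME"
  fi
done
```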
Then I will release 0.6.x in preparation for the influx of panicked users on Monday... 😉
In the future I could investigate hacking into nginx to implement the "HEAD to revalidate, then a new full GET if not 304" madness, but I sincerely doubt I will.
So, I have concluded you are indeed correct.
Pull calls:
Which does a GET here:
https://github.com/moby/moby/blob/bb23f1bf61cb49c5aa3f7df934d79c305afb7c8c/vendor/github.com/docker/distribution/registry/client/repository.go#L470
It should be calling Tags() in the case of a non-digested reference, instead of Manifests(), in the first call.
I'm working on a patch upstream, but as you say... this is non-trivial to get everywhere. I would hope the distribution/registry does a HEAD for pull-through mirroring.
Nice work @cpuguy83 on moby; I suppose there's still a long way to go there. Meanwhile, I drafted #57 with a manifest caching implementation and am currently testing it; any input appreciated. Thanks!
Merged and released. Using 0.6.0 with -e ENABLE_MANIFEST_CACHE=true I can't hit rate limits anymore in a pull loop. The default config caches quite aggressively, but it can be tuned with individual MANIFEST_CACHE_xxx envs, in up to 3 tiers:
- :stable, :nightly, :production and :test manifests are cached for 10 minutes.
- :v1.0.0-style tagged images are cached "forever", using a regex for 3 digits or dots. Dunno if it is a good idea.
- Everything else falls into the default tier; :latest for example uses this.
A field "request_type":"manifest-default" was added to the logs, indicating which (if any) of the caching tiers a given request fit into.
Give it a spin.
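For anyone wanting to try it, enabling the new behavior is one extra env on top of the usual invocation; a sketch based on the debug command earlier in this thread (the per-tier MANIFEST_CACHE_xxx envs are optional; check the project README for their exact names):

```bash
docker run --rm --name docker_registry_proxy -it \
  -e ENABLE_MANIFEST_CACHE=true \
  -p 0.0.0.0:3128:3128 \
  -v $(pwd)/docker_mirror_cache:/docker_mirror_cache \
  -v $(pwd)/docker_mirror_certs:/ca \
  rpardini/docker-registry-proxy:0.6.0
```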
EDIT: I wrote an intro: Avoiding DockerHub Pull Rate Limits with Caching.
Interesting way of handling caching there! I think it's pretty good.
Moby patch is merged, will be in 20.10 RC1.
I'll start work on 0.7 (or even 1.x), which will have a bunch of breaking changes:
- proxy_cache_convert_head disabled (so no HEADs get converted to GETs; lower cache hit ratio, but we'll be "good citizens" against DH)
See https://www.docker.com/blog/scaling-docker-to-serve-millions-more-developers-network-egress/. This has the potential to be disastrous for e.g. large k8s clusters, in which almost every node will try to pull the same image. Docker says it will rate limit pulls.
We already have everything in place to actually cache manifest requests. See https://github.com/rpardini/docker-registry-proxy/blob/master/nginx.conf#L288 -- they're cached for 1 second only, and turning this up would IMHO avoid the rate limits completely. Unfortunately it would also cache mutable tags (eg, :latest) which could drive people crazy.
So maybe