jetstack / kube-lego

DEPRECATED: Automatically request certificates for Kubernetes Ingress resources from Let's Encrypt
Apache License 2.0

Adjusted exponential backoff to avoid being blocked with rate limiting when using nginx ingress #206

Closed by jdrowell 6 years ago

jdrowell commented 7 years ago

From the slack channel:

jdrowell [4:12 AM] Hi ppl, I'm setting up lego with nginx ingress. It works, but I always get 5 failures before the challenge passes. Unfortunately (for me), Let's Encrypt has been rate limiting since April to what I understand is 5 cert requests for the same host per account within a 7-day window. The end result is that I get rate limited every time on the prod server. On the staging server it ends up working, but I still see the 5 failures in the logs (running with debug logs). The failure is the following:

[4:13] time="2017-06-11T07:07:02Z" level=debug msg="error while authorizing: waiting for authorization failed: acme: identifier authorization failed" context=acme domain=

[4:14] The reachability test passes (_selftest). There's a burst of 5 requests in 10 seconds: I get the hits from the Let's Encrypt validation server and the responses are 200, but it only works on the 6th attempt.

jdrowell [4:26 AM] Uhmm, I think I see what's happening here. It seems that the first requests from Let's Encrypt are hitting the default backend because nginx hasn't had time to reload its config yet, so in my case they get an echo of the headers instead of the challenge response. Adding a 20s delay after detecting the new TLS ingress should do the trick.

-- x --

The docs for ExponentialBackOff show that, with the default options, the first 5 requests (which trigger the rate limiting) happen in under 3 seconds. That is far less time than the nginx ingress controller takes to hand the /.well-known path over to kube-lego.

I built a new Docker image, pushed it to production, and tested it against both the staging and production Let's Encrypt services; it works well. The initial request still always fails, which is not ideal, but if the handoff ever becomes instant this will Just Work too. I usually get the cert on the 2nd or 3rd request.

munnerz commented 7 years ago

Thanks for this - absolutely this is an issue we see a lot.

The only changes I can think of here would be to set the initial delay interval to 30s (the default refresh period for nginx ingress) and to increase the Multiplier field from its default of 1.5 to 2.0.

Given that kube-lego runs over long periods of time, I'd rather sacrifice a small amount of speed for reliability, especially if the first request for a certificate has already failed.

Once we get that done, I'll get this merged for the next 0.1.6 release!

webwurst commented 7 years ago

After some iterations we might have a useful set of RBAC settings (they might need some final testing): https://github.com/jetstack/kube-lego/issues/99

simonswine commented 6 years ago

Thanks @jdrowell, I have used your defaults and made them adjustable through environment variables.

See #308