jdrowell closed this 6 years ago.
Thanks for this; this is absolutely an issue we see a lot.
The only changes I can think of here would be to set the initial delay interval to 30s (the default refresh period for nginx ingress) and to increase the Multiplier field from its default of 1.5 to 2.0.
Given that kube-lego runs over long periods of time, I'd rather sacrifice a little speed for reliability, especially once the first certificate request has already failed.
Once we get that done, I'll get this merged for the next 0.1.6 release!
After some iterations, we might have a useful set of RBAC settings: https://github.com/jetstack/kube-lego/issues/99
They might need some final testing.
Thanks @jdrowell, I have used your defaults and made them adjustable through environment variables.
See #308
… when using nginx ingress
From the slack channel:
jdrowell [4:12 AM] Hi ppl, I'm setting up lego with nginx ingress. It works, but I always get 5 failures before the challenge passes, and unfortunately (for me) Let's Encrypt has been rate limiting since April to what I understand is 5 cert requests per host per account within a 7-day window. The end result is that I get rate limited every time on the prod server. On the staging server it ends up working, but I still see the 5 failures in the logs (running with debug logs). The failure is the following:
[4:13] time="2017-06-11T07:07:02Z" level=debug msg="error while authorizing: waiting for authorization failed: acme: identifier authorization failed" context=acme domain=
[4:14] The reachability test (_selftest) passes. There's a burst of 5 requests in 10 seconds: I get the hits from the Let's Encrypt validation server and the responses are 200, but it only works on the 6th attempt.
jdrowell [4:26 AM] Uhmm, I think I see what's happening here: it seems the first requests from Let's Encrypt are hitting the default backend because nginx hasn't had time to reload its config yet, so in my case they get an echo of the headers instead of the challenge response. Adding a 20s delay after detecting the new TLS ingress should do the trick.
-- x --
The docs for ExponentialBackOff show that, with the default options, the first 5 requests (which trigger the rate limiting) happen in under 3 seconds. That is far faster than the time the nginx ingress controller takes to hand over the /.well-known path to kube-lego.
I built a new Docker image, pushed it to production, and tested it against both the staging and prod Let's Encrypt services; it works well. The initial request still always fails, which is not ideal, but if the handoff someday becomes instant this will Just Work too. I usually get the cert on the 2nd or 3rd request.