Open drigz opened 7 years ago
I've looked in the kubernetes-letsencrypt logs and noticed two things.
One: the CloudDnsResponder threw an exception early on:
Exception in thread "Thread-2" java.lang.UnsupportedOperationException: Empty collection can't be reduced.
at in.tazj.k8s.letsencrypt.acme.CloudDnsResponder.findMatchingZone(CloudDnsResponder.kt:123)
at in.tazj.k8s.letsencrypt.acme.CloudDnsResponder.updateCloudDnsRecord(CloudDnsResponder.kt:55)
at in.tazj.k8s.letsencrypt.acme.CloudDnsResponder.addChallengeRecord(CloudDnsResponder.kt:26)
at in.tazj.k8s.letsencrypt.acme.CertificateRequestHandler.prepareDnsChallenge(CertificateRequestHandler.kt:176)
at in.tazj.k8s.letsencrypt.acme.CertificateRequestHandler.authorizeDomain(CertificateRequestHandler.kt:77)
at in.tazj.k8s.letsencrypt.acme.CertificateRequestHandler.access$authorizeDomain(CertificateRequestHandler.kt:27)
at in.tazj.k8s.letsencrypt.acme.CertificateRequestHandler$requestCertificate$1.accept(CertificateRequestHandler.kt:41)
at in.tazj.k8s.letsencrypt.acme.CertificateRequestHandler$requestCertificate$1.accept(CertificateRequestHandler.kt:27)
[SNIP: java.util.stream.*]
at in.tazj.k8s.letsencrypt.acme.CertificateRequestHandler.requestCertificate(CertificateRequestHandler.kt:41)
at in.tazj.k8s.letsencrypt.kubernetes.ServiceManager.handleCertificateRequest(ServiceManager.kt:64)
at in.tazj.k8s.letsencrypt.kubernetes.ServiceManager.access$handleCertificateRequest(ServiceManager.kt:20)
at in.tazj.k8s.letsencrypt.kubernetes.ServiceManager$reconcileService$1.run(ServiceManager.kt:45)
at java.lang.Thread.run(Thread.java:745)
This appears to be because our Cloud DNS configuration had the wrong zone, so the responder didn't work.
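For context, "Empty collection can't be reduced." is the message Kotlin's `reduce` throws on an empty collection, which fits `findMatchingZone` being left with no candidate zones. Below is a minimal sketch, in Java and not the project's actual code, of a longest-suffix zone lookup that returns an empty result instead of throwing. The class name, the signature and the assumption that zone names carry a trailing dot are illustrative only.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class ZoneMatcher {
    /**
     * Returns the zone whose DNS name is the longest suffix of the record name,
     * or an empty Optional when no configured zone covers the record.
     * Assumption: zone names are fully qualified with a trailing dot.
     */
    public static Optional<String> findMatchingZone(List<String> zoneDnsNames, String recordName) {
        // Normalise the record name to a fully qualified name.
        String fqdn = recordName.endsWith(".") ? recordName : recordName + ".";
        return zoneDnsNames.stream()
                // Match on label boundaries so "badexample.com." does not match "example.com.".
                .filter(zone -> fqdn.equals(zone) || fqdn.endsWith("." + zone))
                // max() on an empty stream yields Optional.empty() rather than throwing.
                .max(Comparator.comparingInt(String::length));
    }

    public static void main(String[] args) {
        List<String> zones = Arrays.asList("example.com.", "dev.example.com.");
        // prints "Optional[dev.example.com.]"
        System.out.println(findMatchingZone(zones, "_acme-challenge.app.dev.example.com"));
        // prints "Optional.empty"
        System.out.println(findMatchingZone(zones, "_acme-challenge.app.example.org"));
    }
}
```

With an empty result the caller can log which record had no zone and stop, rather than crashing the thread partway through the authorization.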
Two: this error occurs 300 times before the rate limit error takes its place. This takes about an hour because the operation is retried very frequently. The retries continue, leading to rate limit errors every 45 seconds or so.
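To illustrate what less frequent retries could look like, here is a rough sketch of capped exponential backoff in Java. It is not taken from kubernetes-letsencrypt; the class name, the 45-second base (borrowed from the interval observed above) and the one-hour cap are assumptions.

```java
import java.time.Duration;

public class RetryBackoff {
    /**
     * Delay before retry number `attempt` (0-based): base * 2^attempt, capped at max.
     */
    public static Duration delayFor(int attempt, Duration base, Duration max) {
        // Clamp the exponent so the multiplier stays small and cannot overflow.
        long factor = 1L << Math.min(Math.max(attempt, 0), 20);
        Duration delay = base.multipliedBy(factor);
        return delay.compareTo(max) > 0 ? max : delay;
    }

    public static void main(String[] args) {
        Duration base = Duration.ofSeconds(45); // assumed base interval
        Duration max = Duration.ofHours(1);     // assumed cap
        for (int attempt = 0; attempt < 10; attempt++) {
            System.out.println("attempt " + attempt + ": wait "
                    + delayFor(attempt, base, max).getSeconds() + "s");
        }
    }
}
```

Under these assumptions the wait reaches the one-hour cap by the eighth retry, so a persistent misconfiguration produces a handful of failed attempts per day instead of hundreds.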
Two things that could help this:
authz should be deleted if the CloudDnsResponder crashes, to avoid hitting the "pending authorizations" limit.
Thanks for reporting this, I'll look into handling this more gracefully!
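The authz cleanup suggested above could be sketched roughly as follows. ACME lets a client deactivate an authorization, so the idea is to release it whenever the DNS step fails. The two interfaces here are hypothetical stand-ins, not the project's or any ACME library's real types.

```java
public class ChallengeCleanup {
    /** Hypothetical stand-in for a pending ACME authorization. */
    public interface PendingAuthorization {
        void deactivate() throws Exception;
    }

    /** Hypothetical stand-in for the step that publishes the DNS challenge record. */
    public interface DnsStep {
        void run() throws Exception;
    }

    /**
     * Runs the DNS step. If it throws, deactivates the authorization before
     * rethrowing, so the authorization does not stay pending.
     */
    public static void prepareOrRelease(PendingAuthorization authz, DnsStep step) throws Exception {
        try {
            step.run();
        } catch (Exception e) {
            try {
                authz.deactivate();
            } catch (Exception cleanupFailure) {
                // Keep the original failure as the primary error.
                e.addSuppressed(cleanupFailure);
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        try {
            prepareOrRelease(
                    () -> System.out.println("authorization deactivated"),
                    () -> { throw new UnsupportedOperationException("no matching zone"); });
        } catch (Exception e) {
            System.out.println("request failed: " + e.getMessage());
        }
    }
}
```

Rethrowing the original exception keeps the failure visible in the logs while the pending authorization is still released.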
Thanks! FYI, as a workaround, we deleted the letsencrypt-keypair secret. This makes kubernetes-letsencrypt create a new user with an empty quota.
kubectl --namespace kube-system delete secret letsencrypt-keypair
Note: LE just enabled pending authorization recycling, which might help avoid this issue:
https://community.letsencrypt.org/t/automatic-recycling-of-pending-authorizations/41321
Interesting! I started working on the issues you reported yesterday - but time is currently a scarce resource :-)
Using kubernetes-letsencrypt v1.7 with Cloud DNS and GKE, we've observed a "too many currently pending authorizations" error. This is surprising, since the limit is 300 pending authorizations, but we only have ~10 certificates on the domain. kubernetes-letsencrypt was previously working fine, but when a new team member tried to bring up their own cluster, they ran into this issue.
On the Let's Encrypt forums, schoen said:
and
Is that possible? If we see it again, what can we do to get more debug information?