jetstack / kube-lego

DEPRECATED: Automatically request certificates for Kubernetes Ingress resources from Let's Encrypt
Apache License 2.0

Failed to list *v1beta1.Ingress error #254

Open jails opened 6 years ago

jails commented 6 years ago

Hi, today I ran kube-lego for the first time and everything worked fine, but after a couple of hours I got the following logs from the kube-lego pod (see the errors at the end):

time="2017-09-06T10:09:55Z" level=info msg="kube-lego 0.1.5-a9592932 starting" context=kubelego 
time="2017-09-06T10:09:55Z" level=info msg="connecting to kubernetes api: https://10.31.240.1:443" context=kubelego 
time="2017-09-06T10:09:55Z" level=info msg="successfully connected to kubernetes api v1.7.4" context=kubelego 
time="2017-09-06T10:09:55Z" level=debug msg="start watching ingress objects" context=kubelego 
time="2017-09-06T10:09:55Z" level=info msg="server listening on http://:8080/" context=acme 
time="2017-09-06T10:09:55Z" level=debug msg="CREATE ingress/default/ingress" context=kubelego 
time="2017-09-06T10:09:55Z" level=debug msg="worker: begin processing true" context=kubelego 
time="2017-09-06T10:09:55Z" level=debug msg=reset context=provider provider=gce 
time="2017-09-06T10:09:55Z" level=debug msg="UPDATE ingress/default/ingress" context=kubelego 
time="2017-09-06T10:09:55Z" level=debug msg=finalize context=provider provider=gce 
time="2017-09-06T10:09:55Z" level=debug msg="setting up svc endpoint" context=provider namespace=default pod_ip=10.28.2.7 provider=gce 
time="2017-09-06T10:09:55Z" level=debug msg=reset context=provider provider=nginx 
time="2017-09-06T10:09:55Z" level=debug msg=finalize context=provider provider=nginx 
time="2017-09-06T10:09:55Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx 
time="2017-09-06T10:09:55Z" level=info msg="process certificate requests for ingresses" context=kubelego 
time="2017-09-06T10:09:55Z" level=info msg="Attempting to create new secret" context=secret name=domain-secret-tls namespace=default 
time="2017-09-06T10:09:55Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=ingress namespace=default 
time="2017-09-06T10:09:55Z" level=info msg="requesting certificate for <DOMAIN_NAME>" context="ingress_tls" name=ingress namespace=default 
time="2017-09-06T10:09:55Z" level=info msg="Attempting to create new secret" context=secret name=lets-encrypt namespace=default 
time="2017-09-06T10:09:57Z" level=info msg="if you don't accept the TOS (https://letsencrypt.org/documents/LE-SA-v1.1.1-August-1-2016.pdf) please exit the program now" context=acme 
time="2017-09-06T10:09:57Z" level=info msg="created an ACME account (registration url: https://acme-v01.api.letsencrypt.org/acme/reg/20914123)" context=acme 
time="2017-09-06T10:09:57Z" level=info msg="Attempting to create new secret" context=secret name=lets-encrypt namespace=default 
time="2017-09-06T10:09:57Z" level=info msg="Secret successfully stored" context=secret name=lets-encrypt namespace=default 
time="2017-09-06T10:17:59Z" level=debug msg="testing reachability of http://<DOMAIN_NAME>/.well-known/acme-challenge/_selftest" context=acme domain=<DOMAIN_NAME> 
time="2017-09-06T10:18:01Z" level=debug msg="responding to challenge request" basePath="/.well-known/acme-challenge" context=acme host=<DOMAIN_NAME> token=McmuogMkzmOXu2yQxhDV4ai2QG6XszY5OR86z5SR6x8 
time="2017-09-06T10:18:03Z" level=debug msg="got authorization: &{URI:https://acme-v01.api.letsencrypt.org/acme/challenge/s2j-HD4i1_6cSxGjCQvlJrmYb-acK9tOEZmAWumypCI/1924026229 Status:valid Identifier:{Type: Value:} Challenges:[] Combinations:[]}" context=acme domain=<DOMAIN_NAME> 
time="2017-09-06T10:18:03Z" level=info msg="authorization successful" context=acme domain=<DOMAIN_NAME> 
time="2017-09-06T10:18:04Z" level=info msg="successfully got certificate: domains=[<DOMAIN_NAME>] url=https://acme-v01.api.letsencrypt.org/acme/cert/04cc0fdc1bcc1aceab78d19f102f12cec7fc" context=acme 
time="2017-09-06T10:18:04Z" level=debug msg="certificate pem data:\n-----BEGIN CERTIFICATE-----\n[XXX]\n-----END CERTIFICATE-----\n-----BEGIN CERTIFICATE-----\n[XXX]\n-----END CERTIFICATE-----\n" context=acme 
time="2017-09-06T10:18:04Z" level=info msg="Attempting to create new secret" context=secret name=domain-secret-tls namespace=default 
time="2017-09-06T10:18:04Z" level=info msg="Secret successfully stored" context=secret name=domain-secret-tls namespace=default 
time="2017-09-06T10:18:04Z" level=debug msg="worker: done processing true" context=kubelego 
time="2017-09-06T10:18:53Z" level=debug msg="UPDATE ingress/default/ingress" context=kubelego 
time="2017-09-06T10:18:53Z" level=debug msg="worker: begin processing true" context=kubelego 
time="2017-09-06T10:18:53Z" level=debug msg=reset context=provider provider=gce 
time="2017-09-06T10:18:53Z" level=debug msg=finalize context=provider provider=gce 
time="2017-09-06T10:18:53Z" level=debug msg="setting up svc endpoint" context=provider namespace=default pod_ip=10.28.2.7 provider=gce 
time="2017-09-06T10:18:53Z" level=debug msg=reset context=provider provider=nginx 
time="2017-09-06T10:18:53Z" level=debug msg=finalize context=provider provider=nginx 
time="2017-09-06T10:18:53Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx 
time="2017-09-06T10:18:53Z" level=info msg="process certificate requests for ingresses" context=kubelego 
time="2017-09-06T10:18:53Z" level=info msg="cert expires in 90.0 days, no renewal needed" context="ingress_tls" expire_time=2017-12-05 09:18:00 +0000 UTC name=ingress namespace=default 
time="2017-09-06T10:18:53Z" level=info msg="no cert request needed" context="ingress_tls" name=ingress namespace=default 
time="2017-09-06T10:18:53Z" level=debug msg="worker: done processing true" context=kubelego 
time="2017-09-06T12:12:43Z" level=debug msg="token not found" basePath="/.well-known/acme-challenge" context=acme host=<DOMAIN_NAME> token="*" 
time="2017-09-06T12:13:20Z" level=debug msg="token not found" basePath="/.well-known/acme-challenge" context=acme host=<DOMAIN_NAME> token="*" 
time="2017-09-06T12:13:24Z" level=debug msg="token not found" basePath="/.well-known/acme-challenge" context=acme host=<DOMAIN_NAME> token=acme-challenge 
E0906 12:54:11.437384       1 reflector.go:304] github.com/jetstack/kube-lego/pkg/kubelego/watch.go:112: Failed to watch *v1beta1.Ingress: Get https://10.31.240.1:443/apis/extensions/v1beta1/watch/ingresses?resourceVersion=1780&timeoutSeconds=509: dial tcp 10.31.240.1:443: getsockopt: connection refused
E0906 12:54:12.440617       1 reflector.go:201] github.com/jetstack/kube-lego/pkg/kubelego/watch.go:112: Failed to list *v1beta1.Ingress: Get https://10.31.240.1:443/apis/extensions/v1beta1/ingresses?resourceVersion=0: dial tcp 10.31.240.1:443: getsockopt: connection refused
E0906 12:54:13.442426       1 reflector.go:201] github.com/jetstack/kube-lego/pkg/kubelego/watch.go:112: Failed to list *v1beta1.Ingress: Get https://10.31.240.1:443/apis/extensions/v1beta1/ingresses?resourceVersion=0: dial tcp 10.31.240.1:443: getsockopt: connection refused
E0906 12:54:14.444683       1 reflector.go:201] github.com/jetstack/kube-lego/pkg/kubelego/watch.go:112: Failed to list *v1beta1.Ingress: Get https://10.31.240.1:443/apis/extensions/v1beta1/ingresses?resourceVersion=0: dial tcp 10.31.240.1:443: getsockopt: connection refused
E0906 12:54:45.445745       1 reflector.go:201] github.com/jetstack/kube-lego/pkg/kubelego/watch.go:112: Failed to list *v1beta1.Ingress: Get https://10.31.240.1:443/apis/extensions/v1beta1/ingresses?resourceVersion=0: dial tcp 10.31.240.1:443: i/o timeout
E0906 12:55:16.446610       1 reflector.go:201] github.com/jetstack/kube-lego/pkg/kubelego/watch.go:112: Failed to list *v1beta1.Ingress: Get https://10.31.240.1:443/apis/extensions/v1beta1/ingresses?resourceVersion=0: dial tcp 10.31.240.1:443: i/o timeout
E0906 12:55:47.447495       1 reflector.go:201] github.com/jetstack/kube-lego/pkg/kubelego/watch.go:112: Failed to list *v1beta1.Ingress: Get https://10.31.240.1:443/apis/extensions/v1beta1/ingresses?resourceVersion=0: dial tcp 10.31.240.1:443: i/o timeout
E0906 12:55:48.449934       1 reflector.go:201] github.com/jetstack/kube-lego/pkg/kubelego/watch.go:112: Failed to list *v1beta1.Ingress: Get https://10.31.240.1:443/apis/extensions/v1beta1/ingresses?resourceVersion=0: dial tcp 10.31.240.1:443: getsockopt: connection refused
E0906 12:55:49.451823       1 reflector.go:201] github.com/jetstack/kube-lego/pkg/kubelego/watch.go:112: Failed to list *v1beta1.Ingress: Get https://10.31.240.1:443/apis/extensions/v1beta1/ingresses?resourceVersion=0: dial tcp 10.31.240.1:443: getsockopt: connection refused
E0906 12:55:50.453445       1 reflector.go:201] github.com/jetstack/kube-lego/pkg/kubelego/watch.go:112: Failed to list *v1beta1.Ingress: Get https://10.31.240.1:443/apis/extensions/v1beta1/ingresses?resourceVersion=0: dial tcp 10.31.240.1:443: getsockopt: connection refused
and so on...

So it looks like kube-lego is losing its connection to the Kubernetes API. However, the connection URL was correct:

kubectl get services -o wide

NAME                               CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE       SELECTOR
kubernetes                         10.31.240.1     <none>        443/TCP          6h        <none>
nginx-service                      10.31.245.135   <nodes>       80:30972/TCP     6h        app=nginx
php-fpm-service                    10.31.246.102   <nodes>       9000:30478/TCP   6h        app=php-fpm
tls-certificates-renewal-service   10.31.249.43    <nodes>       8080:31905/TCP   6h        <none>

Then I restarted the kube-lego deployment (i.e. kubectl delete & kubectl apply) and everything got back to normal again.

Before I saw this error, I noticed that the Kubernetes cluster autoscaled up and down and became unavailable for a minute or so (I saw the spinner in front of the cluster name in the Google Cloud admin UI). However, I noticed no downtime. Maybe the TLS certificates of the apiserver were updated at some point (cluster update) and kube-lego was trying to connect to the Kubernetes API using outdated certificates?

jails commented 6 years ago

This kind of "silent" error is an issue, since domain certificates won't be renewed if the error happens in the middle of the certificate validity period (and 90 days is a long enough period). Would it be difficult to make /healthz also check the Kubernetes API connection, @munnerz? That way kube-lego would be restarted automatically by Kubernetes when this kind of error occurs.
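
For illustration only, here is a minimal sketch of the idea: a /healthz endpoint that reports failure when the Kubernetes API is unreachable. It is written in Python for brevity and is not kube-lego's actual code (kube-lego is written in Go). The names probe_api, healthz_status, HealthzHandler and serve are hypothetical. The API address and port 8080 are taken from the logs above, the /version path is an assumption, and the service account file paths are the usual in-cluster defaults, assumed to apply here.

```python
import ssl
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumptions: API address from the logs above, /version as the probed path,
# default in-cluster service account paths.
API_URL = "https://10.31.240.1:443/version"
CA_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
TOKEN_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def probe_api(url=API_URL, timeout=5):
    """Raise an exception if no connection to the Kubernetes API can be made."""
    ctx = ssl.create_default_context(cafile=CA_FILE)
    with open(TOKEN_FILE) as f:
        token = f.read().strip()
    req = urllib.request.Request(url, headers={"Authorization": "Bearer " + token})
    try:
        with urllib.request.urlopen(req, timeout=timeout, context=ctx):
            pass
    except urllib.error.HTTPError:
        # Any HTTP response, even an error status, means the API answered.
        pass


def healthz_status(probe=probe_api):
    """Map the probe result to an HTTP status code and a short body."""
    try:
        probe()
    except Exception as err:  # connection refused, i/o timeout, TLS errors, ...
        return 503, "kubernetes api unreachable: %s" % err
    return 200, "ok"


class HealthzHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        code, body = healthz_status()
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())


def serve(port=8080):
    """Serve /healthz until interrupted (8080 matches the port in the logs)."""
    HTTPServer(("", port), HealthzHandler).serve_forever()
```

With a liveness probe pointed at /healthz, Kubernetes would restart the container after repeated 503 responses, which is the same recovery I did by hand with kubectl delete & kubectl apply. One trade-off is that a short API outage would then also restart the pod, so the probe's failure threshold would need to tolerate brief blips.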

nambrot commented 5 years ago

We just ran into this issue as well