jetstack / kube-lego

DEPRECATED: Automatically request certificates for Kubernetes Ingress resources from Let's Encrypt
Apache License 2.0

Auto-renewal of certificates is not being triggered in 0.1.6 #338

Closed carsonoid closed 6 years ago

carsonoid commented 6 years ago

The fixes put into 0.1.6 for workqueue limiting appear to be preventing automatic renewals. I can clearly see the ingress rules being processed the first time the pod starts, but the periodic check is not actually processing any items off the queue.

I have captured debug output below from a test cluster with one ingress rule.

Here you can see kube-lego start:

time="2018-06-18T20:00:28Z" level=info msg="kube-lego 0.1.6-4eb6cd03 starting" context=kubelego
time="2018-06-18T20:00:28Z" level=info msg="connecting to kubernetes api: https://172.28.0.1:443" context=kubelego
time="2018-06-18T20:00:29Z" level=info msg="successfully connected to kubernetes api v1.8.8" context=kubelego
time="2018-06-18T20:00:29Z" level=debug msg="start watching ingress objects" context=kubelego
time="2018-06-18T20:00:29Z" level=info msg="server listening on http://:8080/" context=acme
time="2018-06-18T20:00:29Z" level=debug msg="CREATE ingress/kube-system/dashboard-ingress" context=kubelego
time="2018-06-18T20:00:29Z" level=info msg="Queued item \"kube-system/dashboard-ingress\" to be processed immediately" context=kubelego

Then it processes the first ingress which was directly added to the queue:

time="2018-06-18T20:00:29Z" level=debug msg="worker: begin processing kube-system/dashboard-ingress" context=kubelego
time="2018-06-18T20:00:29Z" level=debug msg=reset context=provider provider=nginx
time="2018-06-18T20:00:29Z" level=debug msg=finalize context=provider provider=nginx
time="2018-06-18T20:00:29Z" level=debug msg=reset context=provider provider=gce
time="2018-06-18T20:00:29Z" level=debug msg=finalize context=provider provider=gce
time="2018-06-18T20:00:29Z" level=info msg="process certificate requests for ingresses" context=kubelego
time="2018-06-18T20:00:29Z" level=info msg="cert expires in 89.9 days, no renewal needed" context=ingress_tls expire_time="2018-09-16 16:31:42 +0000 UTC" name=dashboard-ingress namespace=kube-system
time="2018-06-18T20:00:29Z" level=info msg="no cert request needed" context=ingress_tls name=dashboard-ingress namespace=kube-system
time="2018-06-18T20:00:29Z" level=debug msg="worker: done processing kube-system/dashboard-ingress" context=kubelego

Now it waits 5 minutes before re-queuing all ingresses:

time="2018-06-18T20:05:29Z" level=info msg="Periodically check certificates at 2018-06-18 20:05:29.011212948 +0000 UTC m=+300.097915791" context=kubelego

And that's it. We should see the same worker logs here as at pod start, but instead we get nothing until the next "requeue":

time="2018-06-18T20:10:29Z" level=info msg="Periodically check certificates at 2018-06-18 20:10:29.011197648 +0000 UTC m=+600.097900499" context=kubelego
time="2018-06-18T20:15:29Z" level=info msg="Periodically check certificates at 2018-06-18 20:15:29.011191621 +0000 UTC m=+900.097894402" context=kubelego
time="2018-06-18T20:20:29Z" level=info msg="Periodically check certificates at 2018-06-18 20:20:29.011179568 +0000 UTC m=+1200.097882362" context=kubelego

I think this has to do with the switch to a rate-limiting workqueue and the split between Add and AddRateLimited here: https://github.com/jetstack/kube-lego/blob/b77e3f2e64589f076d9371f958d8627b5092eda0/pkg/kubelego/watch.go#L95

But I'm not sure what the solution is (besides moving to cert-manager, which I can't do right now).

carsonoid commented 6 years ago

After some more debugging I have found that there is an error here that is not being surfaced:

https://github.com/jetstack/kube-lego/blob/3fb9912ffec6a6544d5afb6e9bd2e750c184959d/pkg/kubelego/kubelego.go#L150

This is failing:

https://github.com/jetstack/kube-lego/blob/b77e3f2e64589f076d9371f958d8627b5092eda0/pkg/kubelego/watch.go#L38

I added some debug lines and this is the error:

ERRO[0585] object has no meta: object does not implement the Object interfaces  context=kubelego

munnerz commented 6 years ago

Closing this now as #339 has merged and 0.1.7 cut. Thanks for the report and for the extensive debugging!