jetstack / kube-lego

DEPRECATED: Automatically request certificates for Kubernetes Ingress resources from Let's Encrypt
Apache License 2.0

kube-lego fails to update certificates on GCP load balancer #188

Closed lukabirsa closed 7 years ago

lukabirsa commented 7 years ago

We've deployed kube-lego on GCP/Kubernetes and have successfully provisioned certificates for various HTTPS services. Today one of our web pages failed because of an expiring HTTPS certificate.

Triage:

  1. Checked if we received any emails regarding expiring certificates - OK!
  2. Checked if kube-lego is running - OK!
  3. Checked if the kube-lego logs show any issues with certificates - OK! The logs showed that the offending service's certificate had more than 60 days until expiry.
  4. Checked the TLS secrets provisioned by kube-lego - OK! The certificate chain was downloaded from Kubernetes and checked with openssl.
  5. Restarted kube-lego to see if that helps. No, everything stayed the same. Since this cluster runs on preemptible instances it gets restarted daily anyway, so this was a long shot.
  6. Manually checked the GCP load balancer - the expiry date of its certificate was wrong - FAIL!
  7. Manually replaced the GCP load balancer certificate - created a new certificate from the kube-lego .key and .crt and swapped it in for the offending certificate.
  8. Service back online.
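
For anyone repeating steps 4 and 6, here is a rough sketch of how the comparison could be scripted instead of checked by hand. It is not part of kube-lego. The function names are made up for illustration, and it assumes the secret stores the leaf certificate first in `tls.crt` and that the load balancer selects its certificate based on the hostname sent via SNI.

```python
import base64
import hashlib
import socket
import ssl

PEM_FOOTER = "-----END CERTIFICATE-----"


def leaf_pem(chain_pem: str) -> str:
    """Return the first certificate block of a PEM chain (assumed to be the leaf)."""
    end = chain_pem.index(PEM_FOOTER) + len(PEM_FOOTER)
    return chain_pem[:end].strip()


def fingerprint_from_secret(secret: dict) -> str:
    """SHA-256 fingerprint of the leaf certificate in a parsed TLS secret.

    `secret` is the parsed JSON of a kubernetes.io/tls secret, where
    data["tls.crt"] is the base64-encoded PEM chain.
    """
    chain = base64.b64decode(secret["data"]["tls.crt"]).decode("ascii")
    der = ssl.PEM_cert_to_DER_cert(leaf_pem(chain))
    return hashlib.sha256(der).hexdigest()


def fingerprint_served(host: str, port: int = 443) -> str:
    """SHA-256 fingerprint of the certificate actually served at host:port."""
    ctx = ssl.create_default_context()
    # Verification is switched off on purpose: an expired certificate
    # should still be fetched and fingerprinted, not rejected.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    return hashlib.sha256(der).hexdigest()
```

Feeding `fingerprint_from_secret` the output of `kubectl get secret <name> -o json` (parsed with `json.load`) and comparing it with `fingerprint_served("<your-host>")` should show a mismatch whenever the load balancer is still serving the old certificate.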

I can already see other services that will fail next as their certificates expire. What should we do to enforce certificate updates on GCP LBs automatically? I think our NGINX ingress (which we run in parallel for some other services) does not have the same issue.

munnerz commented 7 years ago

Hi there

Thanks for the issue -

So this is outside the scope of kube-lego specifically. As you say, the nginx ingress controller does not suffer from this same problem. As part of the 'contract' of an ingress controller, the controller should be watching for changes to secret resources and, if a change occurs, should sync that change to its corresponding cloud platform (or to disk, in the nginx ingress case).
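
To illustrate the 'contract' described above, here is a minimal sketch of the sync step a controller would run whenever it sees a secret change. It is not the GCLB or nginx controller's actual code. `reconcile`, `synced` and `push` are hypothetical names, and `push` stands in for whatever uploads the certificate to the backend.

```python
import hashlib
from typing import Callable, Dict


def fingerprint(pem: str) -> str:
    """Hash the PEM text so that two certificate versions can be compared cheaply."""
    return hashlib.sha256(pem.strip().encode("ascii")).hexdigest()


def reconcile(
    name: str,
    secret_pem: str,
    synced: Dict[str, str],
    push: Callable[[str, str], None],
) -> bool:
    """Push the secret's certificate to the backend if it changed.

    `synced` maps secret names to the fingerprint last pushed.
    Returns True if a push happened, False if the backend was already current.
    """
    current = fingerprint(secret_pem)
    if synced.get(name) == current:
        return False  # backend already has this certificate
    push(name, secret_pem)  # hypothetical upload to the load balancer or disk
    synced[name] = current
    return True
```

The symptom in this issue matches a controller that performs the first push when the ingress is created but never runs this step again when the secret is renewed.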

It's beyond the scope for kube-lego itself to 'jump in' and hijack the job of the GCLB controller.

That said, if it's known that the GCLB doesn't currently support automatically updating certificates/doesn't watch for changes to existing secrets, we should make that known in this repository to prevent situations like your own.

I've had one other mention of this issue from a customer of ours, where their symptoms were exactly the same (valid certificate in the secret, but not being synced to google cloud so required some manual intervention). We deduced it was down to this issue here: https://github.com/kubernetes/ingress/issues/330

munnerz commented 7 years ago

Ah! It also appears to be fixed in the newest version of the GCE ingress controller! From that issue:

yes, delete whichever one that isn't used. Version 0.9.3 has the fix for this... Currently being cherrypicked into K8s.

munnerz commented 7 years ago

I'm going to close this issue as it appears to be fixed in the latest GCLB controller, and isn't a responsibility of kube-lego.

lukabirsa commented 7 years ago

What would be the correct procedure to trigger this manually until k8s is updated? It's probably not manually uploading new certificates, but rather forcing k8s to recreate the ingress with downtime?