projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Improved / automated certificate rotation for calico-apiserver (+ better docs) #9419

Open sebhoss opened 2 weeks ago

sebhoss commented 2 weeks ago

We are running the API server using the manifest-based installation and have added a cert-manager managed certificate to it. We observed that after the certificate is rotated, the API server is no longer usable, even though kubectl get apiservices still reports the service as available. The docs at https://docs.tigera.io/calico/latest/operations/install-apiserver#install-the-api-server don't mention certificate renewal, and I'm wondering what the recommended mode of operation is here.

Expected Behavior

It would be great if the API server somehow notices that the certificate was renewed and reloads its certificate itself without requiring us to manually restart it.

Current Behavior

We need to restart the API server manually every time the certificate is renewed.

Possible Solution

It could check whether /code/apiserver.local.config/certificates/apiserver.crt and /code/apiserver.local.config/certificates/apiserver.key changed and reload those files.
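
A minimal sketch of the change check suggested above, written as a shell helper that runs outside the server process. The function names, the polling approach, and the use of sha256sum are assumptions made for illustration; nothing here is existing Calico behavior, and an in-process reload would be implemented differently.

```shell
#!/bin/sh
# Hypothetical helper: print one fingerprint covering both certificate files.
cert_fingerprint() {
    cat "$1" "$2" | sha256sum | cut -d' ' -f1
}

# Hypothetical helper: exit 0 when the files no longer match the
# previously recorded fingerprint passed as the third argument.
cert_changed() {
    [ "$(cert_fingerprint "$1" "$2")" != "$3" ]
}

# Poll the files and run a command whenever their content changes.
# Usage: watch_certs CRT KEY INTERVAL_SECONDS COMMAND [ARGS...]
watch_certs() {
    crt=$1
    key=$2
    interval=$3
    shift 3
    last=$(cert_fingerprint "$crt" "$key")
    while sleep "$interval"; do
        if cert_changed "$crt" "$key" "$last"; then
            last=$(cert_fingerprint "$crt" "$key")
            "$@"
        fi
    done
}
```

Comparing content hashes rather than modification times is a deliberate choice in this sketch: files mounted from a Secret are typically swapped through symlinks, so timestamps on the visible path are not a reliable signal.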

Steps to Reproduce (for bugs)

  1. Deploy API server with manifests
  2. Create certificate
  3. Wait until certificate expires/rotates
  4. Do something that involves the API server

Context

We noticed this while deleting a namespace that was not related to Calico in any way. Kubernetes tried to contact the Calico API server during finalization and, since the certificate was invalid at that point, could not delete the namespace.

Your Environment

caseydavenport commented 1 week ago

@sebhoss thanks for raising - just to confirm, you did rotate the certificate but noticed that the Calico API server didn't load the new certificate until being restarted?

sebhoss commented 1 week ago

@caseydavenport yes exactly. Here is an example showing what is happening for us:

  1. Do something like kubectl diff with a calico resource:
$ kubectl diff -k ./calico-policies
diff --new-file --unified --color /tmp/LIVE-127225091/projectcalico.org.v3.GlobalNetworkPolicy..allow-unknown-destinations /tmp/MERGED-4003578456/projectcalico.org.v3.GlobalNetworkPolicy..allow-unknown-destinations
--- /tmp/LIVE-127225091/projectcalico.org.v3.GlobalNetworkPolicy..allow-unknown-destinations    2024-11-05 05:42:01.626050778 +0100
+++ /tmp/MERGED-4003578456/projectcalico.org.v3.GlobalNetworkPolicy..allow-unknown-destinations 2024-11-05 05:42:01.626050778 +0100
@@ -0,0 +1,15 @@
+apiVersion: projectcalico.org/v3
+kind: GlobalNetworkPolicy
+metadata:
+  creationTimestamp: "2024-11-05T04:42:01Z"
+  name: allow-unknown-destinations
+  uid: 5c98f3c6-5ea2-4f1f-9380-bb47a62c292e
+spec:
+  egress:
+  - action: Allow
+    destination: {}
+    source: {}
+  order: 1999
+  selector: allow_untrusted_destinations == "enabled"
+  types:
+  - Egress
  2. Use cmctl to renew certificate of calico-apiserver:
$ cmctl renew calico-apiserver -n calico-apiserver
Manually triggered issuance of Certificate calico-apiserver/calico-apiserver
  3. Run the same diff command again:
$ kubectl diff -k ./calico-policies
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get globalnetworkpolicies.projectcalico.org allow-unknown-destinations)
  4. Verify that calico-apiserver is still marked as ready:
$ kubectl get apiservices | grep calico
v1.crd.projectcalico.org                     Local                                   True        306d
v3.projectcalico.org                         calico-apiserver/calico-api             True        289d
  5. Restart calico-apiserver:
$ kubectl -n calico-apiserver rollout restart deployment calico-apiserver
deployment.apps/calico-apiserver restarted
  6. kubectl diff works fine again (same output as in step 1)
caseydavenport commented 1 week ago

@sebhoss great, thanks for the detailed description!

Sounds like at a minimum, we need documentation to explain that the apiserver requires a restart after rotating the certificate.

Agree it would be a nice enhancement to have the apiserver detect that the certificate has changed and reload it without requiring a manual restart of the pod.
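
Until such a reload exists, the manual restart from the reproduction steps could be scripted. The following is only a sketch under stated assumptions: the namespace and deployment names are the ones used in this thread, the Secret name is a placeholder (the thread does not say which Secret cert-manager writes to), and the helper function names are hypothetical. It relies on cert-manager storing the certificate under the tls.crt key of the Secret.

```shell
#!/bin/sh
# Names from the thread; SECRET is a hypothetical placeholder to adjust.
NAMESPACE=${NAMESPACE:-calico-apiserver}
DEPLOYMENT=${DEPLOYMENT:-calico-apiserver}
SECRET=${SECRET:-calico-apiserver-certs}

# Print a hash of the certificate currently stored in the Secret.
secret_fingerprint() {
    data=$(kubectl -n "$NAMESPACE" get secret "$SECRET" \
        -o 'jsonpath={.data.tls\.crt}') || return 1
    printf '%s' "$data" | sha256sum | cut -d' ' -f1
}

# Restart the deployment if the Secret no longer matches the fingerprint
# given as $1. Always prints the current fingerprint so the caller can
# remember it for the next check.
restart_if_rotated() {
    current=$(secret_fingerprint) || return 1
    if [ "$current" != "$1" ]; then
        kubectl -n "$NAMESPACE" rollout restart deployment "$DEPLOYMENT" >&2
    fi
    printf '%s\n' "$current"
}
```

A caller would record the result of secret_fingerprint once and then pass the last known value to restart_if_rotated on a schedule, for example from a loop or a cron job. This is a workaround only: it restarts the pods after the Secret changes, it does not make the API server reload the certificate in place.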