kinvolk / lokomotive

🪦 DISCONTINUED: Further Lokomotive development has been discontinued. Lokomotive is a 100% open-source, easy-to-use and secure Kubernetes distribution from the volks at Kinvolk.
https://kinvolk.io/lokomotive-kubernetes/
Apache License 2.0

Document manual k8s certificate rotation process #1031

Closed johananl closed 4 years ago

johananl commented 4 years ago

In https://github.com/kinvolk/lokomotive/issues/309 we plan to automate the renewal of k8s certificates. In the meantime, we should document how to rotate the certificates manually.

johananl commented 4 years ago

Status update:

I'm following the instructions here.

Update Kubernetes controller manager's --root-ca-file to include both old and new CA and restart controller manager.

This ^ step doesn't work: kube-controller-manager errors out, complaining that it found two certificates instead of one.

I've also tried the same with a file that contains only the new certificate. This makes kube-controller-manager stop trusting responses from the API server, because kube-controller-manager doesn't trust the API server's CA. Updating the CA cert on the API server breaks kubectl.

invidian commented 4 years ago

Perhaps see https://github.com/kubernetes/website/issues/23323 ?

johananl commented 4 years ago

Status update:

I've made some progress. Looks like we have the following covered:

After the above steps, the kubelets no longer trust the API server (which makes sense, since they don't know about the new CA yet), so things like kubectl logs and kubectl exec aren't working. I'm working on it right now.

Things left to do:

Ordering seems to matter a lot in the cert rotation process and there are certain times where the user needs to wait for things to converge, which makes scripting the whole thing a bit trickier.

The guide has proved to be very unreliable and not 100% applicable to our use case (because we have a self-hosted control plane). In addition, some code snippets in the guide are broken at the syntax level, which makes me suspect the author hasn't actually tried following the guide...

johananl commented 4 years ago

I've updated the previous comment with the latest status. I'll try to make as much progress as I can on this until the next sprint and then we'll see what work is left to do.

johananl commented 4 years ago

Made some progress around making kubelets trust kube-apiserver again after rotating kube-apiserver certs. It doesn't work yet but I have a much better understanding of the moving parts and I think I may have found the right order of operations to rotate the kubelet certs without getting into an unrecoverable cluster state.

The tricky part:

johananl commented 4 years ago

My very messy WIP is here: https://github.com/kinvolk/lokomotive/commits/johananl/k8s-cert-rotation

This is the latest state of the notes I've been writing for myself while trying to figure out the correct "recipe" for rotating the certs safely. In case someone else picks this up, please don't treat this document in its current state as a user-facing document or even a developer document, as it needs to be completed, organized and cleaned up. Same goes for my "creative" commit messages 🙂

johananl commented 4 years ago

Important points to consider:

ipochi commented 4 years ago

I took a stab at understanding how the certificates work, as well as how to go about rotating them, especially if the CA is also to be rotated. However, I eventually got a lot more confused than I expected and hence couldn't make significant progress on this task.

One thing I want to note: we should not be creating the certificates with tools like openssl; instead, we should use Terraform to generate them. That makes the certificate generation process easier, and since Terraform already generates the certificates when the cluster is bootstrapped, it only makes sense to use it for rotating them as well.

johananl commented 4 years ago

Yes @ipochi, this is a tough one.

I've opted for openssl for two reasons:

  1. I was hoping to get a better understanding of how the PKI works at a low level.
  2. I wanted to eventually have a set of commands which can be executed as a script rather than asking the user to follow instructions. This is especially relevant if we want to move towards automated certificate rotation, a process which IMO is unlikely to be done using Terraform.

In my experience the tricky part isn't generating the certificates but rather rotating them in a way which doesn't break the cluster. That said, if using Terraform helps in any way, we can go that route of course.

invidian commented 4 years ago

I'll continue with this task.

One thing I want to note: we should not be creating the certificates with tools like openssl; instead, we should use Terraform to generate them. That makes the certificate generation process easier, and since Terraform already generates the certificates when the cluster is bootstrapped, it only makes sense to use it for rotating them as well.

100% agree 👍

Given the size of this task, I think it's reasonable to break it down into smaller tasks so we can accomplish it step by step. I'll start with that.

invidian commented 4 years ago

Given that private keys do not expire, I think that as a first step we can skip rotating them. We should focus on expiring certificates, which may break cluster functionality.

WIP PR which will be BIG: https://github.com/kinvolk/lokomotive/pull/1198.

iaguis commented 4 years ago

Closing in favor of #1215