johananl closed this 3 years ago
Status update:
I'm following the instructions here.
Update Kubernetes controller manager's --root-ca-file to include both old and new CA and restart controller manager.
Doing this step doesn't work: kube-controller-manager errors out, complaining that it found two certificates instead of one.
I've tried the same with a file which contains just the new certificate. This makes kube-controller-manager stop trusting responses from the API server because kube-controller-manager doesn't trust the API server's CA. Updating the CA cert on the API server breaks kubectl.
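The guide's step expects one file holding both the old and the new CA, and the error above is about how many certificates that file contains. A quick way to check what is actually being passed is to split the bundle into its PEM blocks. This is a minimal sketch, assuming the bundle is simply the old and new CA PEM files concatenated (the file names in the usage note are hypothetical). It only checks PEM framing and base64 encoding; it does not parse or validate the certificates themselves.

```python
import re
import ssl

# PEM markers for a single X.509 certificate block.
PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"

# Non-greedy match so each BEGIN pairs with the nearest END.
_PEM_CERT = re.compile(
    re.escape(PEM_HEADER) + r".*?" + re.escape(PEM_FOOTER), re.DOTALL
)


def split_pem_bundle(bundle: str) -> list:
    """Return every certificate block found in a PEM bundle, in file order."""
    return _PEM_CERT.findall(bundle)


def build_ca_bundle(old_ca_pem: str, new_ca_pem: str) -> str:
    """Concatenate the old CA followed by the new CA into one PEM bundle.

    Each input must contain exactly one certificate block.
    """
    for name, pem in (("old", old_ca_pem), ("new", new_ca_pem)):
        if len(split_pem_bundle(pem)) != 1:
            raise ValueError(f"{name} CA input must contain exactly one certificate")
    return old_ca_pem.strip() + "\n" + new_ca_pem.strip() + "\n"


def bundle_ders(bundle: str) -> list:
    """Decode each block to DER bytes (checks base64 framing only, not X.509)."""
    return [ssl.PEM_cert_to_DER_cert(block) for block in split_pem_bundle(bundle)]
```

For example, reading hypothetical `old-ca.crt` and `new-ca.crt` files and calling `len(split_pem_bundle(build_ca_bundle(old, new)))` should report 2, which is the file layout the guide describes for `--root-ca-file`.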
Perhaps see https://github.com/kubernetes/website/issues/23323 ?
Status update:
I've made some progress. Looks like we have the following covered:
Following the above, the kubelets no longer trust the API server (which makes sense since they don't yet know about the new CA), so things like kubectl logs or kubectl exec aren't working. I'm on it right now.
Things left to do:
lokoctl isn't reverting any of the TLS-related changes.
Ordering seems to matter a lot in the cert rotation process, and there are certain times when the user needs to wait for things to converge, which makes scripting the whole thing a bit trickier.
The guide has proved to be very unreliable and not 100% applicable to our use case (because we have a self-hosted control plane). In addition, some code snippets in the guide are broken at the syntax level, which makes me suspect the author hasn't actually tried following the guide...
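Because some steps must wait for the cluster to converge before the next one can safely run, a rotation script needs a polling step between them. This is a minimal sketch of such a helper, not part of lokoctl. The `check` callable is hypothetical and would wrap whatever convergence signal a given step needs; exceptions from it are treated as "not converged yet", on the assumption that components fail transiently while certificates are being swapped.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Returns True once converged, False on timeout. `clock` and `sleep`
    are injectable so the helper can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            # Assumption: errors (e.g. TLS failures mid-rotation) mean "not yet".
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

As a usage example, a step could block until the API server answers again with something like `wait_until(lambda: subprocess.run(["kubectl", "get", "nodes"], capture_output=True).returncode == 0, timeout=600)` before moving on to the next rotation step.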
I've updated the previous comment with the latest status. I'll try to make as much progress as I can on this until the next sprint and then we'll see what work is left to do.
Made some progress around making kubelets trust kube-apiserver again after rotating kube-apiserver certs. It doesn't work yet but I have a much better understanding of the moving parts and I think I may have found the right order of operations to rotate the kubelet certs without getting into an unrecoverable cluster state.
The tricky part:
/etc/kubernetes/ca.crt on the host.
/etc/kubernetes/ca.crt on the host, the self-hosted kubelet doesn't pick up the change.
/etc/kubernetes/ca.crt is immediately overwritten with the old file. I think we may need to rotate ca.crt on kube-apiserver before restarting the kubelet; however, there is a good chance we would have to somehow re-create the kube-apiserver pod for the change to take effect, which requires an operational kubelet which trusts kube-apiserver... This issue may be related.
My very messy WIP is here: https://github.com/kinvolk/lokomotive/commits/johananl/k8s-cert-rotation
This is the latest state of the notes I've been writing for myself while trying to figure out the correct "recipe" for rotating the certs safely. In case someone else picks this up, please don't treat this document in its current state as a user-facing document or even a developer document as it needs to be completed, organized and cleaned up. Same goes for my "creative" commit messages :slightly_smiling_face:
Important points to consider:
lokoctl doesn't revert the changes done manually.
I took a stab at understanding how the certificates work, as well as how to go about rotating them, especially if the CA is also to be rotated. However, I eventually got a lot more confused than I expected and hence couldn't make significant progress on this task.
One thing I want to note: we should not be creating the certificates with tools (openssl etc.); rather, we should use Terraform to generate them. This makes the certificate generation process easier, and since Terraform already generates the certificates when the cluster is bootstrapped, it only makes sense to use it again when rotating certs.
Yes @ipochi, this is a tough one.
I've opted for openssl for two reasons:
In my experience the tricky part isn't generating the certificates but rather rotating them in a way which doesn't break the cluster. That said, if using Terraform helps in any way, we can go that route of course.
I'll continue with this task.
Regarding using Terraform instead of openssl to generate the certificates: 100% agree :+1:
Given the size of this task, I think it's reasonable to break it down into smaller tasks so we can accomplish it step by step. I'll start with this.
Given that private keys do not expire, I think as a first step we can skip rotating them. We should focus on expiring certificates, which can break cluster functionality.
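Focusing on expiring certificates first means a script has to find out which ones are close to expiry. This is a minimal sketch, assuming the expiry dates have already been collected with `openssl x509 -enddate -noout -in <file>`, which prints a line of the form `notAfter=Jan  1 00:00:00 2030 GMT`. It parses that timestamp with the standard library's `ssl.cert_time_to_seconds`. The certificate file names used as keys are hypothetical.

```python
import ssl
import time
from typing import Optional

_PREFIX = "notAfter="


def seconds_until_expiry(enddate_line: str, now: Optional[float] = None) -> float:
    """Parse one `notAfter=...` line; return seconds left (negative if expired)."""
    if not enddate_line.startswith(_PREFIX):
        raise ValueError(f"unexpected openssl output: {enddate_line!r}")
    not_after = ssl.cert_time_to_seconds(enddate_line[len(_PREFIX):].strip())
    return not_after - (time.time() if now is None else now)


def expiring_soon(enddates: dict, within_days: float, now: Optional[float] = None) -> list:
    """Return the names whose certificate expires within `within_days`, soonest first.

    `enddates` maps a certificate name to its `openssl x509 -enddate` line.
    """
    limit = within_days * 86400
    remaining = {name: seconds_until_expiry(line, now) for name, line in enddates.items()}
    return sorted((name for name, left in remaining.items() if left <= limit), key=remaining.get)
```

Passing `now` explicitly keeps the check deterministic; in a real run it defaults to the current time.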
WIP PR which will be BIG: https://github.com/kinvolk/lokomotive/pull/1198.
Closing in favor of #1215
In https://github.com/kinvolk/lokomotive/issues/309 we plan to automate the renewal of k8s certificates. In the meantime, we should document how to rotate the certificates manually.