kubernetes / kubeadm

Aggregator for issues filed against kubeadm

Error upgrading to 1.29.x with external CA #3055

Closed NorthFuture closed 4 months ago

NorthFuture commented 5 months ago

What happened?

Our clusters, currently at 1.28.9, are configured with an external CA (no ca.key on the filesystem), and all certificates are generated by an external system.

During the upgrade from 1.28.9 to 1.29.4 with the following command:

kubeadm --kubeconfig /root/.kube/config --certificate-renewal=false upgrade apply v1.29.4

we get the following error:

the CA files do not exist, please run kubeadm init phase certs ca to generate it: failed to load key: couldn't load the private key file /etc/kubernetes/pki/ca.key: open /etc/kubernetes/pki/ca.key: no such file or directory [upgrade/postupgrade] FATAL post-upgrade error

The /root/.kube/config file is an external kubeconfig with short-lived super-admin certificates.

After a bit of digging, I found this

https://github.com/kubernetes/kubernetes/blob/d138c022d7fb3436add1c97b07004cf10319fb42/cmd/kubeadm/app/phases/upgrade/postupgrade.go#L75

It seems it's not possible to upgrade to 1.29 with an external CA.

What did you expect to happen?

The upgrade to 1.29 should succeed on a cluster with an external CA.

How can we reproduce it (as minimally and precisely as possible)?

Try to upgrade a cluster that has no ca.key inside the pki folder.
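
Before reproducing, it may help to confirm that the node really is in the state described above. The following is a minimal sketch, assuming that "external CA" means ca.crt is present and ca.key is absent in the PKI directory; the helper name `is_external_ca` is hypothetical and not part of kubeadm.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubeadm): succeeds when a PKI directory
# looks like an external-CA setup, i.e. ca.crt is present and ca.key is not.
# Usage: is_external_ca [pki_dir]   (defaults to /etc/kubernetes/pki)
is_external_ca() {
    pki_dir="${1:-/etc/kubernetes/pki}"
    [ -f "$pki_dir/ca.crt" ] && [ ! -f "$pki_dir/ca.key" ]
}

# Example (run on the control-plane node):
#   is_external_ca && echo "external CA: ca.key is not on this node"
```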

Anything else we need to know?

No response

Kubernetes version

```console
$ kubectl version
Client Version: v1.28.9
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4
```

Cloud provider

on premise, vanilla version

OS version

```console
# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ uname -a
Linux xxx 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux
```

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

k8s-ci-robot commented 5 months ago

There are no sig labels on this issue. Please add an appropriate label by using one of the following commands:

Please see the group list for a listing of the SIGs, working groups, and committees available.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
k8s-ci-robot commented 5 months ago

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

neolit123 commented 5 months ago

/transfer kubeadm

neolit123 commented 5 months ago

looks like this is something we did not cover with e2e tests https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-external-ca-1-29 (TODO: we need to include upgrades)

workaround: is it an option for you to temporarily copy the "ca.key" to the node where "kubeadm upgrade apply" is called? after the upgrade, "ca.key" can be deleted.
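
The workaround above could be scripted roughly as follows. This is only a sketch under assumptions: the helper name `with_temp_ca_key` and the source path of the key are hypothetical, and it applies only where the CA key can actually be exported to the node.

```shell
#!/bin/sh
# Hypothetical helper: place ca.key in the PKI directory only for the
# duration of one command, then delete it again.
# Usage: with_temp_ca_key <path-to-ca.key> <pki_dir> <command> [args...]
with_temp_ca_key() {
    src_key="$1"; pki_dir="$2"; shift 2
    # Refuse to overwrite (and later delete) a key that is already in place.
    if [ -e "$pki_dir/ca.key" ]; then
        echo "ca.key already exists in $pki_dir" >&2
        return 1
    fi
    # Copy the key with owner-only permissions.
    install -m 600 "$src_key" "$pki_dir/ca.key" || return 1
    # Run the wrapped command and remember its exit status.
    "$@"
    status=$?
    # Remove the temporary key whether or not the command succeeded.
    rm -f "$pki_dir/ca.key"
    return "$status"
}

# Example, wrapping the command from the report (key path is a placeholder):
#   with_temp_ca_key /path/to/ca.key /etc/kubernetes/pki \
#       kubeadm --kubeconfig /root/.kube/config --certificate-renewal=false upgrade apply v1.29.4
```

The key is removed after the command returns, but not if the shell itself is killed in between, so it is worth checking afterwards that ca.key is gone.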

neolit123 commented 5 months ago

https://github.com/kubernetes/kubernetes/blob/d138c022d7fb3436add1c97b07004cf10319fb42/cmd/kubeadm/app/phases/upgrade/postupgrade.go#L75

this function call migrates the admin.conf on the node so that it no longer uses the super-user group "system:masters", and generates a new super-admin.conf file that does.

we could skip this process for external CA users; later, when they manually renew "admin.conf", they can pick the user they want.

only 1.29 is affected as 1.30 removed this function. it's a one release patch (migration) solution.

neolit123 commented 5 months ago

fix for 1.29.next (5) is here: https://github.com/kubernetes/kubernetes/pull/124682 i think the 'next' release is middle of May.

NorthFuture commented 5 months ago

Thank you for the prompt response. We'll wait for the next release, since the intermediate CA key is sealed in our vault and can't be extracted. We would have to issue a new temporary intermediate CA with an external private key for each cluster, and that's not straightforward since our root CA is air-gapped 😀 Again, thank you for the fix.

neolit123 commented 5 months ago

e2e addition for the upgrade scenario https://github.com/kubernetes/kubeadm/pull/305

neolit123 commented 4 months ago

fixed in 1.29.5
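
Taken together, the comments in this thread suggest that only 1.29 patch releases before 1.29.5 hit this error with an external CA. A minimal sketch of that check, assuming plain `vMAJOR.MINOR.PATCH` version strings; the helper name `is_affected_version` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given target version falls in the
# range this thread describes as affected (1.29.x with x < 5).
# Usage: is_affected_version v1.29.4
is_affected_version() {
    version="${1#v}"            # drop a leading "v"
    major="${version%%.*}"      # text before the first dot
    rest="${version#*.}"        # text after the first dot
    minor="${rest%%.*}"
    patch="${rest#*.}"
    patch="${patch%%[!0-9]*}"   # drop any non-numeric suffix
    [ "$major" = "1" ] && [ "$minor" = "29" ] \
        && [ -n "$patch" ] && [ "$patch" -lt 5 ]
}

# Example:
#   is_affected_version v1.29.4 && echo "upgrade to 1.29.5 or later instead"
```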