kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster

Cluster upgrade to v1.30.2 fails on "Upgrade first Control Plane" #11350

Closed: bogd closed this issue 2 months ago

bogd commented 4 months ago

What happened?

Attempted to upgrade a cluster from v1.29.3 to v1.30.2. The upgrade playbook fails on kubeadm upgrade apply with the error can not mix '--config' with arguments [allow-experimental-upgrades certificate-renewal etcd-upgrade force yes], in this task:

TASK [kubernetes/control-plane : Kubeadm | Upgrade first master] ************************************************
Wednesday 03 July 2024  17:18:34 +0000 (0:00:01.906)       0:31:56.562 ******** 
FAILED - RETRYING: [k8s-staging-01-master]: Kubeadm | Upgrade first master (3 retries left).
FAILED - RETRYING: [k8s-staging-01-master]: Kubeadm | Upgrade first master (2 retries left).
FAILED - RETRYING: [k8s-staging-01-master]: Kubeadm | Upgrade first master (1 retries left).
fatal: [k8s-staging-01-master]: FAILED! => {"attempts": 3, "changed": true, "cmd": ["timeout", "-k", "600s", "600s", "/usr/local/bin/kubeadm", "upgrade", "apply", "-y", "v1.30.2", "--certificate-renewal=True", "--config=/etc/kubernetes/kubeadm-config.yaml", "--ignore-preflight-errors=all", "--allow-experimental-upgrades", "--etcd-upgrade=false", "--force"], "delta": "0:00:00.083731", "end": "2024-07-03 17:18:55.750605", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2024-07-03 17:18:55.666874", "stderr": "can not mix '--config' with arguments [allow-experimental-upgrades certificate-renewal etcd-upgrade force yes]\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["can not mix '--config' with arguments [allow-experimental-upgrades certificate-renewal etcd-upgrade force yes]", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "", "stdout_lines": []}

What did you expect to happen?

Successful upgrade of the cluster

How can we reproduce it (as minimally and precisely as possible)?

Attempt to upgrade a cluster from v1.29 to v1.30
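
With a standard kubespray checkout, that amounts to something like the following (a sketch; the inventory path is an assumption, and the playbook is the one referenced under "Command used to invoke ansible" below):

# Hypothetical reproduction; the inventory path is an assumption.
ansible-playbook -i inventory/mycluster/hosts.yaml -b \
    playbooks/upgrade_cluster.yml -e kube_version=v1.30.2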

OS

Linux 5.15.0-113-generic x86_64
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Version of Ansible

ansible [core 2.16.8]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.12/dist-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.12.3 (main, Apr 10 2024, 05:33:47) [GCC 13.2.0] (/usr/bin/python3)
  jinja version = 3.1.4
  libyaml = True

Version of Python

python version = 3.12.3

Version of Kubespray (commit)

474b259cf

Network plugin used

calico

Full inventory with variables

[ Removed, since it was huge and was making the issue difficult to read. Will provide a gist on request, if needed ]

Command used to invoke ansible

ansible-playbook on custom playbook that imports kubespray/playbooks/upgrade_cluster.yml

Output of ansible run

[ Same output as shown under "What happened?" above ]

Anything else we need to know

No response

tmurakam commented 4 months ago

Hmm... It seems the following error is the root cause: can not mix '--config' with arguments [allow-experimental-upgrades certificate-renewal etcd-upgrade force yes]. I think we need to fix kubeadm-upgrade.yml.

bogd commented 4 months ago

This seems to be a recent change (possibly as recent as K8s v1.30?): kubeadm no longer allows configuration-changing flags alongside --config on upgrade.

I cannot find it in the release notes, but see for example here, and here (the latter is specifically related to --yes).
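
Reduced to its essentials, the new behavior looks like this (a sketch based on the error above; whether either form alone passes is my reading of the message, not something I have verified):

# Fails in v1.30: configuration-overriding flags mixed with --config.
kubeadm upgrade apply -y v1.30.2 --config=/etc/kubernetes/kubeadm-config.yaml --force
# => can not mix '--config' with arguments [force yes]

# Flags without --config (or, presumably, --config without flags) pass this check.
kubeadm upgrade apply -y v1.30.2 --force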

tmurakam commented 4 months ago

I think we need to upgrade the kubeadm configuration from v1beta3 to v1beta4, and configure UpgradeApplyConfiguration instead of passing command-line arguments. https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta4/

But it seems that v1beta4 is not supported yet.
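
For illustration, the v1beta4 equivalent would be an UpgradeConfiguration roughly like this (a sketch; field names are taken from the UpgradeApplyConfiguration reference linked above, and kubeadm did not accept this yet at the time of writing):

# Sketch only -- not yet supported, per the comment above.
cat <<'EOF' > /tmp/upgrade-config.yaml
apiVersion: kubeadm.k8s.io/v1beta4
kind: UpgradeConfiguration
apply:
  allowExperimentalUpgrades: true
  certificateRenewal: true
  etcdUpgrade: false
  forceUpgrade: true
EOF
# Presumably then: kubeadm upgrade apply v1.30.2 --config /tmp/upgrade-config.yaml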

tmurakam commented 4 months ago

I asked a question at https://github.com/kubernetes/kubeadm/issues/3084#issuecomment-2209123104

tmurakam commented 4 months ago

I got an answer: https://github.com/kubernetes/kubeadm/issues/3084#issuecomment-2209300846

I think we need to remove the --config option from kubeadm upgrade. Does anyone have concerns about removing the option?
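
Concretely, the failing command from the log would keep the same flags and just drop --config:

# The invocation from the failed task, with only --config removed --
# this is the shape of the fix I am proposing.
/usr/local/bin/kubeadm upgrade apply -y v1.30.2 \
    --certificate-renewal=True \
    --ignore-preflight-errors=all \
    --allow-experimental-upgrades \
    --etcd-upgrade=false \
    --force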

tmurakam commented 4 months ago

I opened a PR to fix this.

zzvara commented 3 months ago

Kubespray master is broken because of this issue.

ledroide commented 3 months ago

I confirm the same issue with master at commit dd51ef6f.

The fix from @tmurakam worked fine for me.

ccureau commented 2 months ago

I can also confirm the PR mentioned above works. I created a new cluster this morning and then upgraded it afterwards.

ArnCo commented 2 months ago

The referenced PR has the side effect that variables modified in the kubeadm-config file are no longer reflected in the manifests. Example: modify the kube_scheduler_bind variable in the playbook. The variable is correctly set in the kubeadm-config.yaml file, but the corresponding kube-scheduler.yaml manifest is not updated, so the configuration is not applied.

This is, in my opinion, a regression.
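
One way to see this on a control-plane node (a sketch; the grep patterns are only illustrative):

# The new value reaches the kubeadm config...
grep -i bind /etc/kubernetes/kubeadm-config.yaml
# ...but the generated static-pod manifest still carries the old one.
grep -i bind /etc/kubernetes/manifests/kube-scheduler.yaml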

tmurakam commented 2 months ago

@ArnCo I think we can't change the configuration during an upgrade anymore, because kubeadm no longer accepts the kubeadm-config.yaml file on upgrade. If we want to change the configuration, I think we need to first run kubespray with the new configuration without upgrading, then upgrade the cluster without configuration changes. Please let me know if there is a better way.
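
As commands, that two-step flow would be something like this (a sketch; the inventory path and the use of cluster.yml for the reconfiguration run are assumptions):

# Step 1: apply the configuration changes at the current version.
ansible-playbook -i inventory/mycluster/hosts.yaml -b playbooks/cluster.yml
# Step 2: upgrade without any configuration changes.
ansible-playbook -i inventory/mycluster/hosts.yaml -b \
    playbooks/upgrade_cluster.yml -e kube_version=v1.30.2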

ArnCo commented 2 months ago

@tmurakam Well, I'm fiddling with our cluster right now. It seems that the kubeadm upgrade command was never meant to reconfigure the cluster, my bad. To apply the changes to our cluster, I backed up the /etc/kubernetes folder and ran kubeadm init phase control-plane scheduler --config /etc/kubernetes/kubeadm-config.yaml

This had the effect of updating the manifests and therefore applying my changes. Right now, I think Kubespray does not re-run kubeadm init if the manifest files already exist.
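
As commands, the workaround above looks roughly like this (the backup method is an assumption; the init phase invocation is the one quoted above):

# Back up the kubernetes config directory first.
cp -a /etc/kubernetes /etc/kubernetes.bak
# Regenerate only the scheduler static-pod manifest from kubeadm-config.yaml.
kubeadm init phase control-plane scheduler --config /etc/kubernetes/kubeadm-config.yaml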