kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

Unable to update secret kubeadm-certs: illegal base64 data #10761

Closed · HeroCC closed this issue 1 month ago

HeroCC commented 10 months ago

Environment:

Baremetal / VMs

home-kube-master:~$ printf "$(uname -srm)\n$(cat /etc/os-release)\n"
Linux 5.4.0-169-generic x86_64
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

I'm running Ansible from macOS.

$ /usr/local/Cellar/ansible@2.9/2.9.27_4/bin/ansible-playbook --version
ansible-playbook 2.9.27
  config file = None
  configured module search path = ['/Users/conlanc/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/Cellar/ansible@2.9/2.9.27_4/libexec/lib/python3.9/site-packages/ansible
  executable location = /usr/local/Cellar/ansible@2.9/2.9.27_4/bin/ansible-playbook
  python version = 3.9.17 (main, Jun 20 2023, 17:20:08) [Clang 14.0.3 (clang-1403.0.22.14.1)]
❯ python3 --version
Python 3.11.5

Kubespray version (commit) (git rev-parse --short HEAD):

❯ git rev-parse --short HEAD
eeeca4a1d

Network plugin used:

Weave

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

Details

```
❯ ansible -i inventory/mycluster/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]" | sed "s/conlanc/me/g"
home-kube-master | SUCCESS => { "hostvars[inventory_hostname]": { "ansible_check_mode": false, "ansible_config_file": null, "ansible_diff_mode": false, "ansible_facts": {}, "ansible_forks": 5, "ansible_host": "10.0.1.16", "ansible_inventory_sources": [ "/Users/me/Documents/infra/kube-deploy/kubespray/inventory/mycluster/inventory.ini" ], "ansible_playbook_python": "/usr/local/Cellar/ansible/8.1.0/libexec/bin/python3.11", "ansible_ssh_user": "cc", "ansible_verbosity": 0, "ansible_version": { "full": "2.15.1", "major": 2, "minor": 15, "revision": 1, "string": "2.15.1" }, "bin_dir": "/usr/local/bin", "docker_bin_dir": "/usr/bin", "docker_container_storage_setup": false, "docker_daemon_graph": "/var/lib/docker", "docker_dns_servers_strict": false, "docker_iptables_enabled": "false", "docker_log_opts": "--log-opt max-size=50m --log-opt max-file=5", "docker_rpm_keepcache": 1, "docker_storage_options": "-s overlay2", "etcd_data_dir": "/var/lib/etcd", "etcd_deployment_type": "docker", "etcd_kubeadm_enabled": false, "group_names": [ "etcd", "k8s_cluster", "kube_control_plane" ], "groups": { "all": [ "home-kube-master", "home-kube-worker1" ], "calico_rr": [], "etcd": [ "home-kube-master" ], "k8s_cluster": [ "home-kube-master", "home-kube-worker1" ], "kube_control_plane": [ "home-kube-master" ], "kube_node": [ "home-kube-worker1" ], "ungrouped": [] }, "inventory_dir": "/Users/me/Documents/infra/kube-deploy/kubespray/inventory/mycluster", "inventory_file": "/Users/me/Documents/infra/kube-deploy/kubespray/inventory/mycluster/inventory.ini", "inventory_hostname": "home-kube-master", "inventory_hostname_short": "home-kube-master", "loadbalancer_apiserver_healthcheck_port": 8081, "loadbalancer_apiserver_port": 6443, "no_proxy_exclude_workers": false, "omit": "__omit_place_holder__470692146cdc8fba57d54604cb411cf646ea398d", "playbook_dir": "/Users/me/Documents/infra/kube-deploy/kubespray" } }
home-kube-worker1 | SUCCESS => { "hostvars[inventory_hostname]": { "ansible_check_mode": false, "ansible_config_file": null, "ansible_diff_mode": false, "ansible_facts": {}, "ansible_forks": 5, "ansible_host": "10.0.1.17", "ansible_inventory_sources": [ "/Users/me/Documents/infra/kube-deploy/kubespray/inventory/mycluster/inventory.ini" ], "ansible_playbook_python": "/usr/local/Cellar/ansible/8.1.0/libexec/bin/python3.11", "ansible_ssh_user": "cc", "ansible_verbosity": 0, "ansible_version": { "full": "2.15.1", "major": 2, "minor": 15, "revision": 1, "string": "2.15.1" }, "bin_dir": "/usr/local/bin", "docker_bin_dir": "/usr/bin", "docker_container_storage_setup": false, "docker_daemon_graph": "/var/lib/docker", "docker_dns_servers_strict": false, "docker_iptables_enabled": "false", "docker_log_opts": "--log-opt max-size=50m --log-opt max-file=5", "docker_rpm_keepcache": 1, "docker_storage_options": "-s overlay2", "etcd_data_dir": "/var/lib/etcd", "etcd_kubeadm_enabled": false, "group_names": [ "k8s_cluster", "kube_node" ], "groups": { "all": [ "home-kube-master", "home-kube-worker1" ], "calico_rr": [], "etcd": [ "home-kube-master" ], "k8s_cluster": [ "home-kube-master", "home-kube-worker1" ], "kube_control_plane": [ "home-kube-master" ], "kube_node": [ "home-kube-worker1" ], "ungrouped": [] }, "inventory_dir": "/Users/me/Documents/infra/kube-deploy/kubespray/inventory/mycluster", "inventory_file": "/Users/me/Documents/infra/kube-deploy/kubespray/inventory/mycluster/inventory.ini", "inventory_hostname": "home-kube-worker1", "inventory_hostname_short": "home-kube-worker1", "loadbalancer_apiserver_healthcheck_port": 8081, "loadbalancer_apiserver_port": 6443, "no_proxy_exclude_workers": false, "omit": "__omit_place_holder__470692146cdc8fba57d54604cb411cf646ea398d", "playbook_dir": "/Users/me/Documents/infra/kube-deploy/kubespray" } }
```

Command used to invoke ansible:

❯ /usr/local/Cellar/ansible@2.9/2.9.27_4/bin/ansible-playbook -i inventory/mycluster/inventory.ini -b kubespray/upgrade-cluster.yml --ask-become-pass

Output of ansible run:

The only error printed is this:

TASK [kubernetes/control-plane : Upload certificates so they are fresh and not expired] ***************************************
fatal: [home-kube-master]: FAILED! => {"changed": true, "cmd": ["/usr/local/bin/kubeadm", "init", "phase", "--config", "/etc/kubernetes/kubeadm-config.yaml", "upload-certs", "--upload-certs"], "delta": "0:00:00.517440", "end": "2024-01-03 02:07:04.520251", "msg": "non-zero return code", "rc": 1, "start": "2024-01-03 02:07:04.002811", "stderr": "W0103 02:07:04.120311  742561 utils.go:69] The recommended value for \"clusterDNS\" in \"KubeletConfiguration\" is: [10.233.0.10]; the provided value is: [169.254.25.10]\nerror execution phase upload-certs: error uploading certs: unable to update secret: illegal base64 data at input byte 3\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["W0103 02:07:04.120311  742561 utils.go:69] The recommended value for \"clusterDNS\" in \"KubeletConfiguration\" is: [10.233.0.10]; the provided value is: [169.254.25.10]", "error execution phase upload-certs: error uploading certs: unable to update secret: illegal base64 data at input byte 3", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "[upload-certs] Storing the certificates in Secret \"kubeadm-certs\" in the \"kube-system\" Namespace", "stdout_lines": ["[upload-certs] Storing the certificates in Secret \"kubeadm-certs\" in the \"kube-system\" Namespace"]}

When running the command by hand, I get the same error:

cc@home-kube-master:~$ sudo /usr/local/bin/kubeadm init phase --config /etc/kubernetes/kubeadm-config.yaml upload-certs --upload-certs
W0103 02:16:16.965759  748882 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
error execution phase upload-certs: error uploading certs: unable to update secret: illegal base64 data at input byte 3
To see the stack trace of this error execute with --v=5 or higher
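
For anyone hitting the same failure, a couple of read-only checks may help confirm whether the API server can serve the existing kubeadm-certs object at all. This is a minimal sketch using standard kubectl commands; nothing here modifies the cluster:

```
# Read the secret back with request/response logging; the decode error should
# show up in the server response rather than in kubectl itself.
kubectl get secret kubeadm-certs -n kube-system -o yaml -v=8

# Query the raw API path directly; if the stored object is unreadable at the
# storage layer, this typically fails with the same "illegal base64 data" message.
kubectl get --raw /api/v1/namespaces/kube-system/secrets/kubeadm-certs
```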

I'm also now unable to edit or delete the secret at all:

❯ kubectl delete secret kubeadm-certs -n kube-system
Error from server: illegal base64 data at input byte 3
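
Since kubectl itself fails with the same decode error, the next place to look is below the API server, at the raw value stored in etcd. Below is a hedged sketch of how one might inspect it from a control-plane node; the endpoint and certificate paths are assumptions based on typical Kubespray defaults (and with etcd running in Docker, etcdctl may only be available inside the etcd container), so adjust them to your environment:

```
# Assumed Kubespray default endpoint and cert paths; verify before running.
# Dumps the first bytes of the stored kubeadm-certs value so you can eyeball
# whether the object in etcd looks readable or visibly corrupted.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-$(hostname).pem \
  --key=/etc/ssl/etcd/ssl/admin-$(hostname)-key.pem \
  get /registry/secrets/kube-system/kubeadm-certs --print-value-only | head -c 256 | hexdump -C
```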

Anything else we need to know:

This error first occurred when attempting to upgrade from Kubespray v2.15 to v2.16.0. I tried bumping to v2.17 to see if it had been resolved between releases, but no dice. The cluster was in fine working order before, but I'm getting this error now and many functions of the cluster are degraded. I have tried rerunning the playbook, and a quick scan through the actual certificates seems fine, so I'm not sure what else to do here, especially since I can't even delete the broken secret. I'd appreciate any help you can give me!

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

HeroCC commented 6 months ago

/remove-lifecycle stale

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Sryther commented 2 months ago

I'm having the same issue. I tried the methods from https://github.com/kubernetes/kubectl/issues/1405#issuecomment-1497900520 but got the following error:

$ kubectl delete secret kubeadm-certs -n kube-system
Error from server: illegal base64 data at input byte 3

Unfortunately, it looks like only a few people have had this issue, so there is almost nothing to search for on the internet. Anyway, I'm stuck at this point.

I tried the hard way: deleting the key from etcd.

# Run in the etcd container (in my context): 
ETCDCTL_API=3 etcdctl del /registry/secrets/kube-system/kubeadm-certs

I wouldn't recommend this at all unless you've lost all hope and the only alternative left is creating a whole new cluster :).
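
For completeness, here is a hedged sketch of that last-resort workaround run from a control-plane node rather than inside the container. The endpoint and certificate paths are assumptions based on common Kubespray defaults, so verify them first and take an etcd snapshot before deleting anything. After removing the broken key, re-uploading the certificates should recreate the secret in a valid state:

```
# Assumed Kubespray defaults; verify paths and take an etcd backup first, e.g.
#   ETCDCTL_API=3 etcdctl ... snapshot save /root/etcd-backup.db
# Remove the unreadable secret directly from etcd, bypassing the API server.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-$(hostname).pem \
  --key=/etc/ssl/etcd/ssl/admin-$(hostname)-key.pem \
  del /registry/secrets/kube-system/kubeadm-certs

# Recreate the secret with fresh contents so future upgrades can refresh it.
sudo /usr/local/bin/kubeadm init phase upload-certs --upload-certs
```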

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 month ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/kubespray/issues/10761#issuecomment-2428934148):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.