lilHermit commented 5 months ago

What happened?

Deploying to a single-node cluster (as a test) fails with:

TASK [kubernetes/control-plane : Kubeadm | regenerate apiserver cert 2/2] **********************************************************************************************************************************************************************************************************************************************
fatal: [node1]: FAILED! => {"changed": true, "cmd": ["/usr/local/bin/kubeadm", "init", "phase", "certs", "apiserver", "--config=/etc/kubernetes/kubeadm-config.yaml"], "delta": "0:00:00.131136", "end": "2024-03-11 12:11:21.848985", "msg": "non-zero return code", "rc": 3, "start": "2024-03-11 12:11:21.717849", "stderr": "apiServer.certSANs: Invalid value: \"node1,\": altname is not a valid IP address, DNS label or a DNS label with subdomain wildcards: a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'); a wildcard DNS-1123 subdomain must start with '*.', followed by a valid DNS subdomain, which must consist of lower case alphanumeric characters, '-' or '.' and end with an alphanumeric character (e.g. '*.example.com', regex used for validation is '\\*\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["apiServer.certSANs: Invalid value: \"node1,\": altname is not a valid IP address, DNS label or a DNS label with subdomain wildcards: a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'); a wildcard DNS-1123 subdomain must start with '*.', followed by a valid DNS subdomain, which must consist of lower case alphanumeric characters, '-' or '.' and end with an alphanumeric character (e.g. '*.example.com', regex used for validation is '\\*\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "", "stdout_lines": []}

After SSHing into the node and checking /etc/kubernetes/kubeadm-config.yaml, the certSANs list contains an entry that is not a valid RFC 1123 name, as shown below:

apiServer:
  certSANs:
  - kubernetes
  - kubernetes.default
  - kubernetes.default.svc
  - kubernetes.default.svc.cluster.local
  - 10.233.0.1
  - localhost
  - 127.0.0.1
  - node1
  - lb-apiserver.kubernetes.local
  - 10.0.0.51
  - node1,

As you can see, the final element includes a trailing comma that isn't anywhere in my config. If I remove it and then rerun the following on the node, it succeeds. However, kubespray has obviously already errored by then.

sudo /usr/local/bin/kubeadm init --config=/etc/kubernetes/kubeadm-config.yaml --ignore-preflight-errors=all --skip-phases=addon/coredns --upload-certs
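For anyone who wants to check a generated config before rerunning kubeadm, here is a minimal sketch that applies the two regexes quoted in the error message above to the certSANs entries. It only approximates kubeadm's validation (for example, it does no length checks), and the function names `is_valid_san` and `invalid_sans` are made up for illustration; the list is hardcoded from the config shown above rather than parsed from the file.

```python
import ipaddress
import re

# Regexes copied from the kubeadm error message; this is an approximation
# of kubeadm's check, not its actual implementation.
DNS1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)
WILDCARD_DNS1123 = re.compile(
    r"\*\.[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def is_valid_san(entry: str) -> bool:
    """Hypothetical helper: True for an IP address, a DNS-1123 subdomain,
    or a wildcard DNS-1123 subdomain."""
    try:
        ipaddress.ip_address(entry)
        return True
    except ValueError:
        pass
    return bool(
        DNS1123_SUBDOMAIN.fullmatch(entry) or WILDCARD_DNS1123.fullmatch(entry)
    )


def invalid_sans(entries):
    """Hypothetical helper: return the entries that fail the check."""
    return [e for e in entries if not is_valid_san(e)]


# certSANs as they appear in the generated kubeadm-config.yaml above.
cert_sans = [
    "kubernetes",
    "kubernetes.default",
    "kubernetes.default.svc",
    "kubernetes.default.svc.cluster.local",
    "10.233.0.1",
    "localhost",
    "127.0.0.1",
    "node1",
    "lb-apiserver.kubernetes.local",
    "10.0.0.51",
    "node1,",
]

print(invalid_sans(cert_sans))  # prints ['node1,']
```

Only the entry with the trailing comma is flagged, which matches the altname that kubeadm rejects.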

What did you expect to happen?

No errors and a healthy cluster

How can we reproduce it (as minimally and precisely as possible)?

Copy the sample inventory and use the following inventory.ini

[all]
node1 ansible_host=10.0.0.51 ansible_user=pi

[kube_control_plane]
node1

[etcd]
node1

[kube_node]
node1

[calico_rr]

[k8s_cluster:children]
kube_control_plane
kube_node
calico_rr

This happens with both Docker images: quay.io/kubespray/kubespray:v2.23.3 and quay.io/kubespray/kubespray:v2.24.1

OS

Running kubespray via Docker on Ubuntu, deploying to a node running on a Raspberry Pi 4 (also Ubuntu)

Version of Ansible

ansible [core 2.14.6]
  config file = /kubespray/ansible.cfg
  configured module search path = ['/kubespray/library']
  ansible python module location = /usr/local/lib/python3.10/dist-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (/usr/bin/python3)
  jinja version = 3.1.2
  libyaml = True

Version of Python

Python 3.10.12

Version of Kubespray (commit)

quay.io/kubespray/kubespray:v2.23.3 (docker tag)

Network plugin used

calico

Full inventory with variables

See the inventory.ini above; nothing special

Command used to invoke ansible

ansible-playbook -i inventory/pi-cluster2/inventory.ini --become --become-user=root cluster.yml

Output of ansible run

https://pastebin.com/vh06yfgc

Anything else we need to know

The nodes are arm64, but I doubt that's the issue

KubeKyrie commented 4 months ago

Is it kubespray release 2.23?

I used the latest kubespray code and could not reproduce it:

certSANs:
  - "kubernetes"
  - "kubernetes.default"
  - "kubernetes.default.svc"
  - "kubernetes.default.svc.cluster.local"
  - "10.233.0.1"
  - "localhost"
  - "127.0.0.1"
  - "node1"
  - "lb-apiserver.kubernetes.local"
  - "10.6.88.1"
  - "node1.cluster.local"

dev1983 commented 3 months ago

Hello,

I'm facing a similar issue while trying to upgrade an existing cluster. Kubespray version: v2.24.1

Why is k8s_cluster being appended? Which variable does kubespray take the "k8s_cluster" value from?

The "Ansible inventory hosts BEGIN" block in the /etc/hosts file:

192.168.1.141 kube-master.k8s_cluster kube-master
192.168.1.160 kube-worker-1.k8s_cluster kube-worker-1
192.168.1.134 kube-worker-2.k8s_cluster kube-worker-2

Also in kubeadm-config.conf

"kubernetes.default.svc.k8s_cluster
- "node1.k8s_cluster"

k8s it should be some . rather

k8s-triage-robot commented 3 weeks ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

kubernetes-sigs / kubespray

Error with certSAN entry in kubeadm-config.yaml - contains a comma #11000