jctoussaint opened 3 weeks ago
Is that reproducible with a setup like this?

```
[kube_control_plane]
node-1

[etcd]
node-1
node-2
node-3

[kube_node]
node-1
node-2
node-3
node-4
```
(This is the `node-etcd-client` setup, which is tested in CI, so if it doesn't catch this kind of thing we need to tweak it.)
I'll test it. But I think it will work, because `node-1` is in both `kube_control_plane` and `etcd`.
It worked on the first try:

```
PLAY RECAP *****************************************************************************************************************
k8s-test1 : ok=697 changed=154 unreachable=0 failed=0 skipped=1084 rescued=0 ignored=3
k8s-test2 : ok=561 changed=121 unreachable=0 failed=0 skipped=673 rescued=0 ignored=2
k8s-test3 : ok=561 changed=121 unreachable=0 failed=0 skipped=673 rescued=0 ignored=2
k8s-test4 : ok=512 changed=104 unreachable=0 failed=0 skipped=669 rescued=0 ignored=1
```
Hmm, it looks like the conditions are:
It'd be helpful if you could test that; otherwise I'll start a PR with that as a new test case when I can.
Something like this?

```
[all]
k8s-test1 ansible_host=192.168.0.31
k8s-test2 ansible_host=192.168.0.32 etcd_member_name=etcd1
k8s-test3 ansible_host=192.168.0.33 etcd_member_name=etcd2
k8s-test4 ansible_host=192.168.0.34 etcd_member_name=etcd3

[kube_control_plane]
k8s-test1

[etcd]
k8s-test2
k8s-test3
k8s-test4

[kube_node]
k8s-test2
k8s-test3
k8s-test4

[calico_rr]

[k8s_cluster:children]
kube_control_plane
kube_node
calico_rr
```
I was thinking more of something like this:
```
[kube_control_plane]
host1

[etcd]
host2

[kube_node]
host3

[all:vars]
network_plugin=calico
```
(If HA is not required to trigger the bug, this makes the test less expensive in CI time.)

(Btw, an explicit `k8s_cluster` group is no longer required; it's dynamically defined as the union of control-plane and node.)
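For illustration, here is a minimal sketch of how such a union group can be built at runtime in Ansible. This is only an assumption about the mechanism, not necessarily how Kubespray implements it:

```yaml
# Hypothetical play: add every control-plane and worker host to a runtime
# k8s_cluster group, i.e. the union of kube_control_plane and kube_node.
- name: Build the dynamic k8s_cluster group
  hosts: kube_control_plane:kube_node
  gather_facts: false
  tasks:
    - name: Add this host to the k8s_cluster group
      ansible.builtin.group_by:
        key: k8s_cluster
```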
OK, I'll try it.
I tried, but I think there is an issue if the `k8s_cluster` group does not exist:
```
TASK [kubespray-defaults : Set no_proxy to all assigned cluster IPs and hostnames] *****************************************
fatal: [k8s-test2 -> localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'k8s_cluster'. 'dict object' has no attribute 'k8s_cluster'\n\nThe error appears to be in '/home/me/kubespray/roles/kubespray-defaults/tasks/no_proxy.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Set no_proxy to all assigned cluster IPs and hostnames\n  ^ here\n"}
```
I'll try to restore the `k8s_cluster` group.
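For reference, a guard like the following would tolerate a missing group. This is a hypothetical simplification built around the `groups['k8s_cluster']` lookup, not the actual task from `roles/kubespray-defaults/tasks/no_proxy.yml`:

```yaml
# Hypothetical simplification: default([]) keeps the group lookup defined
# even when the k8s_cluster group is absent from the inventory.
- name: Set no_proxy to all assigned cluster IPs and hostnames
  ansible.builtin.set_fact:
    no_proxy: "{{ groups['k8s_cluster'] | default([]) | join(',') }}"
```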
I think you got it -> it fails:
```
TASK [etcd : Gen_certs | Gather node certs] ********************************************************************************
ok: [k8s-test1 -> k8s-test2(192.168.0.32)]
fatal: [k8s-test3 -> k8s-test2(192.168.0.32)]: FAILED! => {"changed": false, "cmd": "set -o pipefail && tar cfz - -C /etc/ssl/etcd/ssl ca.pem node-k8s-test3.pem node-k8s-test3-key.pem | base64 --wrap=0", "delta": "0:00:00.007001", "end": "2024-11-17 13:02:35.550815", "msg": "non-zero return code", "rc": 2, "start": "2024-11-17 13:02:35.543814", "stderr": "tar: node-k8s-test3.pem: Cannot stat: No such file or directory\ntar: node-k8s-test3-key.pem: Cannot stat: No such file or directory\ntar: Exiting with failure status due to previous errors", "stderr_lines": ["tar: node-k8s-test3.pem: Cannot stat: No such file or directory", "tar: node-k8s-test3-key.pem: Cannot stat: No such file or directory", "tar: Exiting with failure status due to previous errors"], "stdout": "H4sIAA....AKAAA"}
```
... with this inventory:
```
[all]
k8s-test1 ansible_host=192.168.0.31
k8s-test2 ansible_host=192.168.0.32
k8s-test3 ansible_host=192.168.0.33

[kube_control_plane]
k8s-test1

[etcd]
k8s-test2

[kube_node]
k8s-test3

[all:vars]
network_plugin=calico

[calico_rr]

[k8s_cluster:children]
kube_control_plane
kube_node
calico_rr
```
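If it helps, here is a minimal sketch of a play to check for the certs that the failing `tar` command expects under `/etc/ssl/etcd/ssl`. This is a hypothetical verification play, not part of Kubespray:

```yaml
# Hypothetical check: stat node-<host>.pem for every kube_node host on the
# first etcd member, where Gen_certs | Gather node certs runs its tar command.
- name: Check for per-node etcd certs
  hosts: etcd[0]
  gather_facts: false
  tasks:
    - name: Stat the node cert for each kube_node host
      ansible.builtin.stat:
        path: "/etc/ssl/etcd/ssl/node-{{ item }}.pem"
      loop: "{{ groups['kube_node'] | default([]) }}"
      register: cert_stats

    - name: Report any missing cert
      ansible.builtin.debug:
        msg: "Missing /etc/ssl/etcd/ssl/node-{{ item.item }}.pem"
      loop: "{{ cert_stats.results }}"
      when: not item.stat.exists
```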
Great, thanks for testing that! We'll need to add that to the CI in the PR fixing this, so it doesn't regress again.
What happened?

The task `Gen_certs | Gather node certs` fails. Neither on `k8s-worker1` nor on `k8s-etcd1` do the files `node-k8s-worker1.pem` and `node-k8s-worker1-key.pem` exist.

What did you expect to happen?

The files `node-k8s-worker1.pem` and `node-k8s-worker1-key.pem` should exist on `k8s-etcd1`.

How can we reproduce it (as minimally and precisely as possible)?

With 3 dedicated etcd servers. Deploy with the command given below.
OS
Linux 6.1.0-26-amd64 x86_64 PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
Version of Ansible
```
ansible [core 2.16.12]
  config file = /home/me/kubespray/ansible.cfg
  configured module search path = ['/home/me/kubespray/library']
  ansible python module location = /home/me/ansible-kubespray/lib/python3.11/site-packages/ansible
  ansible collection location = /home/me/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/me/ansible-kubespray/bin/ansible
  python version = 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] (/home/me/ansible-kubespray/bin/python3)
  jinja version = 3.1.4
  libyaml = True
```
Version of Python
Python 3.11.2
Version of Kubespray (commit)
e5bdb3b0b
Network plugin used
cilium
Full inventory with variables
Command used to invoke ansible
```
ansible-playbook -f 10 -i inventory/homecluster/inventory.ini --become --become-user=root cluster.yml -e 'unsafe_show_logs=True'
```
Output of ansible run
Anything else we need to know
I fixed this issue like this: on `k8s-etcd1`, I regenerated the missing node certs by re-running the playbook with `--tags=etcd`.