kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

Deploy k8s not working - wrong ansible group names in hosts.yaml #11397

Open sleepdan opened 3 months ago

sleepdan commented 3 months ago

What happened?

I'm trying to deploy a k8s cluster using these instructions https://kubespray.io/#/docs/getting_started/getting-started.

After executing the command CONFIG_FILE=inventory/mycluster/hosts.yml python3 contrib/inventory_builder/inventory.py ${IPS[@]}, the file inventory/mycluster/hosts.yml is created. It defines host groups named kube_control_plane, kube_node, k8s_cluster, and calico_rr.

If you then try to start the deployment, the process stops because the Ansible playbooks use different group names. For example, the playbooks/boilerplate.yml file uses the group names kube-master, kube-node, k8s-cluster, and calico-rr, which differ from those specified in inventory/mycluster/hosts.yml.

And the playbooks/cluster.yml file uses a completely different group name, kube_control_plane.

I assume the group names need to be aligned so that the same values are used throughout the project.

What did you expect to happen?

I expect everything will work as described in the instructions.

How can we reproduce it (as minimally and precisely as possible)?

Try following these instructions https://kubespray.io/#/docs/getting_started/getting-started.

OS

Darwin 23.4.0 x86_64

Version of Ansible

ansible [core 2.16.9]
  config file = None
  configured module search path = ['/Users/user1/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /Users/user1/test/venv/lib/python3.12/site-packages/ansible
  ansible collection location = /Users/user1/.ansible/collections:/usr/share/ansible/collections
  executable location = /Users/user1/test/venv/bin/ansible
  python version = 3.12.4 (main, Jun 6 2024, 18:26:44) [Clang 15.0.0 (clang-1500.3.9.4)] (/Users/user1/test/venv/bin/python3.12)
  jinja version = 3.1.4
  libyaml = True

Version of Python

Python 3.12.4

Version of Kubespray (commit)

a78d5e78e

release-2.25

Network plugin used

calico

Full inventory with variables

all:
  hosts:
    node1:
      ansible_host: 10.26.1.54
      ip: 10.26.1.54
      access_ip: 10.26.1.54
    node2:
      ansible_host: 10.26.1.56
      ip: 10.26.1.56
      access_ip: 10.26.1.56
  children:
    kube_control_plane:
      hosts:
        node1:
        node2:
    kube_node:
      hosts:
        node1:
        node2:
    etcd:
      hosts:
        node1:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
    calico_rr:
      hosts: {}

Command used to invoke ansible

ansible-playbook -i inventory/mycluster/hosts.yaml -u root -b -v --private-key=~/.ssh/id_ed25519 cluster.yml

Output of ansible run

Using /Users/user1/test/kubespray/ansible.cfg as config file
[WARNING]: While constructing a mapping from /Users/user1/test/kubespray/roles/bootstrap-os/tasks/main.yml, line 29, column 7, found a duplicate dict key (paths). Using last defined value only.
[WARNING]: Skipping callback plugin 'ara_default', unable to load

PLAY [Check Ansible version] **
Tuesday 23 July 2024 16:16:46 +0700 (0:00:00.074) 0:00:00.074 **

TASK [Check 2.16.4 <= Ansible version < 2.17.0] ***
ok: [node1] => { "changed": false, "msg": "All assertions passed" }
Tuesday 23 July 2024 16:16:47 +0700 (0:00:00.053) 0:00:00.127 **

TASK [Check that python netaddr is installed] *****
ok: [node1] => { "changed": false, "msg": "All assertions passed" }
Tuesday 23 July 2024 16:16:47 +0700 (0:00:00.143) 0:00:00.271 **

TASK [Check that jinja is not too old (install via pip)] **
ok: [node1] => { "changed": false, "msg": "All assertions passed" }
[WARNING]: Could not match supplied host pattern, ignoring: kube-master

PLAY [Add kube-master nodes to kube_control_plane] ****
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: kube-node

PLAY [Add kube-node nodes to kube_node] ***
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: k8s-cluster

PLAY [Add k8s-cluster nodes to k8s_cluster] ***
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: calico-rr

PLAY [Add calico-rr nodes to calico_rr] ***
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: no-floating

PLAY [Add no-floating nodes to no_floating] ***
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: bastion

PLAY [Install bastion ssh config] *****
skipping: no hosts matched

PLAY [Bootstrap hosts for Ansible] ****
Tuesday 23 July 2024 16:16:47 +0700 (0:00:00.125) 0:00:00.397 **
Tuesday 23 July 2024 16:16:47 +0700 (0:00:00.082) 0:00:00.479 **
Tuesday 23 July 2024 16:16:47 +0700 (0:00:00.057) 0:00:00.537 **
Tuesday 23 July 2024 16:16:47 +0700 (0:00:00.069) 0:00:00.607 **
Tuesday 23 July 2024 16:16:47 +0700 (0:00:00.058) 0:00:00.666 **
Tuesday 23 July 2024 16:16:47 +0700 (0:00:00.068) 0:00:00.735 **
Tuesday 23 July 2024 16:16:47 +0700 (0:00:00.079) 0:00:00.814 **
[WARNING]: raw module does not support the environment keyword

Anything else we need to know

No response

abjklk commented 3 months ago

There does not seem to be an issue here.

What you mention are the steps that support the old naming style, as noted in this comment in playbooks/boilerplate.yml:

# These are inventory compatibility tasks to ensure we keep compatibility with old style group names

As your inventory uses the latest convention (generated by inventory builder), these tasks are skipped and do not interrupt execution.
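To illustrate the point about skipped compatibility tasks, here is a minimal Python sketch (not Kubespray code). The old-to-new mapping is read off the play titles in the log output above. The names LEGACY_TO_CURRENT, collect_groups, and skipped_compat_plays are hypothetical, and the inventory is the reported one reduced to its group structure.

```python
# Legacy (dash-style) group names and the current names they map to,
# taken from the play titles in the reported log output.
LEGACY_TO_CURRENT = {
    "kube-master": "kube_control_plane",
    "kube-node": "kube_node",
    "k8s-cluster": "k8s_cluster",
    "calico-rr": "calico_rr",
    "no-floating": "no_floating",
}


def collect_groups(node):
    """Recursively collect group names from the `children` keys of an inventory dict."""
    groups = set()
    for name, child in (node.get("children") or {}).items():
        groups.add(name)
        groups |= collect_groups(child or {})
    return groups


def skipped_compat_plays(inventory):
    """Return the legacy group names absent from the inventory.

    A play that targets an absent group matches no hosts, so it is skipped.
    """
    groups = collect_groups(inventory["all"])
    return sorted(old for old in LEGACY_TO_CURRENT if old not in groups)


# The reported inventory, reduced to its group structure (host variables omitted).
inventory = {
    "all": {
        "children": {
            "kube_control_plane": {"hosts": {"node1": None, "node2": None}},
            "kube_node": {"hosts": {"node1": None, "node2": None}},
            "etcd": {"hosts": {"node1": None}},
            "k8s_cluster": {"children": {"kube_control_plane": None, "kube_node": None}},
            "calico_rr": {"hosts": {}},
        }
    }
}

print(skipped_compat_plays(inventory))
```

With this inventory, all five legacy names are reported as absent, which matches the five "skipping: no hosts matched" compatibility plays in the log. A skipped play is not an error, so under this reading the skips alone would not explain a stalled run.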

sleepdan commented 3 months ago

The fact of the matter is that the cluster deployment process stops and does not continue. In the "Output of ansible run" section I provided a log in which the process hangs after the "PLAY [Bootstrap hosts for Ansible]" play. After I corrected the group names and made them the same everywhere (in the files I mentioned), the process began to work normally.

abjklk commented 3 months ago

What do you mean by "the process hangs after the play"? Does the execution end abruptly, or does it just get stuck? If Ansible exits (pass or fail), it should also print the summary, which I do not see in the logs provided.

sleepdan commented 3 months ago

The process gets stuck and nothing further is output to the log. I waited about 10 minutes and then interrupted it with Ctrl+C. You can try to reproduce it yourself by following the instructions the same way I did.

abjklk commented 3 months ago

You can try running Ansible with extra verbosity (-vvv). In any case, there seems to be a problem with your environment, as this is not reproducible.

k8s-triage-robot commented 1 day ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale