kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

Need "ip" utlity check in Kubespray Project #11679

Open · a7lan opened 3 weeks ago

a7lan commented 3 weeks ago

What happened?

Kubespray fails during deployment if the ip utility from the iproute2 package is missing on any node. The [kubespray-defaults : Create fallback_ips_base] task then hits a fatal error, stopping the deployment process.

What did you expect to happen?

I expected Kubespray to first check for the presence of the iproute2 package before executing any tasks that depend on it. If the package is missing, Kubespray should automatically install it to prevent errors during the deployment process.
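Until Kubespray performs such a check itself, a manual pre-install pass over the inventory works around the problem. This is only a sketch, reusing the inventory path from this report; adjust it to your environment:

$ ansible all -i inventory/dev/inventory.ini -b -m package -a "name=iproute2 state=present"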

How can we reproduce it (as minimally and precisely as possible)?

Set up an inventory for a Kubespray deployment where some nodes lack the iproute2 package.

For example, configure 3 control-plane nodes with iproute2 installed and 3 worker nodes without it.

Run the Kubespray playbook against this inventory. The deployment fails with a fatal error in the [kubespray-defaults : Create fallback_ips_base] task because the ip utility is missing on the worker nodes (see the check below).
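To see in advance which hosts lack the utility, an ad-hoc check along these lines can be used (a sketch, again reusing the inventory path from this report):

$ ansible all -i inventory/dev/inventory.ini -m shell -a "command -v ip || echo 'ip utility missing'"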

OS

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"

Version of Ansible

ansible [core 2.16.12]
  config file = /home/aslan/kubespray-venv/kubespray/ansible.cfg
  configured module search path = ['/home/aslan/kubespray-venv/kubespray/library']
  ansible python module location = /home/aslan/kubespray-venv/venv/lib/python3.12/site-packages/ansible
  ansible collection location = /home/aslan/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/aslan/kubespray-venv/venv/bin/ansible
  python version = 3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0] (/home/aslan/kubespray-venv/venv/bin/python3.12)
  jinja version = 3.1.4
  libyaml = True

Version of Python

Python 3.12.3

Version of Kubespray (commit)

f9ebd45c7

Network plugin used

calico

Full inventory with variables

https://gist.github.com/a7lan/0da098dc33cee26eac893a694e50afa9

Command used to invoke ansible

ansible-playbook playbooks/upgrade_cluster.yml -i inventory/dev/inventory.ini -b -e kube_version=v1.29.9 --limit worker01

Output of ansible run

https://gist.github.com/a7lan/69ff01801613e4a1b9b7a2f3c14fed5f

Anything else we need to know

No response

VannTen commented 3 weeks ago

I don't see why you think this is related to iproute2?

Have you run the facts.yml playbook before using --limit? Kubespray relies on the fact cache for this. See #11598 and #11587
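Roughly, the sequence would be (a sketch, reusing the inventory path and kube_version from this report; playbooks/facts.yml refreshes the fact cache for all hosts before the limited run):

$ ansible-playbook playbooks/facts.yml -i inventory/dev/inventory.ini -b
$ ansible-playbook playbooks/upgrade_cluster.yml -i inventory/dev/inventory.ini -b -e kube_version=v1.29.9 --limit worker01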

a7lan commented 3 weeks ago

I installed the iproute2 package on the nodes that initially lacked it, and after this, the Kubespray playbook executed successfully without issues.

Regarding fact caching, here are two sample outputs from running ansible -m setup with the filter=ansible_default_ipv4 option on nodes both with and without the iproute2 package installed. The node worker01 does not have iproute2, resulting in empty ansible_facts, while worker02 with iproute2 provides the expected network information:

Without iproute2:

$ ansible worker01 -m setup -a "filter=ansible_default_ipv4" -i inventory/dev/inventory.ini
worker01 | SUCCESS => {
    "ansible_facts": {},
    "changed": false
}

With iproute2:


$ ansible worker02 -m setup -a "filter=ansible_default_ipv4" -i inventory/dev/inventory.ini
worker02 | SUCCESS => {
    "ansible_facts": {
        "ansible_default_ipv4": {
            "address": "172.20.98.95",
            "alias": "ens18",
            "broadcast": "172.20.99.255",
            "gateway": "172.20.98.1",
            "interface": "ens18",
            "macaddress": "a6:03:57:11:8f:49",
            "mtu": 1500,
            "netmask": "255.255.254.0",
            "network": "172.20.98.0",
            "prefix": "23",
            "type": "ether"
        }
    },
    "changed": false
}

VannTen commented 3 weeks ago

You're correct, see ansible/ansible#70796

/triage accepted