kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

Adding control plane nodes failed after first initialization #11253

Open lenglet-k opened 5 months ago

lenglet-k commented 5 months ago

What happened?

I ran cluster.yml with only one control plane, and that control plane works. A few moments later, I added two control planes to my inventory and re-ran the cluster.yml playbook. Adding the two new control planes failed on this task:

TASK [kubernetes_sigs.kubespray.kubernetes/control-plane : Copy discovery kubeconfig] *******************************************************************************************************************************
skipping: [host-1.node.b]
fatal: [host-2.node.b]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'\n\nThe error appears to be in '....collections/ansible_collections/kubernetes_sigs/kubespray/roles/kubernetes/control-plane/tasks/kubeadm-secondary.yml': line 75, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Copy discovery kubeconfig\n  ^ here\n"}
fatal: [host-3.node.b]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'\n\nThe error appears to be in '....collections/ansible_collections/kubernetes_sigs/kubespray/roles/kubernetes/control-plane/tasks/kubeadm-secondary.yml': line 75, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Copy discovery kubeconfig\n  ^ here\n"}

Earlier in the run I see that the following task is skipped, yet it is the task that registers the kubeconfig_file_discovery variable:

TASK [kubernetes_sigs.kubespray.kubernetes/control-plane : Get kubeconfig for join discovery process] ***************************************************************************************************************
skipping: [host-1.node.b]

I don't understand why this task is skipped, because kubeadm_use_file_discovery is set to true. Maybe it's caused by this when condition: kubeadm_already_run is not defined or not kubeadm_already_run.stat.exists
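
For context, here is a minimal sketch of how this pair of tasks interacts (my paraphrase of kubeadm-secondary.yml, not the verbatim Kubespray source; the kubectl command and file paths are illustrative):

    # The registering task runs once, delegated to the first control plane,
    # and only when kubeadm has NOT already run there.
    - name: Get kubeconfig for join discovery process
      command: "{{ bin_dir }}/kubectl --kubeconfig /etc/kubernetes/admin.conf config view --raw"  # illustrative
      register: kubeconfig_file_discovery
      delegate_to: "{{ groups['kube_control_plane'] | first }}"
      run_once: true
      when: kubeadm_already_run is not defined or not kubeadm_already_run.stat.exists

    # The joining control planes then template the registered result.
    - name: Copy discovery kubeconfig
      copy:
        content: "{{ kubeconfig_file_discovery.stdout }}"
        dest: "{{ kube_config_dir }}/kubeadm-discovery.conf"
        mode: "0640"
      when:
        - kubeadm_use_file_discovery
        - kubeadm_already_run is not defined or not kubeadm_already_run.stat.exists

On the first control plane kubeadm has already run, so the stat condition is false and the registering task is skipped. kubeconfig_file_discovery is then still defined, but only as a skipped-task result with no stdout attribute, which is exactly the "'dict object' has no attribute 'stdout'" error the copy task raises on the two joining nodes.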

What did you expect to happen?

My two new control planes should be installed.

How can we reproduce it (as minimally and precisely as possible)?

First: initialize the cluster with one control plane. Then: add two nodes and run cluster.yml again (see the commands sketched below).
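
In command form (inventory path and group edits are examples matching my setup):

    ansible-playbook -i recette/hosts.ini --become cluster.yml   # one host in [kube_control_plane]
    # add host-2.node.b and host-3.node.b to [kube_control_plane] and [etcd] in hosts.ini
    ansible-playbook -i recette/hosts.ini --become cluster.yml   # fails on the two new control planes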

OS

Rocky 8.9

Version of Ansible

ansible [core 2.16.7]
  config file = user/kubernetes/ansible.cfg
  configured module search path = ['/home/user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/user/miniconda3/lib/python3.11/site-packages/ansible
  ansible collection location = /user/kubernetes/collections
  executable location = /home/user/miniconda3/bin/ansible
  python version = 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0] (/home/user/miniconda3/bin/python)
  jinja version = 3.1.4
  libyaml = True

Version of Python

Python 3.11.5

Version of Kubespray (commit)

743bcea

Network plugin used

calico

Full inventory with variables

host-1.node.b | SUCCESS => {
    "hostvars[inventory_hostname]": {
        "ansible_check_mode": false,
        "ansible_config_file": "path/kubernetes/ansible.cfg",
        "ansible_diff_mode": false,
        "ansible_facts": {},
        "ansible_forks": 5,
        "ansible_host": "192.168.250.4",
        "ansible_inventory_sources": [
            "path/kubernetes/recette/hosts.ini"
        ],
        "ansible_playbook_python": "/home/user/miniconda3/bin/python",
        "ansible_user": "user",
        "ansible_verbosity": 0,
        "ansible_version": {
            "full": "2.16.7",
            "major": 2,
            "minor": 16,
            "revision": 7,
            "string": "2.16.7"
        },
        "envId": "recette-1",
        "etcd_member_name": "etcd1",
        "group_names": [
            "etcd",
            "k8s_cluster",
            "kube_control_plane"
        ],
        "groups": {
            "all": [
                "host-1.node.b",
                "host-2.node.b",
                "host-3.node.b"
            ],
            "calico_rr": [],
            "etcd": [
                "host-1.node.b",
                "host-2.node.b",
                "host-3.node.b"
            ],
            "k8s_cluster": [
                "host-1.node.b",
                "host-2.node.b",
                "host-3.node.b"
            ],
            "kube_control_plane": [
                "host-1.node.b",
                "host-2.node.b",
                "host-3.node.b"
            ],
            "kube_node": [],
            "ungrouped": []
        },
        "inventory_dir": "path/kubernetes/recette",
        "inventory_file": "path/kubernetes/recette/hosts.ini",
        "inventory_hostname": "host-1.node.b",
        "inventory_hostname_short": "host-1",
        "ip": "192.168.250.4",
        "omit": "__omit_place_holder__15fc867b19dafaab1ff0c37309be21c76fa8c8e5",
        "playbook_dir": "path/kubernetes",
        "projectId": "b-control-plane"
    }
}
host-2.node.b | SUCCESS => {
    "hostvars[inventory_hostname]": {
        "ansible_check_mode": false,
        "ansible_config_file": "path/kubernetes/ansible.cfg",
        "ansible_diff_mode": false,
        "ansible_facts": {},
        "ansible_forks": 5,
        "ansible_host": "192.168.250.5",
        "ansible_inventory_sources": [
            "path/kubernetes/recette/hosts.ini"
        ],
        "ansible_playbook_python": "/home/user/miniconda3/bin/python",
        "ansible_user": "user",
        "ansible_verbosity": 0,
        "ansible_version": {
            "full": "2.16.7",
            "major": 2,
            "minor": 16,
            "revision": 7,
            "string": "2.16.7"
        },
        "envId": "recette-2",
        "etcd_member_name": "etcd2",
        "group_names": [
            "etcd",
            "k8s_cluster",
            "kube_control_plane"
        ],
        "groups": {
            "all": [
                "host-1.node.b",
                "host-2.node.b",
                "host-3.node.b"
            ],
            "calico_rr": [],
            "etcd": [
                "host-1.node.b",
                "host-2.node.b",
                "host-3.node.b"
            ],
            "k8s_cluster": [
                "host-1.node.b",
                "host-2.node.b",
                "host-3.node.b"
            ],
            "kube_control_plane": [
                "host-1.node.b",
                "host-2.node.b",
                "host-3.node.b"
            ],
            "kube_node": [],
            "ungrouped": []
        },
        "inventory_dir": "path/kubernetes/recette",
        "inventory_file": "path/kubernetes/recette/hosts.ini",
        "inventory_hostname": "host-2.node.b",
        "inventory_hostname_short": "host-2",
        "ip": "192.168.250.5",
        "omit": "__omit_place_holder__15fc867b19dafaab1ff0c37309be21c76fa8c8e5",
        "playbook_dir": "path/kubernetes",
        "projectId": "b-control-plane"
    }
}
host-3.node.b | SUCCESS => {
    "hostvars[inventory_hostname]": {
        "ansible_check_mode": false,
        "ansible_config_file": "path/kubernetes/ansible.cfg",
        "ansible_diff_mode": false,
        "ansible_facts": {},
        "ansible_forks": 5,
        "ansible_host": "192.168.250.6",
        "ansible_inventory_sources": [
            "path/kubernetes/recette/hosts.ini"
        ],
        "ansible_playbook_python": "/home/user/miniconda3/bin/python",
        "ansible_user": "user",
        "ansible_verbosity": 0,
        "ansible_version": {
            "full": "2.16.7",
            "major": 2,
            "minor": 16,
            "revision": 7,
            "string": "2.16.7"
        },
        "envId": "recette-3",
        "etcd_member_name": "etcd3",
        "group_names": [
            "etcd",
            "k8s_cluster",
            "kube_control_plane"
        ],
        "groups": {
            "all": [
                "host-1.node.b",
                "host-2.node.b",
                "host-3.node.b"
            ],
            "calico_rr": [],
            "etcd": [
                "host-1.node.b",
                "host-2.node.b",
                "host-3.node.b"
            ],
            "k8s_cluster": [
                "host-1.node.b",
                "host-2.node.b",
                "host-3.node.b"
            ],
            "kube_control_plane": [
                "host-1.node.b",
                "host-2.node.b",
                "host-3.node.b"
            ],
            "kube_node": [],
            "ungrouped": []
        },
        "inventory_dir": "path/kubernetes/recette",
        "inventory_file": "path/kubernetes/recette/hosts.ini",
        "inventory_hostname": "host-3.node.b",
        "inventory_hostname_short": "host-3",
        "ip": "192.168.250.6",
        "omit": "__omit_place_holder__15fc867b19dafaab1ff0c37309be21c76fa8c8e5",
        "playbook_dir": "path/kubernetes",
        "projectId": "b-control-plane"
    }
}

Command used to invoke ansible

ansible-playbook -i recette/hosts.ini -e "@recette/control-plane/group_vars/hardening.yaml" -e "@./control-plane/group_vars/standard.yaml" --become --become-user=root ./control-plane/install_control_plane.yaml

Output of ansible run

TASK [kubernetes_sigs.kubespray.kubernetes/control-plane : Copy discovery kubeconfig] *******************************************************************************************************************************
skipping: [host-1.node.b]
fatal: [host-2.node.b]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'\n\nThe error appears to be in '....collections/ansible_collections/kubernetes_sigs/kubespray/roles/kubernetes/control-plane/tasks/kubeadm-secondary.yml': line 75, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Copy discovery kubeconfig\n  ^ here\n"}
fatal: [host-3.node.b]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'\n\nThe error appears to be in '....collections/ansible_collections/kubernetes_sigs/kubespray/roles/kubernetes/control-plane/tasks/kubeadm-secondary.yml': line 75, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Copy discovery kubeconfig\n  ^ here\n"}

Anything else we need to know

Here is the output of the debug tasks I added:

TASK [kubernetes_sigs.kubespray.kubernetes/control-plane : debug kubeadm_use_file_discovery] ************************************************************************************************************************
ok: [host-1.node.b] => {
    "msg": true
}
ok: [host-2.node.b] => {
    "msg": true
}
ok: [host-3.node.b] => {
    "msg": true
}

TASK [kubernetes_sigs.kubespray.kubernetes/control-plane : debug kubeadm_already_run] *******************************************************************************************************************************
ok: [host-1.node.b] => {
    "msg": {
        "changed": false,
        "failed": false,
        "stat": {
            "atime": 1717102598.4374506,
            "block_size": 4096,
            "blocks": 8,
            "ctime": 1717102598.4374506,
            "dev": 64770,
            "device_type": 0,
            "executable": false,
            "exists": true,
            "gid": 0,
            "gr_name": "root",
            "inode": 10235231,
            "isblk": false,
            "ischr": false,
            "isdir": false,
            "isfifo": false,
            "isgid": false,
            "islnk": false,
            "isreg": true,
            "issock": false,
            "isuid": false,
            "mode": "0640",
            "mtime": 1717102598.4374506,
            "nlink": 1,
            "path": "/var/lib/kubelet/config.yaml",
            "pw_name": "root",
            "readable": true,
            "rgrp": true,
            "roth": false,
            "rusr": true,
            "size": 1077,
            "uid": 0,
            "wgrp": false,
            "woth": false,
            "writeable": true,
            "wusr": true,
            "xgrp": false,
            "xoth": false,
            "xusr": false
        }
    }
}
ok: [host-2.node.b] => {
    "msg": {
        "changed": false,
        "failed": false,
        "stat": {
            "exists": false
        }
    }
}
ok: [host-3.node.b] => {
    "msg": {
        "changed": false,
        "failed": false,
        "stat": {
            "exists": false
        }
    }
}
samfili commented 5 months ago

+1 same issue

lenglet-k commented 5 months ago

Asked in Slack: https://kubernetes.slack.com/archives/C2V9WJSJD/p1717403344596369

Seljuke commented 3 months ago

I had the same problem; I just removed this line and it is working now.
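
(Presumably the line in question is the when guard on the registering task, something like the following; this is my reading of the discussion above, not a quote from the comment:

    when: kubeadm_already_run is not defined or not kubeadm_already_run.stat.exists

With that guard removed, the kubeconfig is fetched on every run, so the registered variable always has a stdout.)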

k8s-triage-robot commented 1 day ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale