kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0
15.67k stars 6.36k forks

usr/local/bin/nerdctl: not found when running kubespray with vagrant #10268

Open romch007 opened 1 year ago

romch007 commented 1 year ago

I am trying to install kubespray using the provided Vagrantfile. The only changes I made were:

 $num_instances ||= 3
 $instance_name_prefix ||= "k8s"
 $vm_gui ||= false
-$vm_memory ||= 2048
-$vm_cpus ||= 2
+$vm_memory ||= 4096
+$vm_cpus ||= 3
 $shared_folders ||= {}
 $forwarded_ports ||= {}
-$subnet ||= "172.18.8"
+$subnet ||= "192.168.56"
 $subnet_ipv6 ||= "fd3c:b398:0698:0756"
 $os ||= "ubuntu2004"
 $network_plugin ||= "flannel"
@@ -254,6 +254,7 @@ Vagrant.configure("2") do |config|
       # And limit the action to gathering facts, the full playbook is going to be ran by testcases_run.sh
       if i == $num_instances
         node.vm.provision "ansible" do |ansible|
+          ansible.compatibility_mode = "2.0"
           ansible.playbook = $playbook
           ansible.verbose = $ansible_verbosity
           $ansible_inventory_path = File.join( $inventory, "hosts.ini")

All the other files of the repo are unchanged.

Environment:

Kubespray version (commit) (git rev-parse --short HEAD): b42757d33

Network plugin used: flannel

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

Command used to invoke ansible: vagrant up

Output of ansible run:

On every node:

TASK [download : download_container | Load image into the local container registry]
fatal: [k8s-1]: FAILED! => {"changed": true, "cmd": "/usr/local/bin/nerdctl -n k8s.io image load < /tmp/releases/images/docker.io_flannel_flannel_v0.22.0.tar", "delta": "0:00:00.004128", "end": "2023-06-30 16:34:15.315823", "failed_when_result": true, "msg": "non-zero return code", "rc": 127, "start": "2023-06-30 16:34:15.311695", "stderr": "/bin/sh: 1: /usr/local/bin/nerdctl: not found", "stderr_lines": ["/bin/sh: 1: /usr/local/bin/nerdctl: not found"], "stdout": "", "stdout_lines": []}
fatal: [k8s-2]: same
fatal: [k8s-3]: same

Anything else do we need to know:

wolskies commented 1 year ago

I'm seeing the same behavior with Debian 12 VMs hosted by Proxmox. Kubespray downloads nerdctl (among others) correctly to /tmp/releases but doesn't copy it to /usr/local/bin; it skips right to trying to pull the flannel image with nerdctl and gets a `[Errno 2] No such file or directory: b'/usr/local/bin/nerdctl'`

The full traceback is:
  File "/tmp/ansible_ansible.legacy.command_payload_uw5dw5yf/ansible_ansible.legacy.command_payload.zip/ansible/module_utils/basic.py", line 2030, in run_command
    cmd = subprocess.Popen(args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 1024, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.11/subprocess.py", line 1901, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
fatal: [node1]: FAILED! => {
    "attempts": 4,
    "changed": false,
    "cmd": "/usr/local/bin/nerdctl -n k8s.io pull --quiet docker.io/flannel/flannel:v0.22.0",
    "invocation": {
        "module_args": {
            "_raw_params": "/usr/local/bin/nerdctl -n k8s.io pull --quiet  docker.io/flannel/flannel:v0.22.0",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true
        }
    },
    "msg": "[Errno 2] No such file or directory: b'/usr/local/bin/nerdctl'",
    "rc": 2,
    "stderr": "",
    "stderr_lines": [],
    "stdout": "",
    "stdout_lines": []
}

mickaelmonsieur commented 1 year ago

Same errors with Debian 11.7 , Vagrant 2.3.7 and Virtualbox 7.0.8.

mickaelmonsieur commented 1 year ago

Small fix:

cp /tmp/releases/nerdctl /usr/local/bin/nerdctl && cp /tmp/releases/crictl /usr/local/bin/crictl

and relaunch ansible.
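The same workaround can also be run from the control machine as a short one-off play. This is only a sketch: it assumes the binaries were already downloaded to the default cache directory /tmp/releases seen in this thread, and the k8s_cluster group name from kubespray's sample inventory.

```yaml
# One-off workaround play (sketch): copy the already-downloaded binaries
# into /usr/local/bin with the executable bit set.
- hosts: k8s_cluster
  become: true
  gather_facts: false
  tasks:
    - name: Copy downloaded binaries into /usr/local/bin
      ansible.builtin.copy:
        src: "/tmp/releases/{{ item }}"
        dest: "/usr/local/bin/{{ item }}"
        mode: "0755"
        remote_src: true
      loop:
        - nerdctl
        - crictl
```

Unlike a plain cp, copy with mode: "0755" also guarantees the files end up executable.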

yankay commented 1 year ago

Thanks @romch007 @mickaelmonsieur

If you're willing, feel free to provide a PR. :-)

Thank you very much.

wolskies commented 1 year ago

I did that; it gets past the immediate problem of nerdctl not being in /usr/local/bin, but it fails later trying to create the kubeadm token (on all nodes). I think it's related (it seems nerdctl, crictl, and runc get downloaded but not configured):

TASK [kubernetes/control-plane : Create kubeadm token for joining nodes with 24h expiration (default)] ******************************************
task path: /Users/ed/Kube/kubespray/roles/kubernetes/control-plane/tasks/kubeadm-setup.yml:207
fatal: [node2 -> node1(192.168.1.73)]: FAILED! => {
    "attempts": 5,
    "changed": false,
    "cmd": [
        "/usr/local/bin/kubeadm",
        "--kubeconfig",
        "/etc/kubernetes/admin.conf",
        "token",
        "create"
    ],
    "delta": "0:01:15.109430",
    "end": "2023-07-04 02:54:04.922118",
    "invocation": {
        "module_args": {
            "_raw_params": "/usr/local/bin/kubeadm --kubeconfig /etc/kubernetes/admin.conf token create",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true
        }
    },
    "msg": "non-zero return code",
    "rc": 1,
    "start": "2023-07-04 02:52:49.812688",
    "stderr": "timed out waiting for the condition\nTo see the stack trace of this error execute with --v=5 or higher",
    "stderr_lines": [
        "timed out waiting for the condition",
        "To see the stack trace of this error execute with --v=5 or higher"
    ],
    "stdout": "",
    "stdout_lines": []
}

journalctl shows something wrong with the configuration of runc:

sudo journalctl -xeu kubelet | grep failed

Jul 04 16:24:50 node1 kubelet[127545]: E0704 16:24:50.524302 127545 remote_runtime.go:176] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/363edeefd37098196f7b4bd3baa2253e932f3501bdd97b083d0c8fceba6138e7/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH: unknown"
Jul 04 16:24:50 node1 kubelet[127545]: E0704 16:24:50.524363 127545 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/363edeefd37098196f7b4bd3baa2253e932f3501bdd97b083d0c8fceba6138e7/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH: unknown" pod="kube-system/kube-apiserver-node1"
Jul 04 16:24:50 node1 kubelet[127545]: E0704 16:24:50.524386 127545 kuberuntime_manager.go:782] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/363edeefd37098196f7b4bd3baa2253e932f3501bdd97b083d0c8fceba6138e7/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH: unknown" pod="kube-system/kube-apiserver-node1"
Jul 04 16:24:50 node1 kubelet[127545]: E0704 16:24:50.524432 127545 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-apiserver-node1_kube-system(c4b89dde2a5c1b5d448fe0f03d05baa8)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\"kube-apiserver-node1_kube-system(c4b89dde2a5c1b5d448fe0f03d05baa8)\\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/363edeefd37098196f7b4bd3baa2253e932f3501bdd97b083d0c8fceba6138e7/log.json: no such file or directory): exec: \\"runc\\": executable file not found in $PATH: unknown\"" pod="kube-system/kube-apiserver-node1" podUID=c4b89dde2a5c1b5d448fe0f03d05baa8
Jul 04 16:24:50 node1 kubelet[127545]: E0704 16:24:50.619838 127545 controller.go:146] failed to ensure lease exists, will retry in 7s, error: Get "https://192.168.1.73:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/node1?timeout=10s": dial tcp 192.168.1.73:6443: connect: connection refused
Jul 04 16:24:52 node1 kubelet[127545]: W0704 16:24:52.667772 127545 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list v1.RuntimeClass: Get "https://192.168.1.73:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 192.168.1.73:6443: connect: connection refused
Jul 04 16:24:52 node1 kubelet[127545]: E0704 16:24:52.667836 127545 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch v1.RuntimeClass: failed to list v1.RuntimeClass: Get "https://192.168.1.73:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 192.168.1.73:6443: connect: connection refused
Jul 04 16:24:52 node1 kubelet[127545]: W0704 16:24:52.675521 127545 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list v1.CSIDriver: Get "https://192.168.1.73:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp 192.168.1.73:6443: connect: connection refused
Jul 04 16:24:52 node1 kubelet[127545]: E0704 16:24:52.675585 127545 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch v1.CSIDriver: failed to list v1.CSIDriver: Get "https://192.168.1.73:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp 192.168.1.73:6443: connect: connection refused
Jul 04 16:24:53 node1 kubelet[127545]: E0704 16:24:53.520427 127545 remote_runtime.go:176] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/d4fb1e974177c6372785c1b4a8e242e55516580b9309a1407fc470f106387820/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH: unknown"
Jul 04 16:24:53 node1 kubelet[127545]: E0704 16:24:53.520464 127545 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/d4fb1e974177c6372785c1b4a8e242e55516580b9309a1407fc470f106387820/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH: unknown" pod="kube-system/kube-controller-manager-node1"
Jul 04 16:24:53 node1 kubelet[127545]: E0704 16:24:53.520486 127545 kuberuntime_manager.go:782] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/d4fb1e974177c6372785c1b4a8e242e55516580b9309a1407fc470f106387820/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH: unknown" pod="kube-system/kube-controller-manager-node1"
Jul 04 16:24:53 node1 kubelet[127545]: E0704 16:24:53.520525 127545 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-controller-manager-node1_kube-system(84983840101f64a28c6328ab55dc5c58)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\"kube-controller-manager-node1_kube-system(84983840101f64a28c6328ab55dc5c58)\\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/d4fb1e974177c6372785c1b4a8e242e55516580b9309a1407fc470f106387820/log.json: no such file or directory): exec: \\"runc\\": executable file not found in $PATH: unknown\"" pod="kube-system/kube-controller-manager-node1" podUID=84983840101f64a28c6328ab55dc5c58
Jul 04 16:24:57 node1 kubelet[127545]: E0704 16:24:57.620471 127545 controller.go:146] failed to ensure lease exists, will retry in 7s, error: Get "https://192.168.1.73:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/node1?timeout=10s": dial tcp 192.168.1.73:6443: connect: connection refused

My guess is it's related to the missing nerdctl thing: the playbook seems to skip over configuration for nerdctl, crictl, and possibly runc.

blackmesa-peterdohm commented 1 year ago

Just to be clear, this is a total show-stopper for many (I'd argue most) people using Kubespray on-premise at present. I'm trying to use it to build a cluster with Calico on Ubuntu; quite vanilla, really. How is no regression test covering this? I've spent hours trying to figure out how those steps are being "skipped", and from what I can tell, it's not that they're skipped, it's that the configuration happens much later.

blackmesa-peterdohm commented 1 year ago


FALSE ALARM. I'd run Ansible outside the virtual environment. So this is a very curious failure mode that occurs if you do what I just did, in case anyone else runs into it.

slappyslap commented 12 months ago

Got the same error with the master branch and Debian 12.

wolskies commented 11 months ago

From my perspective, it isn't a false alarm. I ran Ansible per the installation instructions, from inside the venv, and it continues to fail to configure nerdctl etc. I've tried with Debian 12 and Oracle/Rocky and get the same behavior, both on "bare metal" and VMs.

slappyslap commented 11 months ago

Same on Debian 11 with the master branch.


Khodesaeed commented 11 months ago

Get the same error on Ubuntu 20.04.

Mishavint commented 11 months ago

Faced a similar problem with VBox. Quick fix that helped in my case:

- name: Configure hosts
  gather_facts: False
  hosts: k8s_cluster
  tasks:
    - name: Create a symbolic link
      ansible.builtin.file:
        src: /tmp/releases/crictl
        dest: /usr/local/bin/crictl
        state: link
        force: true

    - name: Create a symbolic link
      ansible.builtin.file:
        src: /tmp/releases/nerdctl
        dest: /usr/local/bin/nerdctl
        state: link
        force: true

    - name: Create a symbolic link
      ansible.builtin.file:
        src: /tmp/releases/runc-v1.1.7.amd64
        dest: /usr/local/bin/runc
        state: link
        force: true

Just add this to playbooks/cluster.yml

Somehow, Kubespray doesn't copy nerdctl, crictl, and runc to /usr/local/bin, so I just make soft links.

Khodesaeed commented 11 months ago

After some investigation, I guess the dependency roles of container-engine (i.e. containerd) somehow don't run after containerd is selected as the CRI. According to the Ansible documentation about role dependencies (link): "Role dependencies let you automatically pull in other roles when using a role."
The doc also says: "Ansible always executes roles listed in dependencies before the role that lists them."
Moreover, you can find the containerd (or any other CRI-related) role dependencies at roles/container-engine/meta/main.yml; the snippet below is the part related to containerd:

---
dependencies:
...
  - role: container-engine/containerd
    when:
      - container_manager == 'containerd'
    tags:
      - container-engine
      - containerd

Following the same pattern, this role has role dependencies of its own, and at this point the runc, crictl, and nerdctl tasks must run, but they didn't. The role-dependency meta file is at roles/container-engine/containerd/meta/main.yml:

---
dependencies:
  - role: container-engine/containerd-common
  - role: container-engine/runc
  - role: container-engine/crictl
  - role: container-engine/nerdctl

So, here is my quick fix: I added the required roles to the role-dependency meta file at roles/container-engine/meta/main.yml, before the containerd section:

---
dependencies:
...
  - role: container-engine/runc
    when:
      - container_manager == 'containerd'

  - role: container-engine/nerdctl
    when:
      - container_manager == 'containerd'

  - role: container-engine/crictl
    when:
      - container_manager == 'containerd'

  - role: container-engine/containerd
    when:
      - container_manager == 'containerd'
    tags:
      - container-engine
      - containerd

P.S. After some more investigation I found another bug, which I think was my main issue: after using the reset.yml playbook to reset the cluster, some container processes still remained, and only after killing those containers did I finally manage to deploy my cluster with Kubespray.
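A quick way to look for such leftovers after reset.yml is to scan the process list (a sketch; containerd-shim and runc are the usual process names for a containerd-based runtime, but yours may differ):

```shell
#!/bin/sh
# Print any leftover container-runtime processes; otherwise say so.
ps -eo pid,comm | grep -E 'containerd-shim|runc' || echo "no leftover container processes"
```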

yankay commented 11 months ago


Thanks @Khodesaeed @roboticsbrian

I cannot find the root cause of the issue. Would you help us reproduce the issue?

Which config file, kubespray commit and OS are used, and is there any important step to reproduce the issue?

RomainMou commented 11 months ago

Hi,

After some investigation, it could be linked to how dependencies work: it's not uniform across Ansible versions when using when. These Ansible issues could be relevant:

I've started replacing all dependencies with include_role and import_role to avoid this. I can do a PR if you think this is the right approach @yankay.
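As a sketch of what such a replacement could look like (an assumption about the eventual change, not the actual PR): the meta dependency is dropped and the role is pulled in explicitly from a tasks file, so the when condition is evaluated at the point of the call.

```yaml
# Sketch: pulling in the nerdctl role via include_role instead of a
# meta/main.yml dependency, so it cannot be de-duplicated away.
- name: Install nerdctl
  ansible.builtin.include_role:
    name: container-engine/nerdctl
  when: container_manager == 'containerd'
```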

yankay commented 11 months ago

Hi,


this is normal and expected behavior for meta dependencies, de duplication is done on the 'call signature' of the role itself. If you want finer grained control I would recommend using include_role instead.
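An illustrative sketch of that de-duplication (hypothetical role names, not kubespray's actual files): when two roles list the same dependency with an identical call signature, Ansible schedules it only once per play.

```yaml
# roles/engine-a/meta/main.yml (hypothetical)
dependencies:
  - role: common/tooling

# roles/engine-b/meta/main.yml (hypothetical)
dependencies:
  - role: common/tooling   # identical call signature: de-duplicated,
                           # so it runs at most once per play
```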


Thanks @RomainMou

I do not know how to reproduce it, so for now I have no idea whether it's the right approach. :-) Can the issue be reproduced with ansible >= [core 2.15.x]?

RomainMou commented 11 months ago

Yes @yankay, I've reproduced it on a new cluster installation with:

ansible==8.2.0
ansible-core==2.15.3

yankay commented 11 months ago

Thank you @RomainMou

I upgraded ansible to

ansible==8.3.0

and the issue is reproduced:

fatal: [kay171]: FAILED! => {"attempts": 4, "changed": false, "cmd": "/usr/local/bin/nerdctl -n k8s.io pull --quiet quay.m.daocloud.io/calico/node:v3.25.1", "msg": "[Errno 2] No such file or directory: b'/usr/local/bin/nerdctl'", "rc": 2, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [kay172]: FAILED! => {"attempts": 4, "changed": false, "cmd": "/usr/local/bin/nerdctl -n k8s.io pull --quiet quay.m.daocloud.io/calico/node:v3.25.1", "msg": "[Errno 2] No such file or directory: b'/usr/local/bin/nerdctl'", "rc": 2, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

I think supporting a new Ansible version is very good for Kubespray. A PR to fix it would be very welcome.

@MrFreezeex @floryut, would you please give some suggestions? :-) Thanks.

bugaian commented 11 months ago

ansible -i inventory/mycluster/inventory.ini -u ubuntu --private-key=~/.ssh/id_rsa --become --become-user=root -b -m copy -a "src=/tmp/releases/nerdctl dest=/usr/local/bin/nerdctl mode=0755 remote_src=yes" all

ansible -i inventory/mycluster/inventory.ini -u ubuntu --private-key=~/.ssh/id_rsa --become --become-user=root -b -m copy -a "src=/tmp/releases/crictl dest=/usr/local/bin/crictl mode=0755 remote_src=yes" all

These lines fixed it on all nodes.

vyom-soft commented 8 months ago


It is reproducible with Ansible 2.15.4. Today I hit this error.

The full traceback is:
  File "/tmp/ansible_ansible.legacy.command_payload_nox4f_k6/ansible_ansible.legacy.command_payload.zip/ansible/module_utils/basic.py", line 2038, in run_command
    cmd = subprocess.Popen(args, **kwargs)
  File "/usr/lib64/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib64/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
fatal: [kvt0labvrpa0049]: FAILED! => {
    "attempts": 4,
    "changed": false,
    "cmd": "/usr/local/bin/nerdctl -n k8s.io pull --quiet quay.io/calico/node:v3.26.3",
    "invocation": {
        "module_args": {
            "_raw_params": "/usr/local/bin/nerdctl -n k8s.io pull --quiet quay.io/calico/node:v3.26.3",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true
        }
    },
    "msg": "[Errno 2] No such file or directory: b'/usr/local/bin/nerdctl': b'/usr/local/bin/nerdctl'",
    "rc": 2,
    "stderr": "",
    "stderr_lines": [],
    "stdout": "",
    "stdout_lines": []
}

MrFreezeex commented 8 months ago

It is reproducible with Ansible 2.15.4. Today I hit this error.

Hi! Not sure how you launched Kubespray with Ansible 2.15.4, but we definitely do not support this version! Please use requirements.txt to install your Ansible version.

VannTen commented 5 months ago

Still reproducible with latest master?

VannTen commented 5 months ago

/triage not-reproducible

I could not reproduce this on master (please provide a reproducer if that's incorrect).

user81230 commented 1 month ago

May be connected to the issue: after a clean installation on Oracle Linux 9, /usr/local/bin was simply not present in $PATH, so I couldn't use binaries (nerdctl included) from my user without specifying the full path. This did not affect the installation process, though; everything worked as expected.
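To check that quickly, you can test PATH membership directly (a sketch; the PATH value below is just an example, on a real node use "$PATH"):

```shell
#!/bin/sh
# Check whether /usr/local/bin is part of a PATH-style value.
path_value="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"  # example; use "$PATH" on a real node
case ":$path_value:" in
  *:/usr/local/bin:*) echo "/usr/local/bin is on PATH" ;;
  *)                  echo "/usr/local/bin is MISSING from PATH" ;;
esac
```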