I am running Proxmox with Ubuntu 24.04 VMs. My Ansible instance is installed on one of the nodes that will run a k3s server instance, and all of my Ansible commands are run through Semaphore. A user called semaphore exists on all machines, with SSH keys and no password set up for every remote machine. I am also treating the local machine as just another remote machine: I gave it an SSH key to itself, since I was told part of the issue might be the localhost target. The semaphore user is configured with the following line in /etc/sudoers to give it sudo privileges with no password: "semaphore ALL=(ALL) NOPASSWD:ALL". Finally, the semaphore user is added to the sudo group.
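As a sanity check on that sudoers setup, these are the kinds of commands I would run (the host IP is just one of my nodes as an example):

sudo -l -U semaphore
returns: the NOPASSWD:ALL rule, if the sudoers entry is in effect

ssh semaphore@10.10.40.32 'sudo -n true && echo sudo-ok'
returns: sudo-ok, if passwordless sudo works non-interactively over SSH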
I am receiving one error from two different playbooks: one from the k3s install playbook "site.yml" and one from the reset playbook "reset.yml". Please note the nodes are 10.10.40.20 (localhost), 10.10.40.21, 10.10.40.22, 10.10.40.30, 10.10.40.31, 10.10.40.32.
The error from site.yml comes from roles/prereq/tasks/main.yml and gives me the following:
fatal: [10.10.40.32]: FAILED! => {
"changed": false,
"module_stderr": "/bin/sh: sudo: not found\n",
"module_stdout": "",
"msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
"rc": 127
}
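Since rc 127 with "sudo: not found" points at the environment the module runs in, one useful check is to ask each host what it sees over the same connection Ansible uses. The ansible.builtin.raw module runs over plain SSH with no Python and no become, so it works even when regular modules fail (the inventory path below is the repo's sample layout; adjust to yours):

ansible all -i inventory/my-cluster/hosts.ini -m ansible.builtin.raw -a 'command -v sudo; echo $PATH'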
The second error is from roles/reset/tasks/umount_with_children.yml. This gives me the following error:
fatal: [localhost]: FAILED! => {
"changed": false,
"changed_when_result": "The conditional check 'get_mounted_filesystems.stdout | length > 0' failed. The error was: error while evaluating conditional (get_mounted_filesystems.stdout | length > 0): 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'",
"module_stderr": "/bin/sh: sudo: not found\n",
"module_stdout": "",
"msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
"rc": 127
}
Both of these fail with rc 127, which from my understanding means /bin/sh could not find a command when Ansible ran the module; here it is sudo itself that is missing from the PATH. (The changed_when error on localhost looks like a knock-on effect: the shell task died with rc 127 before producing any output, so the registered get_mounted_filesystems variable has no stdout attribute for the conditional to evaluate.)
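To see that this is a PATH problem rather than a missing package, the error can be reproduced locally by hiding sudo from the shell (the PATH value below is deliberately bogus):

env PATH=/nonexistent /bin/sh -c 'sudo -n true'; echo $?
returns: /bin/sh: 1: sudo: not found, then 127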
Here are some outputs I have run to troubleshoot:
which sh
returns: /usr/bin/sh
which sudo
returns: /usr/bin/sudo
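Note that which was run in an interactive login shell, which gets the full login PATH; Semaphore may launch tasks with a much smaller environment. Comparing against a non-interactive SSH command (closer to what Ansible actually does) can expose the difference:

ssh semaphore@10.10.40.32 'echo $PATH; command -v sudo'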
It is also worth noting that both of these failures come from modules (ansible.posix.sysctl and ansible.builtin.shell). I have also gone through the General Troubleshooting steps.
Semaphore wasn't using the correct user. Running the same setup from the CLI works fine. I am removing Semaphore for now; I was planning on switching to something new anyway...
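For reference, this is the kind of CLI run that worked where Semaphore failed (flags are from my setup: -u forces the semaphore remote user, -b enables become/sudo; the inventory path is the repo's sample layout):

ansible-playbook site.yml -i inventory/my-cluster/hosts.ini -u semaphore -b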
https://github.com/techno-tim/k3s-ansible/blob/fab302fd915fa8ef2ba8b56e1b7e41616a643260/roles/reset/tasks/umount_with_children.yml#L8