Dug a bit deeper and the issue is elsewhere; this is on one of the master nodes:
Feb 16 14:40:11 k3s-3.debian11.homelab.com python3[22103]: ansible-ansible.legacy.command Invoked with _raw_params=k3s kubectl get nodes -l "node-role.kubernetes.io/master=true" -o=jsonpath="{.items[*].metadata.name}" _uses_shell=False stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
Feb 16 14:40:11 k3s-3.debian11.homelab.com k3s[22038]: time="2023-02-16T14:40:11+05:30" level=fatal msg="starting kubernetes: preparing server: failed to get CA certs: Get \"https://10.0.3.79:6443/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Feb 16 14:40:11 k3s-3.debian11.homelab.com systemd[1]: k3s-init.service: Main process exited, code=exited, status=1/FAILURE
Feb 16 14:40:11 k3s-3.debian11.homelab.com systemd[1]: k3s-init.service: Failed with result 'exit-code'.
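To rule out basic reachability, it may be worth hitting the supervisor endpoint directly from the failing node (a quick check; the address and port are taken from the log line above):
# Run on the node that fails to join. -k skips TLS verification,
# since the CA cert is exactly what we are trying to fetch.
curl -vk --max-time 10 https://10.0.3.79:6443/cacerts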
Hi, can you please fill out the issue template that was supplied when you created the issue? Thank you!
According to the YouTube video, at least, the master nodes should join the main node, which runs kube-vip.
This does not happen; instead, the 2nd and 3rd master nodes are unable to connect to the main (primary) master node, as the CA certs cannot be fetched.
(Same k3s-init.service failure log as above.)
Run the playbook with the default settings; this error should occur.
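For reference, the invocation (assuming the stock inventory layout from the repo; adjust the path if yours differs):
ansible-playbook site.yml -i inventory/my-cluster/hosts.ini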
Operating system: Debian 11
Hardware: VM: 16GB RAM / 2vcpu / 40GB disk
all.yml
k3s_version: v1.24.10+k3s1
ansible_user: NA
systemd_dir: /etc/systemd/system
# interface which will be used for flannel
flannel_iface: "eth0"
# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "10.0.3.85"
k3s_token: "NA"
# these arguments are recommended for servers as well as agents:
extra_args: >-
  --flannel-iface={{ flannel_iface }}
  --node-ip={{ k3s_node_ip }}
# change these to your liking, the only required are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik
extra_agent_args: >-
  {{ extra_args }}
# image tag for kube-vip
kube_vip_tag_version: "v0.5.7"
# image tag for metal lb
metal_lb_frr_tag_version: "v7.5.1"
metal_lb_speaker_tag_version: "v0.13.7"
metal_lb_controller_tag_version: "v0.13.7"
# metallb ip range for load balancer
metal_lb_ip_range: "10.0.3.90-10.0.3.100"
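One pre-flight check worth doing with these values: apiserver_endpoint (the kube-vip VIP) and the MetalLB range should be unused addresses, ideally outside the DHCP pool. A rough way to confirm nothing already answers on the VIP before running the playbook:
# If this ping succeeds, something already owns 10.0.3.85 and kube-vip will conflict.
ping -c 2 -W 1 10.0.3.85 && echo "WARNING: 10.0.3.85 is already in use"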
hosts.ini
[master]
10.0.3.79
10.0.3.80
10.0.3.81
[node]
10.0.3.82
10.0.3.83
# only required if proxmox_lxc_configure: true
# must contain all proxmox instances that have a master or worker node
# [proxmox]
# 192.168.30.43
[k3s_cluster:children]
master
node
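A quick ad-hoc ping against this inventory confirms Ansible can reach every host before the playbook runs (assuming the file is saved as hosts.ini):
ansible -i hosts.ini k3s_cluster -m ping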
I was planning on setting up self-signed certs to see if that would work, but I'm just confused as to why this wasn't experienced when you made the video :). Thanks Tim!
FYI, I also noticed another error and fixed it by running:
/usr/local/bin/k3s kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
Without this, there were MetalLB errors in the logs.
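To verify the fix took, the secret should now show up in the metallb-system namespace:
/usr/local/bin/k3s kubectl -n metallb-system get secret memberlist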
If they do not match, create one master/server node and add additional servers outside of this playbook.
Removing the 2nd/3rd master and trying this now. This passed the initial failure point:
TASK [k3s/master : Verify that all nodes actually joined (check k3s-init.service if this fails)] ***
FAILED - RETRYING: [10.0.3.79]: Verify that all nodes actually joined (check k3s-init.service if this fails) (20 retries left).
ok: [10.0.3.79]
However, it is now failing at:
TASK [k3s/node : Copy K3s service file] **************************************************
changed: [10.0.3.83]
changed: [10.0.3.82]
TASK [k3s/node : Enable and check K3s service] *******************************************
I find it strange that it is trying to fetch the CA cert (which doesn't exist anyway, as far as I'm aware) from the localhost address. Ideas?
Feb 17 11:27:25 k3s-4.debian11.homelab.com k3s[29019]: time="2023-02-17T11:27:25+05:30" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Feb 17 11:27:27 k3s-4.debian11.homelab.com systemd[1]: Configuration file /etc/systemd/system/k3s-node.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Feb 17 11:27:47 k3s-4.debian11.homelab.com k3s[29019]: time="2023-02-17T11:27:47+05:30" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Feb 17 11:28:09 k3s-4.debian11.homelab.com k3s[29019]: time="2023-02-17T11:28:09+05:30" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
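The executable-permission warning in the middle of that log is harmless (systemd proceeds anyway), but it is easy to silence; something along these lines should clear it, using the unit path from the warning:
# Remove the executable bits systemd complains about, then reload units.
sudo chmod 644 /etc/systemd/system/k3s-node.service
sudo systemctl daemon-reload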
In my case I had the same failure point. Steps that helped me: make sure each host has a unique hostname, and make sure the hosts do not have any firewall rules blocking traffic (on any port).
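For the firewall part, the ports k3s needs between nodes are documented as 6443/tcp (API/supervisor), 10250/tcp (kubelet metrics) and 8472/udp (flannel VXLAN). A rough probe from a joining node toward the first master:
nc -zv 10.0.3.79 6443
nc -zv 10.0.3.79 10250
# UDP probes with nc are unreliable, but an active REJECT rule will still show up:
nc -zvu 10.0.3.79 8472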
Thanks @BornaV, let me double-check the local firewall.
Hi all,
I'm testing a very basic clone of this playbook, with a few basics changed. The error I'm seeing is this: it seems the Jinja templating is breaking at
{.items[*].metadata.name}
which is here: https://github.com/techno-tim/k3s-ansible/blob/master/roles/k3s/master/tasks/main.yml#L34. I can confirm that the kube-vip instance is running and the script fails due to the issue above.
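One way to narrow this down is to run the exact query by hand on the first master; if Jinja were mangling the braces, the hand-run version should still succeed (command reproduced from the Ansible log above):
k3s kubectl get nodes -l "node-role.kubernetes.io/master=true" -o jsonpath='{.items[*].metadata.name}'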