techno-tim / k3s-ansible

The easiest way to bootstrap a self-hosted High Availability Kubernetes cluster. A fully automated HA k3s etcd install with kube-vip, MetalLB, and more. Build. Destroy. Repeat.
https://technotim.live/posts/k3s-etcd-ansible/
Apache License 2.0

Final Task hanging indefinitely #156

Closed SuperSweatyYeti closed 1 year ago

SuperSweatyYeti commented 1 year ago

The Issue

Everything runs perfectly until the task `TASK [k3s/node : Enable and check K3s service]`, at which point the playbook hangs.

Expected Behavior

The last task completes and the node joins the cluster

Current Behavior

That task hangs forever and the node never joins the cluster

Steps to Reproduce

  1. Ran through all steps in the README
  2. Changed the variables for my environment
  3. Ran `ansible-playbook site.yml -i inventory/my-cluster/hosts.ini --ask-become-pass`

BRANCH Used: v1.24.6+k3s1

Context (variables)

Operating system: the playbook was run from WSL on Windows 10. OS of the VMs: Ubuntu 22.04 (same behavior with Ubuntu 20.04).

Hardware: my Proxmox host has an Intel i5 (8 cores), 32 GB RAM, and a 500 GB SSD.

Variables Used

all.yml

---
k3s_version: v1.24.6+k3s1
# this is the user that has ssh access to these machines
ansible_user: duster
systemd_dir: /etc/systemd/system

# Set your timezone
system_timezone: "America/Los_Angeles"

# interface which will be used for flannel
flannel_iface: "eth0"

# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.30.222"

# k3s_token is required so that masters can talk together securely
# this token should be alpha numeric only
k3s_token: "some-SUPER-DEDEUPER-secret-password"

# The IP on which the node is reachable in the cluster.
# Here, a sensible default is provided, you can still override
# it for each of your hosts, though.
k3s_node_ip: '{{ ansible_facts[flannel_iface]["ipv4"]["address"] }}'

# Disable the taint manually by setting: k3s_master_taint = false
k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"

# these arguments are recommended for servers as well as agents:
extra_args: >-
  --flannel-iface={{ flannel_iface }}
  --node-ip={{ k3s_node_ip }}

# change these to your liking, the only required are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik
extra_agent_args: >-
  {{ extra_args }}

# image tag for kube-vip
kube_vip_tag_version: "v0.5.5"

# image tag for metal lb
metal_lb_speaker_tag_version: "v0.13.6"
metal_lb_controller_tag_version: "v0.13.6"

# metallb ip range for load balancer
metal_lb_ip_range: "192.168.30.80-192.168.30.90"

Hosts

hosts.ini

[master]
192.168.22.233

[node]
192.168.22.234

[k3s_cluster:children]
master
node
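Note that the node IPs above sit on 192.168.22.0/24 while `apiserver_endpoint` (192.168.30.222) does not. A quick sanity check along these lines, using Python's standard `ipaddress` module (a hypothetical helper, not part of the playbook), can catch this kind of mismatch before a run:

```python
import ipaddress

def same_subnet(vip: str, node_ips: list[str], prefix: int = 24) -> bool:
    """Return True if the virtual IP shares a /prefix network with every node IP."""
    net = ipaddress.ip_network(f"{vip}/{prefix}", strict=False)
    return all(ipaddress.ip_address(ip) in net for ip in node_ips)

# With the values from this issue, the VIP is on a different subnet than the nodes:
print(same_subnet("192.168.30.222", ["192.168.22.233", "192.168.22.234"]))  # False
```

The `/24` prefix is an assumption based on the addresses shown; adjust it to match your actual network mask.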

Possible Solution

Here are the logs from the k3s service on the node that the Ansible task is waiting on:

duster@port-node:~$ sudo systemctl status k3s-node.service
● k3s-node.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s-node.service; enabled; vendor preset: enabled)
     Active: activating (start) since Mon 2022-11-07 10:47:23 PST; 28min ago
       Docs: https://k3s.io
    Process: 1624 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 1640 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
   Main PID: 1641 (k3s-agent)
      Tasks: 8
     Memory: 227.7M
     CGroup: /system.slice/k3s-node.service
             └─1641 /usr/local/bin/k3s agent

Nov 07 11:12:42 port-node k3s[1641]: time="2022-11-07T11:12:42-08:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Nov 07 11:13:04 port-node k3s[1641]: time="2022-11-07T11:13:04-08:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Nov 07 11:13:26 port-node k3s[1641]: time="2022-11-07T11:13:26-08:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Nov 07 11:13:48 port-node k3s[1641]: time="2022-11-07T11:13:48-08:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Nov 07 11:14:10 port-node k3s[1641]: time="2022-11-07T11:14:10-08:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Nov 07 11:14:32 port-node k3s[1641]: time="2022-11-07T11:14:32-08:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Nov 07 11:14:54 port-node k3s[1641]: time="2022-11-07T11:14:54-08:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Nov 07 11:15:16 port-node k3s[1641]: time="2022-11-07T11:15:16-08:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Nov 07 11:15:38 port-node k3s[1641]: time="2022-11-07T11:15:38-08:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Nov 07 11:16:00 port-node k3s[1641]: time="2022-11-07T11:16:00-08:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
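The agent's local client (127.0.0.1:6444) proxies to the `--server` address, so repeated "failed to get CA certs" errors like these usually mean the server URL (here, the kube-vip VIP) is unreachable from the node. A minimal, hypothetical reachability check from the node, sketched with Python's `socket` module:

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Attempt a plain TCP connect; True only if something accepts on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run on the failing node; with the misconfigured VIP this would return False:
# tcp_reachable("192.168.30.222", 6443)
```

A `False` result points at networking or VIP configuration rather than at k3s itself.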

This is what the systemd service file looks like on the node:

duster@port-node:~$ cat /etc/systemd/system/k3s-node.service
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
After=network-online.target

[Service]
Type=notify
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s agent --server https://192.168.30.222:6443 --token K10bf2e555f666a4f7ad106cb0c10127ae07adebb82b8f25ced0be3102605c92af4::server:902e8099adeb5f4ef5d373996cf740c6 --flannel-iface=eth0 --node-ip=192.168.22.234
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
SuperSweatyYeti commented 1 year ago

Figured it out, guys.

# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.30.222"
...
# metallb ip range for load balancer
metal_lb_ip_range: "192.168.30.80-192.168.30.90"

Needed to change `apiserver_endpoint` and `metal_lb_ip_range` to IPs that actually exist on my local network; my hosts are on 192.168.22.0/24, but both of these were set to 192.168.30.x addresses.
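For anyone hitting the same thing: with hosts on 192.168.22.0/24, the corrected values in all.yml would look something like this (the exact addresses are placeholders; pick unused IPs on your own subnet):

```yaml
# apiserver_endpoint must be an unused IP on the SAME subnet as your nodes
apiserver_endpoint: "192.168.22.222"

# metallb ip range must also live on that subnet
metal_lb_ip_range: "192.168.22.80-192.168.22.90"
```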

Anyways,

Thanks again for this tool guys. Much appreciated 😄