techno-tim / k3s-ansible

The easiest way to bootstrap a self-hosted High Availability Kubernetes cluster. A fully automated HA k3s etcd install with kube-vip, MetalLB, and more. Build. Destroy. Repeat.
https://technotim.live/posts/k3s-etcd-ansible/
Apache License 2.0

k3s.node.service failure on fresh install #289

Closed GHMCr closed 1 year ago

GHMCr commented 1 year ago

After a fresh install of Ubuntu Server 22.04.2 LTS on a Minisforum NUC, I wanted to install k8s with the k3s-ansible scripts. I created a my-cluster dir with the required changes in hosts.ini and all.yml for one host as a test; if successful, it is later to be expanded with more NUCs etc.
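For reference, the run would have been invoked roughly like this (the inventory path is an assumption based on the repo's README, not stated in this report):

# assumed invocation, following the repo's README layout
ansible-playbook site.yml -i inventory/my-cluster/hosts.ini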

The playbook runs fine until the TASK [k3s/node : Enable and check K3s service] step fails (full output under Current Behavior below).

Expected Behavior

Everything installs error-free and runs, giving a first, very small k8s cluster on one node for starters.

Current Behavior

TASK [k3s/node : Enable and check K3s service] ***
fatal: [node001.calmus.one]: FAILED! => {"changed": false, "msg": "Unable to start service k3s-node: Job for k3s-node.service failed because the control process exited with error code.\nSee \"systemctl status k3s-node.service\" and \"journalctl -xeu k3s-node.service\" for details.\n"}

systemctl status k3s-node.service

k3s-node.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s-node.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Wed 2023-04-26 14:52:01 CEST; 2s ago
       Docs: https://k3s.io
    Process: 17487 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 17488 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
    Process: 17489 ExecStart=/usr/local/bin/k3s agent --server https://192.168.1.202:6443 --token K1046a0af5a2042c1bc9806d134cb6735dd0120b79f397e5a0>
   Main PID: 17489 (code=exited, status=1/FAILURE)
        CPU: 154ms

journalctl -xeu k3s-node.service

░░
░░ A stop job for unit k3s-node.service has finished.
░░
░░ The job identifier is 4869 and the job result is done.
Apr 26 14:52:48 node001 systemd[1]: Starting Lightweight Kubernetes...
░░ Subject: A start job for unit k3s-node.service has begun execution
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit k3s-node.service has begun execution.
░░
░░ The job identifier is 4869.
Apr 26 14:52:48 node001 k3s[17715]: time="2023-04-26T14:52:48+02:00" level=info msg="Starting k3s agent v1.24.12+k3s1 (57e8adb5)"
Apr 26 14:52:48 node001 k3s[17715]: time="2023-04-26T14:52:48+02:00" level=warning msg="Error starting load balancer: listen tcp 127.0.0.1:6444: bin>
Apr 26 14:52:48 node001 k3s[17715]: time="2023-04-26T14:52:48+02:00" level=fatal msg="listen tcp 127.0.0.1:6444: bind: address already in use"
Apr 26 14:52:48 node001 systemd[1]: k3s-node.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ An ExecStart= process belonging to unit k3s-node.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 1.
Apr 26 14:52:48 node001 systemd[1]: k3s-node.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ The unit k3s-node.service has entered the 'failed' state with result 'exit-code'.
Apr 26 14:52:48 node001 systemd[1]: Failed to start Lightweight Kubernetes.
░░ Subject: A start job for unit k3s-node.service has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit k3s-node.service has finished with a failure.
░░
░░ The job identifier is 4869 and the job result is failed.
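The fatal line is the real failure: something on the host is already bound to 127.0.0.1:6444, the local port that the k3s agent's client load balancer listens on. Before re-running the playbook, it is worth checking what holds that port (suggested diagnostics, not from the original report):

# show the process currently listening on 6444
sudo ss -tlnp | grep 6444
# check whether a k3s server or agent unit is already running
systemctl list-units 'k3s*'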

Steps to Reproduce

See above

Context (variables)

Operating system: Ubuntu Server 22.04.2 LTS

Hardware: Minisforum NUC (Ryzen 5)

Variables Used

all.yml

k3s_version: "v1.24.12+k3s1"
ansible_user: NA
systemd_dir: "/etc/systemd/system"

flannel_iface: "bond0"

apiserver_endpoint: "192.168.1.202"

k3s_token: "NA"

extra_server_args: "--flannel-iface={{ flannel_iface }}
  --node-ip={{ k3s_node_ip }}"
extra_agent_args: "{{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik
  --write-kubeconfig-mode 644"

kube_vip_tag_version: "v0.5.12"

# metallb type frr or native
metal_lb_type: "native"

# metallb mode layer2 or bgp
metal_lb_mode: "layer2"

# bgp options
# metal_lb_bgp_my_asn: "64513"
# metal_lb_bgp_peer_asn: "64512"
# metal_lb_bgp_peer_address: "192.168.30.1"

# image tag for metal lb
metal_lb_frr_tag_version: "v7.5.1"
metal_lb_speaker_tag_version: "v0.13.9"
metal_lb_controller_tag_version: "v0.13.9"

# metallb ip range for load balancer
metal_lb_ip_range: "192.168.1.220-192.168.1.240"
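One thing that stands out in the config above: the taint, --tls-san, --disable, and --write-kubeconfig-mode flags are attached to extra_agent_args, yet several of those (--tls-san, --disable, --write-kubeconfig-mode) are server-side flags in k3s. For comparison, a sketch of how the repo's sample all.yml groups them, reconstructed from the sample file rather than from this report and worth verifying against your checkout:

# sketch per the repo's sample all.yml (verify against your checkout)
extra_args: "--flannel-iface={{ flannel_iface }} --node-ip={{ k3s_node_ip }}"
extra_server_args: "{{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik
  --write-kubeconfig-mode 644"
extra_agent_args: "{{ extra_args }}"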

Hosts

hosts.ini

[master]
node001.calmus.one
# node002.calmus.one
# node003.calmus.one

[node]
node001.calmus.one
# node002.calmus.one
# node003.calmus.one

# only required if proxmox_lxc_configure: true
# must contain all proxmox instances that have a master or worker node
# [proxmox]
# 192.168.30.43

[k3s_cluster:children]
master
node
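Worth noting: node001.calmus.one is listed under both [master] and [node], so the playbook installs a k3s server and then tries to start a standalone k3s agent on the same machine. A k3s server already runs an embedded agent that binds 127.0.0.1:6444, which would produce exactly the "address already in use" failure above (a plausible cause, not confirmed in this thread). For a single-node test, the host only needs to appear under [master]; Ansible is fine with an empty group:

[master]
node001.calmus.one

[node]
# empty for a single-node cluster; add dedicated workers here later

[k3s_cluster:children]
master
node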

Possible Solution

GHMCr commented 1 year ago

When running the k3s install script from k3s.io on the server's CLI itself, everything runs and installs fine.
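A likely implication: if that manual install from k3s.io is still running, its server keeps ports 6443/6444 bound, and any later Ansible run on the same host will hit the same "address already in use" failure. The upstream installer ships uninstall scripts at k3s's default paths (which one exists depends on whether a server or an agent was installed):

# remove a manual k3s server install before re-running the playbook
sudo /usr/local/bin/k3s-uninstall.sh
# or, for an agent-only install:
sudo /usr/local/bin/k3s-agent-uninstall.sh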