techno-tim / k3s-ansible

The easiest way to bootstrap a self-hosted High Availability Kubernetes cluster. A fully automated HA k3s etcd install with kube-vip, MetalLB, and more. Build. Destroy. Repeat.
https://technotim.live/posts/k3s-etcd-ansible/
Apache License 2.0

metallb fails to deploy #42

Closed chazragg closed 2 years ago

chazragg commented 2 years ago

Expected Behavior

metallb is auto-deployed via k3s

Current Behavior

metallb.configmap.yml is never deployed, which causes the metallb pods to fail, and k3s tries to redeploy

Steps to Reproduce

  1. run ansible-playbook ./playbooks/site.yml -i ./inventory/tower-of-power/hosts.ini -K
  2. k3s is deployed and accessible through kube-vip (192.168.0.90)
  3. check the deployment state of metallb and see a CrashLoopBackOff (see the commands below)
  4. the error states there is a missing ConfigMap
  5. check the ConfigMaps in the namespace; the config ConfigMap is not present
  6. k3s terminates the namespace and retries the deployment
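
Rough commands I used to check this (assuming kubectl is pointed at the cluster through the kube-vip address; resource names are just what they are in my cluster):

kubectl get pods -n metallb-system -w              # watch the pods crash-loop
kubectl logs -n metallb-system deploy/controller   # shows the "configuration is missing" error
kubectl get configmap -n metallb-system            # the "config" ConfigMap never appears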

Context (variables)

Operating system: Raspberry Pi OS (64-bit)

Hardware: a mixture of Pi 3s and Pi 4s (the master node is a Pi 4, 4 GB)

Variables Used:

all.yml

k3s_version: "v1.24.2+k3s2"
ansible_user: NA
systemd_dir: "/etc/systemd/system"

flannel_iface: "wlan0"

apiserver_endpoint: "192.168.0.190"

k3s_token: "NA"

extra_server_args: "--no-deploy servicelb --no-deploy traefik --write-kubeconfig-mode 644 --kube-apiserver-arg default-not-ready-toleration-seconds=30 --kube-apiserver-arg default-unreachable-toleration-seconds=30 --kube-controller-arg node-monitor-period=20s --kube-controller-arg node-monitor-grace-period=20s --kubelet-arg node-status-update-frequency=5s"
extra_agent_args: "--kubelet-arg node-status-update-frequency=5s"

kube_vip_tag_version: "v0.4.4"

metal_lb_speaker_tag_version: "v0.12.1"
metal_lb_controller_tag_version: "v0.12.1"

metal_lb_ip_range: "192.168.0.180-192.168.0.189"

Hosts

hosts.ini

[master]
192.168.0.200

[node]
192.168.0.201
192.168.0.202
192.168.0.203
192.168.0.204

[k3s_cluster:children]
master
node
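
For reference, this is roughly the ConfigMap I would expect MetalLB to pick up, based on metal_lb_ip_range above (MetalLB v0.12.x still reads the legacy metallb-system/config ConfigMap; this is my own sketch, not the playbook's exact rendered template):

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    # layer2 pool covering the range set in all.yml
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.0.180-192.168.0.189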

Some extra logs. Here is the metallb-system controller pod log; after I get this error the whole namespace is terminated:

{"branch":"HEAD","caller":"level.go:63","commit":"v0.12.1","goversion":"gc / go1.16.14 / arm64","level":"info","msg":"MetalLB controller starting version 0.12.1 (commit v0.12.1, branch HEAD)","ts":"2022-07-12T21:26:20.75386972Z","version":"0.12.1"}

{"caller":"level.go:63","level":"info","msg":"secret succesfully created","op":"CreateMlSecret","ts":"2022-07-12T21:26:20.95196698Z"}

{"caller":"level.go:63","event":"stateSynced","level":"info","msg":"controller synced, can allocate IPs now","ts":"2022-07-12T21:26:21.053274581Z"}

{"caller":"level.go:63","configmap":"metallb-system/config","event":"configLoaded","level":"info","msg":"config (re)loaded","ts":"2022-07-12T21:26:27.930190506Z"}

{"caller":"level.go:63","configmap":"metallb-system/config","error":"no MetalLB configuration in cluster","level":"error","msg":"configuration is missing, MetalLB will not function","op":"setConfig","ts":"2022-07-12T21:26:28.151674635Z"} 

Stream closed EOF for metallb-system/controller-7476b58756-kfp52 (controller) 

It also seems to cause some issues with kube-vip: I randomly lose connection through k9s and kubectl when the namespace is terminated by k3s.

I will try to get more information, but I am still a bit new to Kubernetes, so any suggestions on how to debug this would be appreciated.
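
So far these are the extra things I have been poking at, in case it helps (I am assuming the playbook drops the MetalLB manifests into the k3s auto-deploy directory on the first master; paths and unit names are from my install):

kubectl get events -n metallb-system --sort-by='.lastTimestamp'   # namespace events, newest last
ls /var/lib/rancher/k3s/server/manifests/                         # what k3s is being asked to auto-deploy
journalctl -u k3s | grep -i metallb                               # k3s's own view of the (re)deploy attempts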

mihkel commented 2 years ago

Hi! It somehow seems to be related to the k3s version. I started deploying using the latest 1.24 k3s versions, which all failed for the same reason. Then I moved down to v1.23.8+k3s1, which was still a no-go. The only winning combination for me seems to be v1.23.4+k3s1.
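
In all.yml terms, the only pin that worked for me was:

k3s_version: "v1.23.4+k3s1"   # v1.24.x and v1.23.8+k3s1 both failed the same way for me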

chazragg commented 2 years ago

Thanks for the help. I downgraded to v1.23.4 and all is working fine. It is strange that k3s fails to deploy it, considering I can manually apply the ConfigMap.
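
For anyone else hitting this on 1.24, the manual workaround was just applying the ConfigMap myself (the filename is simply what I saved it as locally):

kubectl apply -f metallb-config.yml   # the metallb-system/config ConfigMap from my first comment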

mattsn0w commented 2 years ago

I have the same issue. +1 for v1.23.4+k3s1 working though.

timothystewart6 commented 2 years ago

This should all be fixed in the latest version! Please pull the latest and try again! Also, as a reminder, the latest tested version of everything will always be in all.yml; anything other than that has not been verified yet. Regardless, it's working!