When executing the k3s/post role, I get a failure when applying the IPAddressPool and L2Advertisement configs to K3s. Has anyone else run into this issue? I have found these issues that seem related, but none of the workarounds are working.
Expected Behavior
The config should be accepted and applied to the cluster with the appropriate address pool and L2Advertisement config.
Current Behavior
TASK [k3s/post : Apply metallb CRs] *****************************************************************************************************
FAILED - RETRYING: [k3s-cs-master2]: Apply metallb CRs (5 retries left).
FAILED - RETRYING: [k3s-cs-master2]: Apply metallb CRs (4 retries left).
FAILED - RETRYING: [k3s-cs-master2]: Apply metallb CRs (3 retries left).
FAILED - RETRYING: [k3s-cs-master2]: Apply metallb CRs (2 retries left).
FAILED - RETRYING: [k3s-cs-master2]: Apply metallb CRs (1 retries left).
fatal: [k3s-cs-master2]: FAILED! => {"attempts": 5, "changed": false, "cmd": ["k3s", "kubectl", "apply", "-f", "/tmp/k3s/metallb-crs.yaml", "--timeout=120s"], "delta": "0:00:20.332449", "end": "2023-03-05 12:21:26.620248", "msg": "non-zero return code", "rc": 1, "start": "2023-03-05 12:21:06.287799", "stderr": "Error from server (InternalError): error when creating \"/tmp/k3s/metallb-crs.yaml\": Internal error occurred: failed calling webhook \"ipaddresspoolvalidationwebhook.metallb.io\": failed to call webhook: Post \"https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s\": context deadline exceeded\nError from server (InternalError): error when creating \"/tmp/k3s/metallb-crs.yaml\": Internal error occurred: failed calling webhook \"l2advertisementvalidationwebhook.metallb.io\": failed to call webhook: Post \"https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s\": context deadline exceeded", "stderr_lines": ["Error from server (InternalError): error when creating \"/tmp/k3s/metallb-crs.yaml\": Internal error occurred: failed calling webhook \"ipaddresspoolvalidationwebhook.metallb.io\": failed to call webhook: Post \"https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s\": context deadline exceeded", "Error from server (InternalError): error when creating \"/tmp/k3s/metallb-crs.yaml\": Internal error occurred: failed calling webhook \"l2advertisementvalidationwebhook.metallb.io\": failed to call webhook: Post \"https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s\": context deadline exceeded"], "stdout": "", "stdout_lines": []}
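The error means the kube-apiserver could not reach MetalLB's validating webhook over the cluster network before the 10s timeout. A few checks that should confirm whether the webhook backend is healthy (a diagnostic sketch; controller/speaker and webhook-service are the MetalLB v0.13.x default names, the latter taken from the error above):

# The controller pod serves the webhook; it and the speakers should be Running
kubectl -n metallb-system get pods -o wide

# The Service the apiserver calls should have at least one endpoint
kubectl -n metallb-system get endpoints webhook-service

# Controller logs often show TLS/cert or startup problems with the webhook
kubectl -n metallb-system logs deploy/controller

# List the webhook registrations the apiserver is enforcing
kubectl get validatingwebhookconfigurations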
Steps to Reproduce
Execute the Ansible playbook.
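For reference, the invocation is the standard one from the repo (a sketch; the inventory path is illustrative since I run against a dynamic inventory rather than a static hosts file):

ansible-playbook site.yml -i inventory/my-cluster/hosts.ini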
Context (variables)
Operating system: Ubuntu 22.04 LTS
Hardware: VMware
Variables Used
all.yml
---
k3s_version: v1.25.6+k3s1 # also tried v1.24.10+k3s1
# this is the user that has ssh access to these machines
ansible_user: ansible
systemd_dir: /etc/systemd/system
# Set your timezone
system_timezone: "US/Pacific"
# interface which will be used for flannel
flannel_iface: "ens192"
# apiserver_endpoint is the virtual IP address which will be configured on each master
apiserver_endpoint: "10.1.0.10"
# k3s_token is required so masters can talk together securely.
# This token should be alphanumeric only
k3s_token: "<redacted>"
# The IP on which the node is reachable in the cluster.
# Here, a sensible default is provided, you can still override
# it for each of your hosts, though.
k3s_node_ip: '{{ ansible_facts[flannel_iface]["ipv4"]["address"] }}'
# Disable the taint manually by setting: k3s_master_taint = false
k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"
# these arguments are recommended for servers as well as agents:
extra_args: >-
--flannel-iface={{ flannel_iface }}
--node-ip={{ k3s_node_ip }}
# change these to your liking; the only required ones are --disable servicelb and --tls-san {{ apiserver_endpoint }}
extra_server_args: >-
{{ extra_args }}
{{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
--tls-san {{ apiserver_endpoint }}
--disable servicelb
--disable traefik
extra_agent_args: >-
{{ extra_args }}
# image tag for kube-vip
kube_vip_tag_version: "v0.5.7"
# metallb type frr or native
metal_lb_type: "native"
# metallb mode layer2 or bgp
metal_lb_mode: "layer2"
# image tag for metal lb
metal_lb_frr_tag_version: "v7.5.1"
metal_lb_speaker_tag_version: "v0.13.9" # have tried v0.13.7 as well
metal_lb_controller_tag_version: "v0.13.9" # have tried v0.13.7 as well
# metallb ip range for load balancer
metal_lb_ip_range: "10.1.3.1-10.1.3.254"
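For clarity, once Jinja renders these variables the server args come out roughly as follows (a sketch; the node IP is per-host, and the taint flag is only present when a node group is defined in the inventory):

--flannel-iface=ens192 --node-ip=<per-host IP> --node-taint node-role.kubernetes.io/master=true:NoSchedule --tls-san 10.1.0.10 --disable servicelb --disable traefik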
Hosts
I'm using a dynamic inventory with the VMware plugin. Output of ansible-inventory:
Here is the output from manually running the commands from a master node:
And here are the contents of the config file created by Ansible (/tmp/k3s/metallb-crs.yaml):
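A minimal sketch of what the role should render there from the variables above in layer2 mode (the metadata names are my assumption, not necessarily the template's exact output):

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.1.3.1-10.1.3.254
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
    - first-pool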
Possible Solution
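Nothing confirmed yet. Since the apiserver times out calling the webhook Service, one thing I plan to try (an assumption on my part, not a verified fix) is removing the validating webhook so the CRs apply without validation, then re-running the apply the role performs:

# Assumes the default MetalLB v0.13.x object name; this disables CR validation,
# so double-check the YAML before applying
kubectl delete validatingwebhookconfiguration metallb-webhook-configuration
k3s kubectl apply -f /tmp/k3s/metallb-crs.yaml --timeout=120s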