techno-tim / k3s-ansible

The easiest way to bootstrap a self-hosted High Availability Kubernetes cluster. A fully automated HA k3s etcd install with kube-vip, MetalLB, and more. Build. Destroy. Repeat.
https://technotim.live/posts/k3s-etcd-ansible/
Apache License 2.0

Failure in k3s/post: Apply metallb CRs Task #307

Closed: manuelautomatisch closed this 1 year ago

manuelautomatisch commented 1 year ago

When executing the ansible playbook, the k3s/post task "Apply metallb CRs" fails after 5 retries. The playbook reports the following failure:

fatal: [192.168.0.60]: FAILED! => {"attempts": 5, "changed": false, "cmd": ["k3s", "kubectl", "apply", "-f", "/tmp/k3s/metallb-crs.yaml", "--timeout=120s"], "delta": "0:00:20.251025", "end": "2023-05-20 11:13:36.014333", "msg": "non-zero return code", "rc": 1, "start": "2023-05-20 11:13:15.763308", "stderr": "Error from server (InternalError): error when creating \"/tmp/k3s/metallb-crs.yaml\": Internal error occurred: failed calling webhook \"ipaddresspoolvalidationwebhook.metallb.io\": failed to call webhook: Post \"https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s\": context deadline exceeded\nError from server (InternalError): error when creating \"/tmp/k3s/metallb-crs.yaml\": Internal error occurred: failed calling webhook \"l2advertisementvalidationwebhook.metallb.io\": failed to call webhook: Post \"https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s\": context deadline exceeded", "stderr_lines": ["Error from server (InternalError): error when creating \"/tmp/k3s/metallb-crs.yaml\": Internal error occurred: failed calling webhook \"ipaddresspoolvalidationwebhook.metallb.io\": failed to call webhook: Post \"https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s\": context deadline exceeded", "Error from server (InternalError): error when creating \"/tmp/k3s/metallb-crs.yaml\": Internal error occurred: failed calling webhook \"l2advertisementvalidationwebhook.metallb.io\": failed to call webhook: Post \"https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s\": context deadline exceeded"], "stdout": "", "stdout_lines": []}
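The error means the Kubernetes apiserver could not reach the MetalLB validating webhook within its 10-second timeout. Before retrying, it is worth checking that the webhook actually has a ready backend; a diagnostic sketch (only webhook-service is confirmed by the error above, the controller deployment name is assumed from the upstream MetalLB manifests):

# Is the controller pod (which serves the webhook) Running and Ready?
k3s kubectl -n metallb-system get pods

# Does the webhook Service have an endpoint behind it?
k3s kubectl -n metallb-system get endpoints webhook-service

# Any certificate or startup errors in the controller?
k3s kubectl -n metallb-system logs deploy/controller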

When I try to apply the MetalLB IPAddressPool and L2Advertisement manually on the first master (192.168.0.60) using the manifest below, the same error occurs:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.0.240-192.168.0.250

---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system

kubectl apply -f metallb-crs.yaml

Error from server (InternalError): error when creating "metallb-crs.yaml": Internal error occurred: failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": context deadline exceeded Error from server (InternalError): error when creating "metallb-crs.yaml": Internal error occurred: failed calling webhook "l2advertisementvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s": context deadline exceeded
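One thing that sometimes helps with this symptom is to confirm the webhook-serving deployment is actually Available before retrying the apply; a sketch, assuming the upstream MetalLB deployment name controller:

kubectl -n metallb-system wait --for=condition=Available deployment/controller --timeout=120s
kubectl apply -f metallb-crs.yaml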

The same error is also referenced in #245, which ended with using kube-vip only instead of combining it with MetalLB.

Expected Behavior

The ansible playbook should be able to apply the IPAddressPool and the L2Advertisement so that MetalLB runs properly in the cluster.

Current Behavior

The ansible playbook fails at the task k3s/post: Apply metallb CRs due to a webhook problem.

Steps to Reproduce

  1. Execute the ansible playbook

Context (variables)

Operating system: AlmaLinux 9, virtualized on Proxmox 7.4

Variables Used

all.yml

---
k3s_version: v1.25.9+k3s1
# this is the user that has ssh access to these machines
ansible_user: <username>
systemd_dir: /etc/systemd/system

# Set your timezone
system_timezone: "Europe/Zurich"

# interface which will be used for flannel
flannel_iface: "ens18"

# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.0.222"

# k3s_token is required so that masters can talk to each other securely
# this token should be alphanumeric only
k3s_token: "<token>"

# The IP on which the node is reachable in the cluster.
# Here, a sensible default is provided, you can still override
# it for each of your hosts, though.
k3s_node_ip: '{{ ansible_facts[flannel_iface]["ipv4"]["address"] }}'
# Disable the taint manually by setting: k3s_master_taint = false
k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"

# these arguments are recommended for servers as well as agents:
extra_args: >-
  --flannel-iface={{ flannel_iface }}
  --node-ip={{ k3s_node_ip }}

# change these to your liking, the only required ones are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik
extra_agent_args: >-
  {{ extra_args }}

# image tag for kube-vip
kube_vip_tag_version: "v0.5.12"

# metallb type frr or native
metal_lb_type: "native"

# metallb mode layer2 or bgp
metal_lb_mode: "layer2"

# bgp options
# metal_lb_bgp_my_asn: "64513"
# metal_lb_bgp_peer_asn: "64512"
# metal_lb_bgp_peer_address: "192.168.30.1"

# image tag for metal lb
metal_lb_frr_tag_version: "v7.5.1"
metal_lb_speaker_tag_version: "v0.13.9"
metal_lb_controller_tag_version: "v0.13.9"

# metallb ip range for load balancer
metal_lb_ip_range: "192.168.0.240-192.168.0.250"

Hosts

host.ini

[master]
192.168.0.60
192.168.0.61
192.168.0.62

[node]
192.168.0.63
192.168.0.64

[k3s_cluster:children]
master
node

Possible Solution

A few users recommended using kube-vip only as the load balancer, but I would prefer to run the cluster with MetalLB.
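If MetalLB is to stay, one possible mitigation would be an extra pre-task in the role that waits for the webhook-serving controller deployment before applying the CRs; a minimal sketch (the task name and module choice are mine, not part of the playbook):

# Hypothetical pre-task: wait for the MetalLB controller (which serves the
# validating webhook) to report Available before applying the CRs.
- name: Wait for MetalLB webhook to become ready
  ansible.builtin.command: >-
    k3s kubectl -n metallb-system wait --for=condition=Available
    deployment/controller --timeout=120s
  changed_when: false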

manuelautomatisch commented 1 year ago

Never mind, I've changed my mind and switched the whole setup to kube-vip only, as described in #127.

Works pretty well; if someone else is facing this MetalLB issue, I would recommend switching to kube-vip only as well.
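For anyone making the same switch: when kube-vip also handles LoadBalancer services, the upstream kube-vip cloud provider takes its address range from a ConfigMap; a sketch assuming the kubevip ConfigMap name and range-global key from the kube-vip documentation (the playbook may template this for you):

# Sketch of the ConfigMap the kube-vip cloud provider reads its
# LoadBalancer address range from (names per the kube-vip docs).
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevip
  namespace: kube-system
data:
  range-global: 192.168.0.240-192.168.0.250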