techno-tim / k3s-ansible

The easiest way to bootstrap a self-hosted High Availability Kubernetes cluster. A fully automated HA k3s etcd install with kube-vip, MetalLB, and more. Build. Destroy. Repeat.
https://technotim.live/posts/k3s-etcd-ansible/
Apache License 2.0

Metallb CRs apply failure #245

Closed palmertime closed 1 year ago

palmertime commented 1 year ago

When executing the k3s/post role, I get a failure when applying the IPAddressPool and L2Advertisement configs to K3s. Has anyone else run into this issue? I have found these issues that seem related, but none of the workarounds are working.

Here is the output from manually running the commands from a master node:

/tmp/k3s$ sudo k3s kubectl apply -f metallb-crs.yaml
Error from server (InternalError): error when creating "metallb-crs.yaml": Internal error occurred: failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": context deadline exceeded
Error from server (InternalError): error when creating "metallb-crs.yaml": Internal error occurred: failed calling webhook "l2advertisementvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s": context deadline exceeded
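For anyone debugging the same error, a couple of diagnostic commands (a sketch; the namespace and labels assume the stock metallb-native manifest) can confirm whether the controller pod backing `webhook-service` is actually up and has endpoints:

```shell
# Is the controller pod (which serves the validating webhooks) running?
sudo k3s kubectl -n metallb-system get pods -l component=controller

# Does webhook-service have any endpoints behind it?
sudo k3s kubectl -n metallb-system get endpoints webhook-service
```

A "context deadline exceeded" from the webhook generally means the API server cannot reach those endpoints, either because the controller pod is not ready yet or because pod-network traffic to it is blocked.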

And here are the contents of the config file created by Ansible:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
  - 10.1.3.1-10.1.3.254

---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
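Side note on the generated manifest: an `L2Advertisement` with an empty `spec` is valid and advertises every `IPAddressPool` in the namespace. If you wanted to bind it explicitly to the pool above, a sketch would be:

```yaml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
spec:
  ipAddressPools:   # optional; omitting it advertises all pools
    - first-pool
```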

Expected Behavior

The config should be accepted and applied to the cluster with the appropriate address pool and L2 Advertisement config.

Current Behavior

TASK [k3s/post : Apply metallb CRs] *****************************************************************************************************
FAILED - RETRYING: [k3s-cs-master2]: Apply metallb CRs (5 retries left).
FAILED - RETRYING: [k3s-cs-master2]: Apply metallb CRs (4 retries left).
FAILED - RETRYING: [k3s-cs-master2]: Apply metallb CRs (3 retries left).
FAILED - RETRYING: [k3s-cs-master2]: Apply metallb CRs (2 retries left).
FAILED - RETRYING: [k3s-cs-master2]: Apply metallb CRs (1 retries left).
fatal: [k3s-cs-master2]: FAILED! => {"attempts": 5, "changed": false, "cmd": ["k3s", "kubectl", "apply", "-f", "/tmp/k3s/metallb-crs.yaml", "--timeout=120s"], "delta": "0:00:20.332449", "end": "2023-03-05 12:21:26.620248", "msg": "non-zero return code", "rc": 1, "start": "2023-03-05 12:21:06.287799", "stderr": "Error from server (InternalError): error when creating \"/tmp/k3s/metallb-crs.yaml\": Internal error occurred: failed calling webhook \"ipaddresspoolvalidationwebhook.metallb.io\": failed to call webhook: Post \"https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s\": context deadline exceeded\nError from server (InternalError): error when creating \"/tmp/k3s/metallb-crs.yaml\": Internal error occurred: failed calling webhook \"l2advertisementvalidationwebhook.metallb.io\": failed to call webhook: Post \"https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s\": context deadline exceeded", "stderr_lines": ["Error from server (InternalError): error when creating \"/tmp/k3s/metallb-crs.yaml\": Internal error occurred: failed calling webhook \"ipaddresspoolvalidationwebhook.metallb.io\": failed to call webhook: Post \"https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s\": context deadline exceeded", "Error from server (InternalError): error when creating \"/tmp/k3s/metallb-crs.yaml\": Internal error occurred: failed calling webhook \"l2advertisementvalidationwebhook.metallb.io\": failed to call webhook: Post \"https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s\": context deadline exceeded"], "stdout": "", "stdout_lines": []}
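One hedged workaround for this kind of race (the task wording is hypothetical; the deployment name `controller` comes from the stock MetalLB manifest) is to block until the controller deployment reports Available before applying the CRs, e.g. as a pre-task in the k3s/post role:

```yaml
# Hypothetical pre-task: wait for the MetalLB controller (which serves
# the validating webhooks) to become Available before applying the CRs.
- name: Wait for MetalLB controller to be ready
  ansible.builtin.command:
    cmd: >-
      k3s kubectl wait --namespace metallb-system
      --for=condition=Available deployment/controller
      --timeout=120s
  changed_when: false
```

If the deployment never becomes Available within the timeout, the retry loop in the existing task would keep failing anyway, so this mainly surfaces the real problem earlier.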

Steps to Reproduce

  1. Execute the Ansible playbook

Context (variables)

Operating system: Ubuntu 22.04 LTS

Hardware: VMware

Variables Used

all.yml

---
k3s_version: v1.25.6+k3s1 # also tried v1.24.10+k3s1

# this is the user that has ssh access to these machines
ansible_user: ansible

systemd_dir: /etc/systemd/system

# Set your timezone
system_timezone: "US/Pacific"

# interface which will be used for flannel
flannel_iface: "ens192"

# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "10.1.0.10"

# k3s_token is required so masters can talk together securely.
# This token should be alpha numeric only
k3s_token: "<redacted>"

# The IP on which the node is reachable in the cluster.
# Here, a sensible default is provided, you can still override
# it for each of your hosts, though.
k3s_node_ip: '{{ ansible_facts[flannel_iface]["ipv4"]["address"] }}'

# Disable the taint manually by setting: k3s_master_taint = false
k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"

# these arguments are recommended for servers as well as agents:
extra_args: >-
  --flannel-iface={{ flannel_iface }}
  --node-ip={{ k3s_node_ip }}

# change these to your liking; the only required ones are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik
extra_agent_args: >-
  {{ extra_args }}

# image tag for kube-vip
kube_vip_tag_version: "v0.5.7"

# metallb type frr or native
metal_lb_type: "native"

# metallb mode layer2 or bgp
metal_lb_mode: "layer2"

# image tag for metal lb
metal_lb_frr_tag_version: "v7.5.1"
metal_lb_speaker_tag_version: "v0.13.9" # have tried v0.13.7 as well
metal_lb_controller_tag_version: "v0.13.9" # have tried v0.13.7 as well

# metallb ip range for load balancer
metal_lb_ip_range: "10.1.3.1-10.1.3.254"

Hosts

I'm using dynamic inventory with the VMware inventory plugin. Output of ansible-inventory:

{
    "_meta": {
        "hostvars": {<redacted>}
    },
    "all": {
        "children": [
            "k3s_cluster",
            "k3s_master",
            "k3s_node",
            "master",
            "node",
            "ungrouped"
        ]
    },
    "k3s_cluster": {
        "hosts": [
            "k3s-cs-master1",
            "k3s-cs-master2",
            "k3s-cs-master3",
            "k3s-cs-node1",
            "k3s-cs-node2",
            "k3s-cs-node3"
        ]
    },
    "k3s_master": {
        "hosts": [
            "k3s-cs-master1",
            "k3s-cs-master2",
            "k3s-cs-master3"
        ]
    },
    "k3s_node": {
        "hosts": [
            "k3s-cs-node1",
            "k3s-cs-node2",
            "k3s-cs-node3"
        ]
    },
    "master": {
        "hosts": [
            "k3s-cs-master1",
            "k3s-cs-master2",
            "k3s-cs-master3"
        ]
    },
    "node": {
        "hosts": [
            "k3s-cs-node1",
            "k3s-cs-node2",
            "k3s-cs-node3"
        ]
    }
}

Possible Solution