The easiest way to bootstrap a self-hosted High Availability Kubernetes cluster. A fully automated HA k3s etcd install with kube-vip, MetalLB, and more. Build. Destroy. Repeat.
"Wait for MetalLB resources" task fails if there is more then one replica set
Expected Behavior
"Wait for MetalLB resources" task finishes without errors
Current Behavior
"Wait for MetalLB resources" task fails
Steps to Reproduce
This happens after updating the MetalLB components:
Install the cluster with the default variables
Update the metal_lb_speaker_tag_version and metal_lb_controller_tag_version variables to newer versions
Run the playbook again as usual
The playbook fails on the "Wait for MetalLB resources" task
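Concretely, the version update amounts to bumping these two tags in all.yml (the values below are the ones used here; the defaults you upgrade from depend on the playbook version you originally installed with):

metal_lb_speaker_tag_version: "v0.13.11"
metal_lb_controller_tag_version: "v0.13.11"

and then re-running the playbook against the same inventory, e.g. ansible-playbook site.yml -i inventory/my-cluster/hosts.ini if you use the repository's sample layout.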
Context (variables)
3-master cluster running in Proxmox VMs on Ubuntu 22.04.3 LTS
Operating system: Ubuntu 22.04.3 LTS
Hardware: Proxmox VMs
Variables Used
all.yml
---
k3s_version: v1.27.4+k3s1
# this is the user that has ssh access to these machines
ansible_user: ubuntu
ansible_ssh_private_key_file: ~/.ssh/ed25519
systemd_dir: /etc/systemd/system
# Set your timezone
system_timezone: "Asia/Tbilisi"
# interface which will be used for flannel
flannel_iface: "eth0"
# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.10.11"
# k3s_token is required so that masters can talk together securely
# this token should be alphanumeric only
k3s_token: "REDACTED"
# The IP on which the node is reachable in the cluster.
# Here, a sensible default is provided, you can still override
# it for each of your hosts, though.
k3s_node_ip: '{{ ansible_facts[flannel_iface]["ipv4"]["address"] }}'
# Disable the taint manually by setting: k3s_master_taint = false
# k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"
k3s_master_taint: false
# these arguments are recommended for servers as well as agents:
extra_args: >-
  --flannel-iface={{ flannel_iface }}
  --node-ip={{ k3s_node_ip }}
# change these to your liking, the only required ones are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik
  {{ extra_args_for_prom }}
extra_agent_args: >-
  {{ extra_args }}
extra_args_for_prom: >-
  --kube-controller-manager-arg "bind-address=0.0.0.0"
  --kube-proxy-arg "metrics-bind-address=0.0.0.0"
  --kube-scheduler-arg "bind-address=0.0.0.0"
  --etcd-expose-metrics=true
# --kube-scheduler-arg "address=0.0.0.0"
# --kube-controller-manager-arg "address=0.0.0.0"
# image tag for kube-vip
kube_vip_tag_version: "v0.6.2"
# metallb type frr or native
metal_lb_type: "native"
# metallb mode layer2 or bgp
metal_lb_mode: "layer2"
# image tag for metal lb
# metal_lb_frr_tag_version: "v7.5.1"
metal_lb_speaker_tag_version: "v0.13.11"
metal_lb_controller_tag_version: "v0.13.11"
# metallb ip range for load balancer
metal_lb_ip_range: "192.168.10.80-192.168.10.90"
log_destination: "./logs"
proxmox_lxc_configure: false
# the user that you would use to ssh into the host, for example if you run ssh some-user@my-proxmox-host,
# set this value to some-user
proxmox_lxc_ssh_user: root
# the unique proxmox ids for all of the containers in the cluster, both worker and master nodes
proxmox_lxc_ct_ids:
- 200
- 201
- 202
- 203
- 204
"Wait for MetalLB resources" task fails if there is more then one replica set
Expected Behavior
"Wait for MetalLB resources" task finishes without errors
Current Behavior
"Wait for MetalLB resources" task fails
Steps to Reproduce
This is happening after update of metallb components
Context (variables)
3 master cluster in proxmox vms on Ubuntu 22.04.3 LTS
Operating system: Ubuntu 22.04.3 LTS
Hardware: proxmox vms
Variables Used
all.yml
Hosts
host.ini
Possible Solution
Check the current ReplicaSet before checking for the condition, or check the Deployment itself.
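A minimal sketch of the second option, assuming a wait on the Deployment rollout instead of counting ReplicaSet replicas (the task name and its placement in the role are illustrative, not the playbook's existing code):

- name: Wait for MetalLB controller rollout  # hypothetical replacement for the failing replica-count check
  ansible.builtin.command: >-
    k3s kubectl rollout status deployment/controller
    --namespace=metallb-system --timeout=120s
  changed_when: false

Because rollout status only considers the Deployment's current ReplicaSet, a stale ReplicaSet left over from the previous image tag would not affect the result.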
Logs from ansible-playbook
TASK [k3s/post : Wait for MetalLB resources] *******************************************************************************************************
ok: [k3s-master-11.int.geoshapka.xyz] => (item=controller) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "deployment", "--namespace=metallb-system", "controller", "--for", "condition=Available=True", "--timeout=120s"], "delta": "0:00:00.231939", "end": "2023-09-16 17:57:22.992662", "item": {"condition": "--for condition=Available=True", "description": "controller", "name": "controller", "resource": "deployment"}, "msg": "", "rc": 0, "start": "2023-09-16 17:57:22.760723", "stderr": "", "stderr_lines": [], "stdout": "deployment.apps/controller condition met", "stdout_lines": ["deployment.apps/controller condition met"]}
ok: [k3s-master-11.int.geoshapka.xyz] => (item=webhook service) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "pod", "--namespace=metallb-system", "--selector=component=controller", "--for=jsonpath={.status.phase}=Running", "--timeout=120s"], "delta": "0:00:00.222319", "end": "2023-09-16 17:57:23.487403", "item": {"condition": "--for=jsonpath='{.status.phase}'=Running", "description": "webhook service", "resource": "pod", "selector": "component=controller"}, "msg": "", "rc": 0, "start": "2023-09-16 17:57:23.265084", "stderr": "", "stderr_lines": [], "stdout": "pod/controller-64f57db87d-gz7cf condition met", "stdout_lines": ["pod/controller-64f57db87d-gz7cf condition met"]}
ok: [k3s-master-11.int.geoshapka.xyz] => (item=pods in replica sets) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "pod", "--namespace=metallb-system", "--selector=component=controller,app=metallb", "--for", "condition=Ready", "--timeout=120s"], "delta": "0:00:00.217239", "end": "2023-09-16 17:57:23.978061", "item": {"condition": "--for condition=Ready", "description": "pods in replica sets", "resource": "pod", "selector": "component=controller,app=metallb"}, "msg": "", "rc": 0, "start": "2023-09-16 17:57:23.760822", "stderr": "", "stderr_lines": [], "stdout": "pod/controller-64f57db87d-gz7cf condition met", "stdout_lines": ["pod/controller-64f57db87d-gz7cf condition met"]}
ok: [k3s-master-11.int.geoshapka.xyz] => (item=ready replicas of controller) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "deployment", "--namespace=metallb-system", "--selector=component=controller,app=metallb", "--for=jsonpath={.status.readyReplicas}=1", "--timeout=120s"], "delta": "0:00:00.218818", "end": "2023-09-16 17:57:24.468843", "item": {"condition": "--for=jsonpath='{.status.readyReplicas}'=1", "description": "ready replicas of controller", "resource": "deployment", "selector": "component=controller,app=metallb"}, "msg": "", "rc": 0, "start": "2023-09-16 17:57:24.250025", "stderr": "", "stderr_lines": [], "stdout": "deployment.apps/controller condition met", "stdout_lines": ["deployment.apps/controller condition met"]}
failed: [k3s-master-11.int.geoshapka.xyz] (item=fully labeled replicas of controller) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "deployment", "--namespace=metallb-system", "--selector=component=controller,app=metallb", "--for=jsonpath={.status.fullyLabeledReplicas}=1", "--timeout=120s"], "delta": "0:02:00.120743", "end": "2023-09-16 17:59:24.856453", "item": {"condition": "--for=jsonpath='{.status.fullyLabeledReplicas}'=1", "description": "fully labeled replicas of controller", "resource": "deployment", "selector": "component=controller,app=metallb"}, "msg": "non-zero return code", "rc": 1, "start": "2023-09-16 17:57:24.735710", "stderr": "error: timed out waiting for the condition on deployments/controller", "stderr_lines": ["error: timed out waiting for the condition on deployments/controller"], "stdout": "", "stdout_lines": []}
ok: [k3s-master-11.int.geoshapka.xyz] => (item=available replicas of controller) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "deployment", "--namespace=metallb-system", "--selector=component=controller,app=metallb", "--for=jsonpath={.status.availableReplicas}=1", "--timeout=120s"], "delta": "0:00:00.237164", "end": "2023-09-16 17:59:25.383833", "item": {"condition": "--for=jsonpath='{.status.availableReplicas}'=1", "description": "available replicas of controller", "resource": "deployment", "selector": "component=controller,app=metallb"}, "msg": "", "rc": 0, "start": "2023-09-16 17:59:25.146669", "stderr": "", "stderr_lines": [], "stdout": "deployment.apps/controller condition met", "stdout_lines": ["deployment.apps/controller condition met"]}
Comment
I've temporarily fixed it by manually deleting the old ReplicaSet.
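For reference, a sketch of that cleanup as a one-off task, assuming the stale ReplicaSets are the ones scaled down to 0 replicas (the task and the jsonpath filter are illustrative, not part of the playbook):

- name: Delete MetalLB controller ReplicaSets that are scaled to 0  # hypothetical cleanup task
  ansible.builtin.shell: |
    set -o pipefail
    k3s kubectl get replicaset --namespace=metallb-system \
      --selector=component=controller,app=metallb \
      --output=jsonpath='{.items[?(@.spec.replicas==0)].metadata.name}' \
    | xargs --no-run-if-empty -n 1 k3s kubectl delete replicaset --namespace=metallb-system
  args:
    executable: /bin/bash
  changed_when: true

The same thing can be done by hand: list the replica sets with k3s kubectl get replicaset --namespace=metallb-system and delete the one whose DESIRED count is 0.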