techno-tim / k3s-ansible

The easiest way to bootstrap a self-hosted High Availability Kubernetes cluster. A fully automated HA k3s etcd install with kube-vip, MetalLB, and more. Build. Destroy. Repeat.
https://technotim.live/posts/k3s-etcd-ansible/
Apache License 2.0

"Wait for MetalLB resources" task fails if there is more then one replicaset #363

Closed geoshapka closed 1 year ago

geoshapka commented 1 year ago

"Wait for MetalLB resources" task fails if there is more then one replica set

Expected Behavior

"Wait for MetalLB resources" task finishes without errors

Current Behavior

"Wait for MetalLB resources" task fails

Steps to Reproduce

This happens after updating the MetalLB components:

  1. Install a cluster with the default variables
  2. Update the variables metal_lb_speaker_tag_version and metal_lb_controller_tag_version to newer versions (see the sketch after this list)
  3. Run the playbook again as usual
  4. The playbook fails on the "Wait for MetalLB resources" task
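
For illustration, the version bump in all.yml could look like the following; the target tag shown here is only an example, not a version verified against this issue:

# all.yml: bump both MetalLB image tags, then re-run the playbook
metal_lb_speaker_tag_version: "v0.13.12"
metal_lb_controller_tag_version: "v0.13.12"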

Context (variables)

3 master cluster in proxmox vms on Ubuntu 22.04.3 LTS

Operating system: Ubuntu 22.04.3 LTS

Hardware: proxmox vms

Variables Used

all.yml

---
k3s_version: v1.27.4+k3s1
# this is the user that has ssh access to these machines
ansible_user: ubuntu
ansible_ssh_private_key_file: ~/.ssh/ed25519
systemd_dir: /etc/systemd/system

# Set your timezone
system_timezone: "Asia/Tbilisi"

# interface which will be used for flannel
flannel_iface: "eth0"

# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.10.11"

# k3s_token is required so that masters can talk to each other securely
# this token should be alphanumeric only
k3s_token: "REDACTED"

# The IP on which the node is reachable in the cluster.
# Here, a sensible default is provided, you can still override
# it for each of your hosts, though.
k3s_node_ip: '{{ ansible_facts[flannel_iface]["ipv4"]["address"] }}'

# Disable the taint manually by setting: k3s_master_taint = false
# k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"
k3s_master_taint: false

# these arguments are recommended for servers as well as agents:
extra_args: >-
  --flannel-iface={{ flannel_iface }}
  --node-ip={{ k3s_node_ip }}

# change these to your liking, the only required ones are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik
  {{ extra_args_for_prom }}
extra_agent_args: >-
  {{ extra_args }}

extra_args_for_prom: >-
  --kube-controller-manager-arg "bind-address=0.0.0.0"
  --kube-proxy-arg "metrics-bind-address=0.0.0.0"
  --kube-scheduler-arg "bind-address=0.0.0.0"
  --etcd-expose-metrics=true

# --kube-scheduler-arg "address=0.0.0.0"
# --kube-controller-manager-arg "address=0.0.0.0"
# image tag for kube-vip
kube_vip_tag_version: "v0.6.2"

# metallb type frr or native
metal_lb_type: "native"

# metallb mode layer2 or bgp
metal_lb_mode: "layer2"

# image tag for metal lb
# metal_lb_frr_tag_version: "v7.5.1"
metal_lb_speaker_tag_version: "v0.13.11"
metal_lb_controller_tag_version: "v0.13.11"

# metallb ip range for load balancer
metal_lb_ip_range: "192.168.10.80-192.168.10.90"

log_destination: "./logs"

proxmox_lxc_configure: false
# the user that you would use to ssh into the host, for example if you run ssh some-user@my-proxmox-host,
# set this value to some-user
proxmox_lxc_ssh_user: root
# the unique proxmox ids for all of the containers in the cluster, both worker and master nodes
proxmox_lxc_ct_ids:
  - 200
  - 201
  - 202
  - 203
  - 204

Hosts

host.ini

[master]
k3s-master-11
k3s-master-12
k3s-master-13

[node]

[k3s_cluster:children]
master
node

Possible Solution

Check the current ReplicaSet before waiting for the condition, or check the Deployment itself instead (see the sketch below).
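
A minimal sketch of the second option, assuming the task keeps shelling out to k3s kubectl as it does today; rollout status only succeeds once the newest ReplicaSet of the Deployment is fully rolled out, so ReplicaSets left over from a previous version do not affect it:

# wait on the controller Deployment itself instead of counting
# fullyLabeledReplicas across (possibly several) ReplicaSets
k3s kubectl -n metallb-system rollout status deployment controller --timeout=120s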

Logs from ansible-playbook

TASK [k3s/post : Wait for MetalLB resources] *******************************************************************************************************
ok: [k3s-master-11.int.geoshapka.xyz] => (item=controller) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "deployment", "--namespace=metallb-system", "controller", "--for", "condition=Available=True", "--timeout=120s"], "delta": "0:00:00.231939", "end": "2023-09-16 17:57:22.992662", "item": {"condition": "--for condition=Available=True", "description": "controller", "name": "controller", "resource": "deployment"}, "msg": "", "rc": 0, "start": "2023-09-16 17:57:22.760723", "stderr": "", "stderr_lines": [], "stdout": "deployment.apps/controller condition met", "stdout_lines": ["deployment.apps/controller condition met"]}
ok: [k3s-master-11.int.geoshapka.xyz] => (item=webhook service) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "pod", "--namespace=metallb-system", "--selector=component=controller", "--for=jsonpath={.status.phase}=Running", "--timeout=120s"], "delta": "0:00:00.222319", "end": "2023-09-16 17:57:23.487403", "item": {"condition": "--for=jsonpath='{.status.phase}'=Running", "description": "webhook service", "resource": "pod", "selector": "component=controller"}, "msg": "", "rc": 0, "start": "2023-09-16 17:57:23.265084", "stderr": "", "stderr_lines": [], "stdout": "pod/controller-64f57db87d-gz7cf condition met", "stdout_lines": ["pod/controller-64f57db87d-gz7cf condition met"]}
ok: [k3s-master-11.int.geoshapka.xyz] => (item=pods in replica sets) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "pod", "--namespace=metallb-system", "--selector=component=controller,app=metallb", "--for", "condition=Ready", "--timeout=120s"], "delta": "0:00:00.217239", "end": "2023-09-16 17:57:23.978061", "item": {"condition": "--for condition=Ready", "description": "pods in replica sets", "resource": "pod", "selector": "component=controller,app=metallb"}, "msg": "", "rc": 0, "start": "2023-09-16 17:57:23.760822", "stderr": "", "stderr_lines": [], "stdout": "pod/controller-64f57db87d-gz7cf condition met", "stdout_lines": ["pod/controller-64f57db87d-gz7cf condition met"]}
ok: [k3s-master-11.int.geoshapka.xyz] => (item=ready replicas of controller) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "deployment", "--namespace=metallb-system", "--selector=component=controller,app=metallb", "--for=jsonpath={.status.readyReplicas}=1", "--timeout=120s"], "delta": "0:00:00.218818", "end": "2023-09-16 17:57:24.468843", "item": {"condition": "--for=jsonpath='{.status.readyReplicas}'=1", "description": "ready replicas of controller", "resource": "deployment", "selector": "component=controller,app=metallb"}, "msg": "", "rc": 0, "start": "2023-09-16 17:57:24.250025", "stderr": "", "stderr_lines": [], "stdout": "deployment.apps/controller condition met", "stdout_lines": ["deployment.apps/controller condition met"]}
failed: [k3s-master-11.int.geoshapka.xyz] (item=fully labeled replicas of controller) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "deployment", "--namespace=metallb-system", "--selector=component=controller,app=metallb", "--for=jsonpath={.status.fullyLabeledReplicas}=1", "--timeout=120s"], "delta": "0:02:00.120743", "end": "2023-09-16 17:59:24.856453", "item": {"condition": "--for=jsonpath='{.status.fullyLabeledReplicas}'=1", "description": "fully labeled replicas of controller", "resource": "deployment", "selector": "component=controller,app=metallb"}, "msg": "non-zero return code", "rc": 1, "start": "2023-09-16 17:57:24.735710", "stderr": "error: timed out waiting for the condition on deployments/controller", "stderr_lines": ["error: timed out waiting for the condition on deployments/controller"], "stdout": "", "stdout_lines": []}
ok: [k3s-master-11.int.geoshapka.xyz] => (item=available replicas of controller) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "wait", "deployment", "--namespace=metallb-system", "--selector=component=controller,app=metallb", "--for=jsonpath={.status.availableReplicas}=1", "--timeout=120s"], "delta": "0:00:00.237164", "end": "2023-09-16 17:59:25.383833", "item": {"condition": "--for=jsonpath='{.status.availableReplicas}'=1", "description": "available replicas of controller", "resource": "deployment", "selector": "component=controller,app=metallb"}, "msg": "", "rc": 0, "start": "2023-09-16 17:59:25.146669", "stderr": "", "stderr_lines": [], "stdout": "deployment.apps/controller condition met", "stdout_lines": ["deployment.apps/controller condition met"]}
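
A quick way to confirm the situation described in the title (an old controller ReplicaSet still present after the version bump) is to list the ReplicaSets matching the same selector the wait task uses; this is a suggested check, not output taken from the issue:

# a stale ReplicaSet with 0 desired replicas next to the new one is the leftover
k3s kubectl -n metallb-system get replicasets --selector component=controller,app=metallb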

Comment

I've temporarily fixed it by manually deleting the old replica set (see the commands below).
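
The cleanup could look roughly like this; the ReplicaSet name is hypothetical and has to be copied from a ReplicaSet listing such as the one suggested after the logs above (it is the controller ReplicaSet left at 0 desired replicas):

# delete the stale controller ReplicaSet so only the current one matches the selector
k3s kubectl -n metallb-system delete replicaset controller-<old-hash>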