techno-tim / k3s-ansible

The easiest way to bootstrap a self-hosted High Availability Kubernetes cluster. A fully automated HA k3s etcd install with kube-vip, MetalLB, and more. Build. Destroy. Repeat.
https://technotim.live/posts/k3s-etcd-ansible/
Apache License 2.0

API endpoint virtual IP never shows up on other master nodes when current master shuts down or reboots #272

Closed · UntouchedWagons closed 1 year ago

UntouchedWagons commented 1 year ago

Expected Behavior

When the master node holding the API endpoint virtual IP shuts down or reboots, another master node should take over that IP. Instead, communication with the cluster via kubectl fails until the original master node becomes available again.

Current Behavior

Alternate master nodes are not assigned the API endpoint virtual IP specified in group_vars/all.yml if the current master holding that VIP shuts down or otherwise becomes unavailable.

Steps to Reproduce

  1. Set up a cluster with 2+ master nodes (in my case, Proxmox VMs)
  2. Start pinging the virtual IP continuously (a monitoring sketch follows below)
  3. Shut down the node that currently holds the VIP
  4. The ping eventually times out and never recovers
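
The continuous ping in steps 2–4 can be scripted. Below is a minimal monitoring sketch (hypothetical, not part of the playbook), assuming the apiserver_endpoint value from all.yml further down and the Linux iputils ping flags available on Ubuntu 22.04:

```python
import subprocess
import time

# VIP taken from apiserver_endpoint in all.yml below; adjust for your cluster.
APISERVER_VIP = "192.168.20.30"

def vip_responds(ip: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request and report whether it was answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

# Run this, then shut down the master currently holding the VIP (step 3).
# On a healthy HA cluster the VIP should answer again within seconds once
# another kube-vip instance claims it; here it stays DOWN indefinitely.
while True:
    status = "up" if vip_responds(APISERVER_VIP) else "DOWN"
    print(f"{time.strftime('%H:%M:%S')}  {APISERVER_VIP}: {status}")
    time.sleep(1)
```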

Context (variables)

Operating system: Ubuntu 22.04

Hardware: Proxmox virtual machines

Variables Used

all.yml

```yaml
---
k3s_version: v1.24.12+k3s1
# this is the user that has ssh access to these machines
ansible_user: jordan
systemd_dir: /etc/systemd/system

# Set your timezone
system_timezone: "America/Toronto"

# interface which will be used for flannel
flannel_iface: "eth0"

# apiserver_endpoint is the virtual IP address which will be configured on each master
apiserver_endpoint: "192.168.20.30"

# k3s_token is required so that masters can talk together securely
# this token should be alpha numeric only
k3s_token: "ReelApplauseStretchPossession"

# The IP on which the node is reachable in the cluster.
# Here, a sensible default is provided, you can still override
# it for each of your hosts, though.
k3s_node_ip: '{{ ansible_facts[flannel_iface]["ipv4"]["address"] }}'

# Disable the taint manually by setting: k3s_master_taint = false
k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"

# these arguments are recommended for servers as well as agents:
extra_args: >-
  --flannel-iface={{ flannel_iface }}
  --node-ip={{ k3s_node_ip }}

# change these to your liking; the only required ones are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik
extra_agent_args: >-
  {{ extra_args }}

# image tag for kube-vip
kube_vip_tag_version: "v0.5.11"

# metallb type frr or native
metal_lb_type: "native"

# metallb mode layer2 or bgp
metal_lb_mode: "layer2"

# bgp options
# metal_lb_bgp_my_asn: "64513"
# metal_lb_bgp_peer_asn: "64512"
# metal_lb_bgp_peer_address: "192.168.30.1"

# image tag for metal lb
metal_lb_frr_tag_version: "v7.5.1"
metal_lb_speaker_tag_version: "v0.13.9"
metal_lb_controller_tag_version: "v0.13.9"

# metallb ip range for load balancer
metal_lb_ip_range: "192.168.20.201-192.168.20.250"
```

Hosts

host.ini

```ini
[master]
k3s-testing-master-01.untouchedwagons.site
k3s-testing-master-02.untouchedwagons.site

[node]
k3s-testing-worker-01.untouchedwagons.site
k3s-testing-worker-02.untouchedwagons.site

[k3s_cluster:children]
master
node
```

Possible Solution

timothystewart6 commented 1 year ago

You need 3 master nodes for HA. Can you test with 3 nodes?
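
A likely explanation, based on how etcd quorum works rather than anything stated explicitly in this thread: k3s HA uses embedded etcd, and etcd only accepts writes while a majority of its members are reachable. With two masters the quorum is two, so losing either one stalls etcd, the surviving node's API server stops answering, and kube-vip cannot complete leader election to move the VIP. A sketch of the arithmetic (illustrative only; the inventory check assumes the file above is saved as host.ini):

```python
import configparser

def quorum(n: int) -> int:
    """etcd stays writable only while a majority (floor(n/2) + 1) is up."""
    return n // 2 + 1

# Quorum arithmetic for a few cluster sizes:
for n in (1, 2, 3, 5):
    print(f"{n} master(s): quorum={quorum(n)}, tolerates {n - quorum(n)} failure(s)")
# 1 master(s): quorum=1, tolerates 0 failure(s)
# 2 master(s): quorum=2, tolerates 0 failure(s)
# 3 master(s): quorum=2, tolerates 1 failure(s)
# 5 master(s): quorum=3, tolerates 2 failure(s)

# Count the masters defined in the inventory (bare hostnames need allow_no_value):
inventory = configparser.ConfigParser(allow_no_value=True)
inventory.read("host.ini")
masters = list(inventory["master"])
spare = len(masters) - quorum(len(masters))
print(f"host.ini defines {len(masters)} master(s); survives {spare} failure(s)")
```

With the two masters in host.ini above, quorum is two and no failure is survivable, which matches the observed behaviour; three masters keep quorum through a single node failure, so the VIP can fail over.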