kube-vip / kube-vip

Kubernetes Control Plane Virtual IP and Load-Balancer
https://kube-vip.io
Apache License 2.0

control plane load balancing does not work #454

Closed: willzhang closed this issue 9 months ago

willzhang commented 2 years ago

Describe the bug: control plane load balancing does not work.

To Reproduce: steps to reproduce the behavior:

root@master1:~# cat kubeadm.yaml 
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.72.30
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: master1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
  extraArgs:
    authorization-mode: Node,RBAC
  certSANs:
  - apiserver.k8s.local
  - master1
  - master2
  - master3
  - worker1
  - 192.168.72.30
  - 192.168.72.31
  - 192.168.72.32
  - 192.168.72.33
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.25.0
controlPlaneEndpoint: apiserver.k8s.local:6443
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"

Init the cluster with:

 kubeadm init --upload-certs --config kubeadm.yaml
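(The remaining control-plane nodes listed in the certSANs would then typically be joined against the VIP endpoint with kubeadm join; the token, discovery hash, and certificate key below are placeholders printed by kubeadm init.)

kubeadm join apiserver.k8s.local:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <key>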

Expected behavior: control plane load balancing with IPVS and the VIP.


Environment:

Kube-vip.yaml:

root@master1:~# cat /etc/kubernetes/manifests/kube-vip.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: kube-vip
  namespace: kube-system
spec:
  containers:
  - args:
    - manager
    env:
    - name: vip_arp
      value: "true"
    - name: port
      value: "6443"
    - name: vip_interface
      value: ens160
    - name: vip_cidr
      value: "32"
    - name: cp_enable
      value: "true"
    - name: cp_namespace
      value: kube-system
    - name: vip_ddns
      value: "false"
    - name: svc_enable
      value: "true"
    - name: vip_leaderelection
      value: "true"
    - name: vip_leaseduration
      value: "5"
    - name: vip_renewdeadline
      value: "3"
    - name: vip_retryperiod
      value: "1"
    - name: lb_enable
      value: "true"
    - name: lb_port
      value: "6443"
    - name: lb_fwdmethod
      value: local
    - name: address
      value: apiserver.k8s.local
    - name: prometheus_server
      value: :2112
    image: ghcr.io/kube-vip/kube-vip:v0.5.0
    imagePullPolicy: IfNotPresent
    name: kube-vip
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
    volumeMounts:
    - mountPath: /etc/kubernetes/admin.conf
      name: kubeconfig
  hostAliases:
  - hostnames:
    - kubernetes
    ip: 127.0.0.1
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/admin.conf
    name: kubeconfig
status: {}

Additional context

Cannot see IPVS load balancing with the VIP 192.168.72.200:

root@master1:~# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.96.0.1:443 rr
  -> 192.168.72.30:6443           Masq    1      0          0         
TCP  10.96.0.10:53 rr
TCP  10.96.0.10:9153 rr
UDP  10.96.0.10:53 rr
root@master1:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:ad:69:e1 brd ff:ff:ff:ff:ff:ff
    altname enp3s0
    inet 192.168.72.30/24 brd 192.168.72.255 scope global ens160
       valid_lft forever preferred_lft forever
    inet 192.168.72.200/32 scope global deprecated dynamic ens160
       valid_lft 59sec preferred_lft 0sec
    inet6 fe80::250:56ff:fead:69e1/64 scope link 
       valid_lft forever preferred_lft forever
3: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
    link/ether 46:88:38:90:21:f1 brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.1/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.96.0.10/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
root@master1:~# 

kube-vip pod logs

root@master1:~# kubectl -n kube-system  logs  kube-vip-master1 | more
time="2022-09-21T14:34:44Z" level=info msg="Starting kube-vip.io [v0.5.0]"
time="2022-09-21T14:34:44Z" level=info msg="namespace [kube-system], Mode: [ARP], Features(s): Control Plane:[true], Services:[true]"
time="2022-09-21T14:34:44Z" level=info msg="prometheus HTTP server started"
time="2022-09-21T14:34:44Z" level=info msg="Starting Kube-vip Manager with the ARP engine"
time="2022-09-21T14:34:44Z" level=info msg="beginning services leadership, namespace [kube-system], lock name [plndr-svcs-lock], id [master1]"
I0921 14:34:44.427389       1 leaderelection.go:248] attempting to acquire leader lease kube-system/plndr-svcs-lock...
time="2022-09-21T14:34:44Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plndr-cp-lock], id [master1]"
I0921 14:34:44.434777       1 leaderelection.go:248] attempting to acquire leader lease kube-system/plndr-cp-lock...
E0921 14:34:44.435347       1 leaderelection.go:330] error retrieving resource lock kube-system/plndr-cp-lock: Get "https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-cp-lock": dial tcp 127.0.0.1:6443: connect: connection refused
E0921 14:34:44.435313       1 leaderelection.go:330] error retrieving resource lock kube-system/plndr-svcs-lock: Get "https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-svcs-lock": dial tcp 127.0.0.1:6443: connect: connection refused
I0921 14:34:48.449944       1 leaderelection.go:258] successfully acquired lease kube-system/plndr-cp-lock
time="2022-09-21T14:34:48Z" level=info msg="Node [master1] is assuming leadership of the cluster"
time="2022-09-21T14:34:48Z" level=info msg="starting the DNS updater for the address apiserver.k8s.local"
I0921 14:34:48.450467       1 leaderelection.go:258] successfully acquired lease kube-system/plndr-svcs-lock
time="2022-09-21T14:34:48Z" level=info msg="Starting IPVS LoadBalancer"
time="2022-09-21T14:34:48Z" level=info msg="IPVS Loadbalancer enabled for 1.2.1"
time="2022-09-21T14:34:48Z" level=info msg="Gratuitous Arp broadcast will repeat every 3 seconds for [192.168.72.200]"
time="2022-09-21T14:34:48Z" level=info msg="Kube-Vip is watching nodes for control-plane labels"
time="2022-09-21T14:34:48Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:34:51Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:34:52Z" level=error msg="Error querying backends file does not exist"
time="2022-09-21T14:34:52Z" level=info msg="Created Load-Balancer services on [192.168.72.200:6443]"
time="2022-09-21T14:34:52Z" level=info msg="Added backend for [192.168.72.200:6443] on [192.168.72.30:6443]"
time="2022-09-21T14:34:54Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:34:57Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:35:00Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:35:03Z" level=info msg="setting 192.168.72.200 as an IP"
time="2022-09-21T14:35:06Z" level=info msg="setting 192.168.72.200 as an IP"
lwabish commented 1 year ago

same here

lwabish commented 1 year ago

Kubekey uses kube-vip to deploy HA clusters, and I found the same issue there; it can be solved by adding the node CIDR to kube-proxy's IPVS exclude-CIDRs configuration. Refer to kubekey-1702. Maybe we should add some instructions on kube-vip's website to remind users of this bug.
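In kubeadm terms that exclusion goes into the KubeProxyConfiguration document (a minimal sketch, assuming kube-proxy runs in IPVS mode and the node network is 192.168.72.0/24 as in the original report). kube-proxy's IPVS proxier periodically cleans up virtual servers it does not own, which removes the one kube-vip creates for the VIP; excludeCIDRs tells it to leave that range alone:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  # keep kube-proxy from deleting the IPVS virtual server
  # that kube-vip creates for the control-plane VIP
  excludeCIDRs:
  - 192.168.72.0/24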

cuiliang0302 commented 1 year ago

me too

OS: Rocky Linux release 9.2, kernel: 5.14.0-284.18.1.el9_2.x86_64, Kubernetes: 1.27.4, containerd: 1.6.20, kube-vip: 0.6.0

ii2day commented 11 months ago

Has anyone solved this problem yet?

smokes2345 commented 11 months ago

I'm still fuzzy on the details, but in the issue linked by @lwabish it looks like the problem was resolved by telling the proxy service to ignore the subnet the control-plane nodes live on. My guess is that there is something like a race condition being generated. I added that subnet to the no_proxy config for my kubespray deployment, but it does not seem to have made a difference after running the playbook again.

blackliner commented 10 months ago

I had a similar issue, but using Kubespray and MetalLB. The LB IP for the control plane was gone, and I got the same error messages as above. Fortunately, Kubespray has a way to specify this exclusion: https://github.com/kubernetes-sigs/kubespray/blob/747d8bb4c2d31669b2d7eed2b38bc4da2c689fab/roles/kubernetes/control-plane/defaults/main/kube-proxy.yml#L68
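For reference, with Kubespray that exclusion would be set through the inventory group vars (a sketch; it assumes the variable exposed by the linked defaults file is kube_proxy_exclude_cidrs, and the file path is only illustrative):

# inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
kube_proxy_exclude_cidrs:
  - 10.128.5.0/24   # the node / VIP subnet from the logs below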

Correction: after applying the config changes and rerunning the Kubespray playbooks, the error still occurs. I need to double-check whether the config made its way through or not...

time="2024-01-12T19:37:34Z" level=error msg="Error querying backends file does not exist"
time="2024-01-12T19:37:34Z" level=info msg="Created Load-Balancer services on [10.128.5.1:6443]"
time="2024-01-12T19:37:34Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.14:6443]"
time="2024-01-12T19:37:34Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.13:6443]"
time="2024-01-12T19:37:39Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.12:6443]"
time="2024-01-12T19:38:05Z" level=error msg="Error querying backends file does not exist"
time="2024-01-12T19:38:05Z" level=info msg="Created Load-Balancer services on [10.128.5.1:6443]"
time="2024-01-12T19:38:05Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.14:6443]"
time="2024-01-12T19:38:05Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.13:6443]"
time="2024-01-12T19:38:10Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.12:6443]"
time="2024-01-12T19:38:30Z" level=error msg="Error querying backends file does not exist"
time="2024-01-12T19:38:30Z" level=info msg="Created Load-Balancer services on [10.128.5.1:6443]"
time="2024-01-12T19:38:30Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.12:6443]"
time="2024-01-12T19:38:36Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.14:6443]"
time="2024-01-12T19:38:36Z" level=info msg="Added backend for [10.128.5.1:6443] on [10.128.5.13:6443]"

EDIT 2: the kube-proxy ConfigMap was not modified yet. But I doubt it is actually the issue, since there are no logs mentioning kube-proxy deleting this IP (10.128.5.1), and I have watch -n 0 ip a show bond0 running on all three nodes, and one of them has the right IP all the time.
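One way to confirm whether the exclusion actually landed in the running cluster (a sketch, assuming a kubeadm-managed kube-proxy ConfigMap named kube-proxy; kube-proxy only rereads it after a restart):

kubectl -n kube-system get configmap kube-proxy -o yaml | grep -A3 excludeCIDRs
# restart kube-proxy so it picks up the updated ConfigMap
kubectl -n kube-system rollout restart daemonset kube-proxy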

rraj-gautam commented 10 months ago

Kubekey uses kube-vip to deploy HA clusters, and I found the same issue there; it can be solved by adding the node CIDR to kube-proxy's IPVS exclude-CIDRs configuration. Refer to kubekey-1702. Maybe we should add some instructions on kube-vip's website to remind users of this bug.

This worked for me. My kubeadm-config.yaml:

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "192.168.2.160" #control plane node local ip
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
imageRepository: registry.k8s.io
kubernetesVersion: 1.28.2
kubeProxyArgs: ["--ipvs-exclude-cidrs=192.168.2.0/24"] ###### cidr of node network #######
controlPlaneEndpoint: "192.168.2.159:6443" # loadbalancer VIP 
networking:
  serviceSubnet: 10.96.0.0/12
  podSubnet: "10.32.0.0/12"  
apiServer:
  timeoutForControlPlane: 4m0s
  certSANs:
      - "master01"
      - "master02"
      - "192.168.2.160"
      - "192.168.2.161"
      - "192.168.2.159"
      - "127.0.0.1"
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
    serverCertSANs:
      - "master01"
      - "master02"
      - "192.168.2.160"
      - "192.168.2.161"
      - "192.168.2.159"
      - "127.0.0.1"
    peerCertSANs:
      - "master01"
      - "master02"
      - "192.168.2.160"
      - "192.168.2.161"
      - "192.168.2.159"
      - "127.0.0.1"
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: "systemd"
  #cgroupDriver: cgroupfs
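A quick way to verify the result (a sketch, using the VIP from the config above; note that the exclusion can also be expressed as a separate KubeProxyConfiguration document with ipvs.excludeCIDRs, as sketched earlier in the thread):

# list the IPVS virtual server that kube-vip manages for the control-plane VIP
ipvsadm -Ln -t 192.168.2.159:6443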
lubronzhan commented 10 months ago

Yeah, we'd better add this to the docs.

thebsdbox commented 9 months ago

This is now part of the documentation.