dan@k8s01:~$ k get pods -A -o wide | grep -i vip
kube-system kube-vip-cloud-provider-0 1/1 Running 0 9m56s 10.0.77.1 k8s04 <none> <none>
kube-system kube-vip-k8s01 1/1 Running 0 7m17s 192.168.0.41 k8s01 <none> <none>
kube-system kube-vip-k8s02 1/1 Running 0 7m14s 192.168.0.42 k8s02 <none> <none>
kube-system kube-vip-k8s03 1/1 Running 0 6m39s 192.168.0.43 k8s03 <none> <none>
The above should quickly list where kube-vip is running.
Also, if the failover is taking 2 minutes, then the ARP broadcasts appear not to be updating the network correctly. Can you attach any logs and configuration (YAML) to this issue, please?
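If it helps, a rough way to verify the ARP side is to capture ARP traffic for the VIP from another host on the same L2 segment while you trigger a failover (the interface name and VIP below are placeholders, substitute your own):

# tcpdump -eni eth0 arp and host <vip-address>
# ip neigh show <vip-address>

The first command shows whether gratuitous ARP announcements for the VIP actually arrive during the failover; the second shows which MAC address the local ARP cache currently holds for it.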
Thanks for the reply, @thebsdbox! We followed this tutorial to configure kube-vip, https://kube-vip.io/control-plane/, from the beginning through the 'remaining-nodes' part. Our cluster has two masters and one worker, like this:
[root@server04883 ~]# kubectl get pods -A -o wide | grep -i vip
kube-system kube-vip-server04883 1/1 Running 2 21h 10.30.221.144 server04883 <none> <none>
kube-system kube-vip-server01048 1/1 Running 3 21h 10.30.221.231 server01048 <none> <none>
[root@server04883 ~]# kubectl get node -A
NAME STATUS ROLES AGE VERSION
server04883 Ready control-plane,master 21h v1.20.5
server04884 Ready <none> 21h v1.20.5
server01048 Ready control-plane,master 21h v1.20.5
Here is our kube-vip.yaml:
# sudo cat /etc/kubernetes/manifests/kube-vip.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: kube-vip
  namespace: kube-system
spec:
  containers:
  - args:
    - manager
    env:
    - name: vip_arp
      value: "true"
    - name: vip_interface
      value: eth0
    - name: port
      value: "6443"
    - name: vip_cidr
      value: "32"
    - name: cp_enable
      value: "true"
    - name: cp_namespace
      value: kube-system
    - name: svc_enable
      value: "true"
    - name: vip_leaderelection
      value: "true"
    - name: vip_leaseduration
      value: "5"
    - name: vip_renewdeadline
      value: "3"
    - name: vip_retryperiod
      value: "1"
    - name: vip_address
      value: 10.30.220.11
    image: plndr/kube-vip:0.3.1
    imagePullPolicy: Always
    name: kube-vip
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
        - SYS_TIME
    volumeMounts:
    - mountPath: /etc/kubernetes/admin.conf
      name: kubeconfig
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/admin.conf
    name: kubeconfig
status: {}
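(For reference on the failover-speed question: the settings in this manifest that control how quickly a standby can claim the VIP are the leader-election timings. A rough annotation of the values above, assuming they map to the standard Kubernetes leader-election parameters, in seconds:

- name: vip_leaseduration   # how long a lease is considered valid before a standby may try to take it
  value: "5"
- name: vip_renewdeadline   # how long the current leader has to renew the lease before giving up leadership
  value: "3"
- name: vip_retryperiod     # how long each client waits between acquire/renew attempts
  value: "1"

With these values a standby would normally notice a dead leader within a few seconds, so a two-minute failover points at something other than these timings; see the etcd discussion further down.)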
And when I power off one of the master nodes, I get the logs below on the surviving kube-vip node while running this command:
# kubectl logs -n kube-system kube-vip-server01048 -f
time="2021-07-09T10:02:03Z" level=info msg="Starting Kube-vip Manager with the ARP engine"
time="2021-07-09T10:02:03Z" level=info msg="Namespace [kube-system], Hybrid mode [true]"
time="2021-07-09T10:02:03Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plndr-svcs-lock], id [server01048]"
I0709 10:02:03.415839 1 leaderelection.go:243] attempting to acquire leader lease kube-system/plndr-svcs-lock...
time="2021-07-09T10:02:03Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plndr-cp-lock], id [server01048]"
I0709 10:02:03.416124 1 leaderelection.go:243] attempting to acquire leader lease kube-system/plndr-cp-lock...
time="2021-07-09T10:02:04Z" level=info msg="new leader elected: server04883"
time="2021-07-09T10:02:04Z" level=info msg="Node [server04883] is assuming leadership of the cluster"
E0710 07:12:27.531401 1 leaderelection.go:321] error retrieving resource lock kube-system/plndr-svcs-lock: etcdserver: request timed out
E0710 07:12:27.536439 1 leaderelection.go:322] error retrieving resource lock kube-system/plndr-cp-lock: etcdserver: request timed out
E0710 07:12:41.530034 1 leaderelection.go:321] error retrieving resource lock kube-system/plndr-svcs-lock: etcdserver: request timed out
E0710 07:12:41.535624 1 leaderelection.go:322] error retrieving resource lock kube-system/plndr-cp-lock: etcdserver: request timed out
E0710 07:12:55.530485 1 leaderelection.go:321] error retrieving resource lock kube-system/plndr-svcs-lock: etcdserver: request timed out
E0710 07:12:55.531457 1 leaderelection.go:322] error retrieving resource lock kube-system/plndr-cp-lock: etcdserver: request timed out
E0710 07:13:09.533391 1 leaderelection.go:322] error retrieving resource lock kube-system/plndr-cp-lock: etcdserver: request timed out
E0710 07:13:09.540434 1 leaderelection.go:321] error retrieving resource lock kube-system/plndr-svcs-lock: etcdserver: request timed out
E0710 07:13:23.529408 1 leaderelection.go:321] error retrieving resource lock kube-system/plndr-svcs-lock: etcdserver: request timed out
E0710 07:13:23.529551 1 leaderelection.go:322] error retrieving resource lock kube-system/plndr-cp-lock: etcdserver: request timed out
E0710 07:13:28.227925 1 leaderelection.go:321] error retrieving resource lock kube-system/plndr-svcs-lock: Get "https://server01048:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-svcs-lock": dial tcp 10.30.221.231:6443: connect: connection refused
E0710 07:13:28.227933 1 leaderelection.go:322] error retrieving resource lock kube-system/plndr-cp-lock: Get "https://server01048:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-cp-lock": dial tcp 10.30.221.231:6443: connect: connection refused
E0710 07:13:29.387229 1 leaderelection.go:321] error retrieving resource lock kube-system/plndr-svcs-lock: Get "https://server01048:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-svcs-lock": dial tcp
...(this repeats many times)...
E0710 07:14:07.439440 1 leaderelection.go:321] error retrieving resource lock kube-system/plndr-svcs-lock: Get "https://server01048:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-svcs-lock": dial tcp 10.30.221.231:6443: connect: connection refused
E0710 07:14:07.706652 1 leaderelection.go:322] error retrieving resource lock kube-system/plndr-cp-lock: Get "https://server01048:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-cp-lock": dial tcp 10.30.221.231:6443: connect: connection refused
I0710 07:14:12.265002 1 leaderelection.go:253] successfully acquired lease kube-system/plndr-cp-lock
time="2021-07-10T07:14:12Z" level=info msg="Node [server01048] is assuming leadership of the cluster"
time="2021-07-10T07:14:12Z" level=info msg="This node is starting with leadership of the cluster"
time="2021-07-10T07:14:12Z" level=info msg="Broadcasting ARP update for 10.30.220.11 (fa:16:3e:4b:81:e4) via eth0"
I0710 07:14:12.268240 1 leaderelection.go:253] successfully acquired lease kube-system/plndr-svcs-lock
time="2021-07-10T07:14:12Z" level=info msg="Beginning watching services for type: LoadBalancer in all namespaces"
time="2021-07-10T07:14:12Z" level=info msg="Service [kubernetes] has been addded/modified, it has no assigned external addresses"
time="2021-07-10T07:14:12Z" level=info msg="Service [kube-dns] has been addded/modified, it has no assigned external addresses"
time="2021-07-10T07:14:12Z" level=info msg="Service [calico-kube-controllers-metrics] has been addded/modified, it has no assigned external addresses"
time="2021-07-10T07:14:12Z" level=info msg="Service [calico-typha] has been addded/modified, it has no assigned external addresses"
time="2021-07-10T07:14:15Z" level=info msg="Broadcasting ARP update for 10.30.220.11 (fa:16:3e:4b:81:e4) via eth0"
time="2021-07-10T07:14:18Z" level=info msg="Broadcasting ARP update for 10.30.220.11 (fa:16:3e:4b:81:e4) via eth0"
time="2021-07-10T07:14:21Z" level=info msg="Broadcasting ARP update for 10.30.220.11 (fa:16:3e:4b:81:e4) via eth0"
time="2021-07-10T07:14:24Z" level=info msg="Broadcasting ARP update for 10.30.220.11 (fa:16:3e:4b:81:e4) via eth0"
About 2 minutes later, we can ping the VIP again:
64 bytes from 10.30.220.11: icmp_seq=84 ttl=64 time=0.494 ms
64 bytes from 10.30.220.11: icmp_seq=85 ttl=64 time=0.451 ms
64 bytes from 10.30.220.11: icmp_seq=86 ttl=64 time=0.412 ms
64 bytes from 10.30.220.11: icmp_seq=87 ttl=64 time=0.434 ms
From 10.243.12.1 icmp_seq=94 Destination Host Unreachable
From 10.243.12.1 icmp_seq=100 Destination Host Unreachable
From 10.243.12.1 icmp_seq=104 Destination Host Unreachable
From 10.243.12.1 icmp_seq=110 Destination Host Unreachable
From 10.243.12.1 icmp_seq=117 Destination Host Unreachable
From 10.243.12.1 icmp_seq=129 Destination Host Unreachable
From 10.243.12.1 icmp_seq=132 Destination Host Unreachable
From 10.243.12.1 icmp_seq=142 Destination Host Unreachable
From 10.243.12.1 icmp_seq=151 Destination Host Unreachable
From 10.243.12.1 icmp_seq=154 Destination Host Unreachable
From 10.243.12.1 icmp_seq=157 Destination Host Unreachable
From 10.243.12.1 icmp_seq=160 Destination Host Unreachable
From 10.243.12.1 icmp_seq=161 Destination Host Unreachable
From 10.243.12.1 icmp_seq=179 Destination Host Unreachable
From 10.243.12.1 icmp_seq=182 Destination Host Unreachable
From 10.243.12.1 icmp_seq=186 Destination Host Unreachable
From 10.243.12.1 icmp_seq=189 Destination Host Unreachable
64 bytes from 10.30.220.11: icmp_seq=192 ttl=64 time=2416 ms
64 bytes from 10.30.220.11: icmp_seq=193 ttl=64 time=1417 ms
64 bytes from 10.30.220.11: icmp_seq=194 ttl=64 time=417 ms
64 bytes from 10.30.220.11: icmp_seq=195 ttl=64 time=0.392 ms
64 bytes from 10.30.220.11: icmp_seq=196 ttl=64 time=0.383 ms
Ah!
You only have two control-plane nodes! etcd will not like that; you need an odd number of members for the clustering to work.
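For context, etcd needs a write quorum of a majority of its members (floor(N/2) + 1), so the failure tolerance works out as:

members   quorum   failures tolerated
2         2        0
3         2        1
5         3        2

With only two members, losing one blocks every write, including the plndr-cp-lock / plndr-svcs-lock lease updates that kube-vip relies on, which is consistent with the "etcdserver: request timed out" errors in the log above.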
Closing the issue as there have been no updates for 10 days. Please feel free to re-open.
How do I list all kube-vip nodes?
How do I configure kube-vip so that the VIP moves to another Kubernetes master faster when one of the masters goes down?
In my situation: after we shut down a master node manually, the cluster is not available immediately. About 2 minutes later, the VIP moves and the cluster becomes available again.