Closed: geniusxiong closed this issue 7 months ago
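Bug Report

When gw-nodes in ovn-ic is configured with two or more nodes to act as gateways for cluster interconnection, and the node currently acting as gateway fails, the gateway role does not automatically switch to another configured node. Cross-cluster communication then breaks.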
Expected Behavior

If one of the gateway nodes configured in gw-nodes fails, the gateway role should automatically switch to another configured node.
Steps to Reproduce the Problem

Configure the ovn-ic-config ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ovn-ic-config
  namespace: kube-system
data:
  enable-ic: "true"
  az-name: "az176"
  ic-db-host: "172.18.164.171"
  ic-nb-port: "6645"
  ic-sb-port: "6646"
  gw-nodes: "node-0,node-1"
  auto-route: "true"
```
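For reproducing this with a different set of gateway nodes, the same ConfigMap can be generated rather than edited by hand. The following is a minimal Python sketch, not part of Kube-OVN: `build_ic_configmap` is a hypothetical helper, and the keys and default ports are copied from the ConfigMap in this issue.

```python
import json


def build_ic_configmap(az_name, ic_db_host, gw_nodes,
                       ic_nb_port=6645, ic_sb_port=6646):
    """Return an ovn-ic-config ConfigMap manifest as a plain dict.

    Hypothetical helper: the keys mirror the ConfigMap shown in this
    issue. ConfigMap data values must be strings, so ports are converted.
    """
    if not gw_nodes:
        raise ValueError("gw_nodes must name at least one node")
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "ovn-ic-config", "namespace": "kube-system"},
        "data": {
            "enable-ic": "true",
            "az-name": az_name,
            "ic-db-host": ic_db_host,
            "ic-nb-port": str(ic_nb_port),
            "ic-sb-port": str(ic_sb_port),
            # gw-nodes is a comma-separated list of node names
            "gw-nodes": ",".join(gw_nodes),
            "auto-route": "true",
        },
    }


# Values taken from this issue (cluster az176).
manifest = build_ic_configmap("az176", "172.18.164.171", ["node-0", "node-1"])
print(json.dumps(manifest, indent=2))
```

The printed JSON is a valid manifest, so it can be piped to `kubectl apply -f -`.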
Inside the ovn-ic container, the interconnection transit switch ts has been created and the clusters can reach each other. In az176, the port ts-az176 is listed under the gateway on node-1, and node-0 is registered as the second gateway:

```
root@master-1:/kube-ovn# ovn-ic-sbctl show
availability-zone az140
    gateway b5f7b2d5-e002-486b-a933-9d30f92b09d5
        hostname: node-0
        type: geneve
            ip: 172.18.164.143
    gateway e33e7c0d-b3c3-4afd-9598-569f024aeb9e
        hostname: master-0
        type: geneve
            ip: 172.18.164.140
        port ts-az140
            transit switch: ts
            address: ["00:00:00:E3:E2:BC 169.254.100.93/24"]
availability-zone az170
    gateway 224a129c-63ce-46fb-b94f-e87ee7bd0f52
        hostname: node-0
        type: geneve
            ip: 172.18.164.173
        port ts-az170
            transit switch: ts
            address: ["00:00:00:34:05:F3 169.254.100.80/24"]
availability-zone az176
    gateway 787b7f2e-a18a-4c77-b4a1-0fcf304fbbe7
        hostname: node-1
        type: geneve
            ip: 172.18.164.177
        port ts-az176
            transit switch: ts
            address: ["00:00:00:50:E0:EB 169.254.100.90/24"]
    gateway 7e6c6ee8-5f12-4eea-b7d6-355b61297fff
        hostname: node-0
        type: geneve
            ip: 172.18.164.179
```
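To compare the gateway state before and after a node failure, it helps to extract, per availability zone, the gateway hostnames and the gateway under which the ts port is listed. The sketch below is an illustration only: `parse_ic_sb_show` is a hypothetical helper, and it assumes the output has the token layout shown in this issue (`availability-zone <name>`, `gateway <uuid>`, `hostname: <host>`, `port <name>`). It works on whitespace-separated tokens, so it does not depend on indentation.

```python
# The az176 part of the `ovn-ic-sbctl show` output from this issue.
SAMPLE = """
availability-zone az176
    gateway 787b7f2e-a18a-4c77-b4a1-0fcf304fbbe7
        hostname: node-1
        type: geneve
            ip: 172.18.164.177
        port ts-az176
            transit switch: ts
            address: ["00:00:00:50:E0:EB 169.254.100.90/24"]
    gateway 7e6c6ee8-5f12-4eea-b7d6-355b61297fff
        hostname: node-0
        type: geneve
            ip: 172.18.164.179
"""


def parse_ic_sb_show(text):
    """Map each availability zone to its gateway hostnames and to the
    hostname of the gateway under which a port entry is listed."""
    zones = {}
    zone = None       # dict for the zone currently being read
    gateway = None    # hostname of the gateway currently being read
    tokens = text.split()
    i = 0
    while i < len(tokens) - 1:
        tok, nxt = tokens[i], tokens[i + 1]
        if tok == "availability-zone":
            zone = zones.setdefault(nxt, {"gateways": [], "port_on": None})
            gateway = None
            i += 2
        elif tok == "gateway":
            gateway = None  # hostname follows on a later token
            i += 2
        elif tok == "hostname:" and zone is not None:
            gateway = nxt
            zone["gateways"].append(nxt)
            i += 2
        elif tok == "port" and zone is not None:
            zone["port_on"] = gateway
            i += 2
        else:
            i += 1
    return zones


zones = parse_ic_sb_show(SAMPLE)
print(zones["az176"])
```

Running the parser on snapshots taken before and after powering off a node shows whether `port_on` moved to another gateway.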
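At this point the interconnection works: pods in az176 can reach pods in az170 and az140.

Power off node-1 (172.18.164.177) in cluster az176. The transit switch ts information shown inside the ovn-ic container does not change, and the gateway does not automatically switch to the configured node-0 to keep traffic flowing. Pods in az176 can no longer reach pods in az170 and az140.

Power node-1 (172.18.164.177) in cluster az176 back on, and communication returns to normal.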
Additional Info

Kubernetes version:

Output of kubectl version:

```
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:58:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:51:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
```
1.11.3
operation-system/kernel version:

Output of awk -F '=' '/PRETTY_NAME/ { print $2 }' /etc/os-release:

```
NFS Server 4.0 (G193)
```

Output of uname -r:

```
4.19.113-3.nfs.x86_64
```
Automatic failover only works when at least three gateway nodes are configured.
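This conclusion can be turned into a quick pre-flight check on the gw-nodes value. The sketch below is an assumption-laden illustration: `parse_gw_nodes` and `can_fail_over` are hypothetical helpers, the default threshold of 3 comes only from the conclusion stated in this issue, and `node-2` is an invented node name.

```python
def parse_gw_nodes(value):
    """Split a comma-separated gw-nodes string into node names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def can_fail_over(gw_nodes_value, minimum=3):
    """Return True if enough gateway nodes are listed for failover.

    minimum=3 reflects the conclusion of this issue; it is not taken
    from the Kube-OVN documentation.
    """
    return len(parse_gw_nodes(gw_nodes_value)) >= minimum


# The configuration from this issue lists only two gateway nodes.
print(can_fail_over("node-0,node-1"))
# A third (hypothetical) node satisfies the check.
print(can_fail_over("node-0,node-1,node-2"))
```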