kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

2 node HA with OVN DB Availability #4631

Open kannanvr opened 3 days ago

kannanvr commented 3 days ago

Kube-OVN Version

1.12.26

Kubernetes Version

1.27.10

Operation-system/Kernel Version

5.15.0

Description

We have a 2-node HA Kubernetes setup. We are not using etcd as the cluster datastore; instead we use PostgreSQL with Kine to achieve HA with 2 nodes. A keepalived component with a VIP promotes the standby PostgreSQL instance to primary when a node reboots.
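For reference, a minimal sketch of the kind of setup described (the VIP address, credentials, port, and flags below are illustrative placeholders, not taken from the reporter's environment):

```shell
# keepalived holds a VIP (here 192.0.2.10) that always points at the
# current PostgreSQL primary; kine exposes that datastore to Kubernetes
# through an etcd-compatible API on its default listen port 2379.
kine --endpoint "postgres://k8s:secret@192.0.2.10:5432/kubernetes"

# kube-apiserver then uses kine as its etcd backend
# (plus the rest of the usual API server flags):
kube-apiserver --etcd-servers=http://127.0.0.1:2379
```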

When we install Kube-OVN on this 2-node HA Kubernetes setup, we face an issue: when either node goes down, the OVN-DB/OVN-CENTRAL NB and SB databases do not come up. As a result the ovn-central pod keeps restarting, and new pods on the one remaining node do not get an IP address.

It seems OVN-DB/OVN-CENTRAL uses the Raft consensus algorithm. Is it possible to use an active-passive connection instead? Or is there any other way to handle this case, for example with the VIP from keepalived?

Any feedback on this issue would be much appreciated.
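For context on why a 2-node Raft cluster behaves this way: Raft requires a strict majority of voting members to elect a leader, so a 2-member OVN DB cluster loses quorum as soon as either node fails. A minimal sketch of the quorum arithmetic:

```python
def majority(n: int) -> int:
    """Smallest number of members that forms a quorum in an n-member Raft cluster."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """How many members can fail while a quorum still remains."""
    return n - majority(n)

for n in (1, 2, 3, 5):
    print(f"{n} members: quorum={majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
# A 2-member cluster needs both members for quorum (tolerates 0 failures),
# which is why a 3-member cluster is the usual minimum for HA.
```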

Steps To Reproduce

1. Install a 2-node HA cluster
2. Shut down one of the nodes
3. Bring up a new pod
4. The pod does not get an IP

Current Behavior

When a node goes down, the new pod is not allocated an IP because of the DB issue.

Expected Behavior

When a node goes down, the new pod should still be allocated an IP.

dosubot[bot] commented 3 days ago

Hey @kannanvr! I'm here to assist you with any bugs, questions, or contributions. Let's work together to resolve your issue.

I couldn't find specific information on using an active-passive configuration for OVN databases in Kube-OVN. However, Kube-OVN handles high availability for the OVN databases using a script that monitors the status of the databases and manages leadership roles. This script checks the status of the OVN Northbound (NB) and Southbound (SB) databases, determines the leader, and ensures data consistency by managing the service endpoint and Kubernetes labels [1][2].

For diagnosing issues with OVN-DB availability, Kube-OVN provides specific logs and debugging tools. You can use commands to check the status of the OVN Southbound DB and get the OVN Central DB status. Additionally, the ovs-healthcheck.sh script can verify the connection and status of the OVN databases and offer corrective actions for common issues [3][4].
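As a sketch of such checks (assuming the Kube-OVN kubectl plugin is installed; the pod name and socket path below are illustrative and may differ in your deployment):

```shell
# Check NB/SB database status via the Kube-OVN kubectl plugin
kubectl ko nb status
kubectl ko sb status

# Inspect the Raft cluster state directly inside an ovn-central pod
# (replace the pod name with one from your cluster)
kubectl -n kube-system exec ovn-central-xxxx -- \
  ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
```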

Regarding the use of keepalived and VIPs, there was a similar issue where pods on other nodes could not access the keepalived VIP after a node shutdown. The suggested solution was to use tcpdump to trace the packet and ensure that the VIP is treated as an external address, with packets being SNATed to the node address to visit the VIP [5].
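A trace along those lines might look like the following (the VIP is a placeholder; run it on the node that cannot reach the VIP):

```shell
# Watch all interfaces for traffic to or from the keepalived VIP,
# to see whether packets leave the node and whether replies return
tcpdump -nn -i any host 192.0.2.10
```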

You might want to explore these debugging tools and configurations to address the issue you're facing.

To continue talking to Dosu, mention @dosu.