kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

[BUG] Initialization of nodes is too slow when adding new nodes or restarting kube-ovn-controller in a large-scale cluster #4538

Closed. cmdy closed this issue 3 days ago.

cmdy commented 4 weeks ago

Kube-OVN Version

v1.12.21

Kubernetes Version

v1.28.11

Operation-system/Kernel Version

"CentOS Linux 7 (Core)" 6.6.35-2406.el7.bzl.x86_64

Description

In a large-scale cluster (1045 nodes, 101 subnets), node readiness takes too long when nodes are added in batches. Monitoring shows that the AddNode work queue latency reaches the hour level. [screenshot: AddNode work queue latency graph]

Steps To Reproduce

  1. Add 1000+ nodes and 100+ subnets.
  2. Restart kube-ovn-controller or continue adding more nodes.

Current Behavior

It takes too long for a newly added node to become ready.

Expected Behavior

New nodes are ready quickly

dosubot[bot] commented 4 weeks ago

Hey there, @cmdy! I'm here to help you with any bugs, questions, or contributions you have in mind. Let me know how I can assist you!

To address the issue of slow node initialization when adding new nodes or restarting kube-ovn-controller in a large-scale cluster, you can consider the following steps:

  1. Increase the Number of Workers: Adjust the WorkerNum configuration parameter to increase the number of concurrent workers processing the node event queues, so more events are handled in parallel and node readiness time drops. The default value of WorkerNum is 3; you can raise it based on your cluster's scale and performance requirements (see the deployment sketch after this list).

    // run node worker before handle any pods
    for i := 0; i < c.config.WorkerNum; i++ {
       go wait.Until(runWorker("add node", c.addNodeQueue, c.handleAddNode), time.Second, ctx.Done())
       go wait.Until(runWorker("update node", c.updateNodeQueue, c.handleUpdateNode), time.Second, ctx.Done())
       go wait.Until(runWorker("delete node", c.deleteNodeQueue, c.handleDeleteNode), time.Second, ctx.Done())
    }
  2. Optimize Configuration Parameters: Several configuration parameters affect the performance of kube-ovn-controller. Consider tuning the following:

    • --gc-interval: Sets the garbage collection interval.
    • --inspect-interval: Sets the interval for inspection tasks.
    • DPDK_CPU and DPDK_MEMORY: Configure the amount of CPU and memory allocated for DPDK.
    • HYBRID_DPDK: Enable hybrid DPDK to allow nodes to run both OVS-DPDK and OVS-kernel.
    • --ovn-nb-addr and --ovn-sb-addr: Addresses of the OVN Northbound and Southbound databases.
    • --ovn-timeout: Timeout value for OVN operations.
    • --ovsdb-con-timeout and --ovsdb-inactivity-timeout: Connection and inactivity timeouts for OVSDB.
    • --cluster-router and --node-switch: Names of the cluster router and node gateway switch.
    • --node-switch-cidr: CIDR for the node switch.
  3. Address Known Bottlenecks: There are known bottlenecks in the current implementation of the kube-ovn-controller that could affect node readiness time. Specifically, the commit cf725882 addresses the issue of waiting for the pod network to be ready, which can take a long time. Ensure you are using a version that includes this fix.

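As an example for item 1, a minimal sketch of raising the worker count on the kube-ovn-controller Deployment. The flag name follows the controller's worker-num option; the value 10 is an arbitrary illustration, not a recommendation, and should be merged into your existing container args:

    # kube-ovn-controller Deployment fragment (sketch; merge into existing args)
    containers:
      - name: kube-ovn-controller
        args:
          - --worker-num=10   # hypothetical value; the default is 3

Raise the value gradually and watch OVN NB database load, since more workers also mean more concurrent OVSDB operations.
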
To continue talking to Dosu, mention @dosu.

cmdy commented 4 weeks ago

I found that the code runs many for loops when processing Logical_Router_Policy and Port_Group rows.

In handleAddNode:

// This loop runs once per distributed subnet for every node added; each
// iteration issues OVN NB operations (a port group plus a policy route).
for _, subnet := range subnets {
    if (subnet.Spec.Vlan != "" && !subnet.Spec.LogicalGateway) || subnet.Spec.Vpc != c.config.ClusterRouter || subnet.Name == c.config.NodeSwitch || subnet.Spec.GatewayType != kubeovnv1.GWDistributedType {
        continue
    }
    if err = c.createPortGroupForDistributedSubnet(node, subnet); err != nil {
        klog.Errorf("failed to create port group for node %s and subnet %s: %v", node.Name, subnet.Name, err)
        return err
    }
    if err = c.addPolicyRouteForDistributedSubnet(subnet, node.Name, v4IP, v6IP); err != nil {
        klog.Errorf("failed to add policy router for node %s and subnet %s: %v", node.Name, subnet.Name, err)
        return err
    }
    // policy route for overlay distributed subnet should be reconciled when node ip changed
    c.addOrUpdateSubnetQueue.Add(subnet.Name)
}

And in listLogicalRouterPoliciesByFilter:

func (c *OVNNbClient) listLogicalRouterPoliciesByFilter(lrName string, filter func(route *ovnnb.LogicalRouterPolicy) bool) ([]*ovnnb.LogicalRouterPolicy, error) {
    klog.V(3).Infof("list logical router policy %s", lrName)
    start := time.Now()
    lr, err := c.GetLogicalRouter(lrName, false)
    if err != nil {
        klog.Error(err)
        return nil, err
    }
    policyList := make([]*ovnnb.LogicalRouterPolicy, 0, len(lr.Policies))

    for _, uuid := range lr.Policies {
        startPolicy := time.Now()
        policy, err := c.GetLogicalRouterPolicyByUUID(uuid)
        if err != nil {
            if errors.Is(err, client.ErrNotFound) {
                continue
            }
            return nil, err
        }
        if filter == nil || filter(policy) {
            policyList = append(policyList, policy)
        }
        if time.Since(startPolicy).Milliseconds() > 100 {
            klog.V(3).Infof("take to %vms get logical router policy %s lr policies len %v policy %v", time.Since(startPolicy).Milliseconds(), lrName, len(lr.Policies), policy)
        }
    }

    klog.V(3).Infof("take to %vms list logical router policy %s", time.Since(start).Milliseconds(), lrName)

    return policyList, nil
}

When the number of Logical_Router_Policy rows is 106744, I tested the processing time; since the loop above fetches every policy by UUID, each call scales with the total number of policies. [screenshot: timing output]

cmdy commented 4 weeks ago

I think this loop should be adjusted to use WhereCache().List instead of fetching each policy by UUID:

    for _, uuid := range lr.Policies {
        startPolicy := time.Now()
        policy, err := c.GetLogicalRouterPolicyByUUID(uuid)
        if err != nil {
            if errors.Is(err, client.ErrNotFound) {
                continue
            }
            return nil, err
        }
        if filter == nil || filter(policy) {
            policyList = append(policyList, policy)
        }
        if time.Since(startPolicy).Milliseconds() > 100 {
            klog.V(3).Infof("take to %vms get logical router policy %s lr policies len %v policy %v", time.Since(startPolicy).Milliseconds(), lrName, len(lr.Policies), policy)
        }
    }
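For illustration, a minimal sketch of that direction (my names and details, not the actual patch), assuming OVNNbClient exposes libovsdb's WhereCache/List and keeping the original function's signature:

func (c *OVNNbClient) listLogicalRouterPoliciesByFilter(lrName string, filter func(policy *ovnnb.LogicalRouterPolicy) bool) ([]*ovnnb.LogicalRouterPolicy, error) {
    lr, err := c.GetLogicalRouter(lrName, false)
    if err != nil {
        klog.Error(err)
        return nil, err
    }

    // Build the router's policy UUID set once so the predicate below
    // can check membership in O(1).
    policySet := make(map[string]struct{}, len(lr.Policies))
    for _, uuid := range lr.Policies {
        policySet[uuid] = struct{}{}
    }

    // One pass over the local OVSDB cache instead of one
    // GetLogicalRouterPolicyByUUID lookup per UUID. Real code would use
    // a context carrying the client's configured timeout.
    policyList := make([]*ovnnb.LogicalRouterPolicy, 0, len(lr.Policies))
    if err := c.WhereCache(func(policy *ovnnb.LogicalRouterPolicy) bool {
        if _, ok := policySet[policy.UUID]; !ok {
            return false
        }
        return filter == nil || filter(policy)
    }).List(context.TODO(), &policyList); err != nil {
        klog.Errorf("failed to list logical router policies of %s: %v", lrName, err)
        return nil, err
    }

    return policyList, nil
}

This keeps the same signature and filter semantics, so callers should not need changes, and the cost per call drops from len(lr.Policies) individual lookups to a single cache scan.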

zhangzujian commented 5 days ago

Thanks for the information! Could you please try this patch for the latest v1.12 version?

cmdy commented 2 days ago

> Thanks for the information! Could you please try this patch for the latest v1.12 version?

I have adjusted this method locally and tested it. I will merge this patch once testing is done.