[BUG] Performance about Service & NetworkPolicy

kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)

Apache License 2.0


Kube-OVN Version

v1.12.26

Kubernetes Version

v1.27.4

Operation-system/Kernel Version

TencentOS Server 4.0 6.6.6-2401.0.1.tl4.4.x86_64

Description

I have a cluster with ~300 Pods and ~100 NetworkPolicies. I've noticed that every time I create a Service, a significant number of UpdateNp logs are added to the kube-ovn-controller.log, and at the same time, the Dashboard shows a Work Queue Latency reaching about 1 minute.

Then I checked the code: https://github.com/kubeovn/kube-ovn/blob/v1.12.26/pkg/controller/network_policy.go#L855-L878

The above code seems to indicate that whenever a Service is created, all Pods in the corresponding Namespace are retrieved, each Pod is matched against all NetworkPolicies, and every matching NetworkPolicy is added to the UpdateNp queue. This not only has O(n^2) time complexity but, in my cluster, is also equivalent to updating all NetworkPolicies.
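To make the cost concrete, here is a minimal Go sketch of the fan-out as I read it. It is not the kube-ovn code: `Pod`, `NetworkPolicy`, `selects`, and `policiesToResync` are hypothetical, simplified stand-ins, and the matching assumes only a same-namespace label selector, which is narrower than what the real controller checks.

```go
package main

import "sort"

// Pod and NetworkPolicy are hypothetical, simplified stand-ins for the
// Kubernetes objects; only the fields needed for matching are kept.
type Pod struct {
	Name      string
	Namespace string
	Labels    map[string]string
}

type NetworkPolicy struct {
	Name      string
	Namespace string
	// Selector is a plain label map; an empty map selects every Pod.
	Selector map[string]string
}

// selects reports whether every key/value pair in sel is present in labels.
func selects(sel, labels map[string]string) bool {
	for k, v := range sel {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// policiesToResync mimics the pattern described above: for a Service created
// in svcNamespace, walk every Pod in that namespace and compare it with every
// NetworkPolicy, collecting the keys that would be put on the UpdateNp queue.
// The nested loops are the O(pods * policies) cost.
func policiesToResync(svcNamespace string, pods []Pod, nps []NetworkPolicy) []string {
	seen := map[string]struct{}{}
	for _, pod := range pods {
		if pod.Namespace != svcNamespace {
			continue
		}
		for _, np := range nps {
			// Assumption: a policy matches when it lives in the Pod's
			// namespace and its selector covers the Pod's labels.
			if np.Namespace == pod.Namespace && selects(np.Selector, pod.Labels) {
				seen[np.Namespace+"/"+np.Name] = struct{}{}
			}
		}
	}
	keys := make([]string, 0, len(seen))
	for k := range seen {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for inspection
	return keys
}
```

With ~300 Pods and ~100 NetworkPolicies in one namespace, the inner comparison runs about 30,000 times per Service event, and if most policies select at least one Pod, nearly every policy ends up in the returned slice.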

https://github.com/kubeovn/kube-ovn/pull/1167

Since Pods already match and respond to UpdateNp requests when they are created, and no additional operations are performed on NetworkPolicies during the creation of a Service, I don't quite understand the purpose here. Moreover, in large-scale clusters, creating/deleting Services should be a relatively common operation. In this case, I'm afraid that all requests for creating/deleting Pods will be blocked after the creation of a Service.

Is there any solution?

Steps To Reproduce

Create a large number of NetworkPolicies and Pods.
Create/Delete a Service.

Current Behavior

Expected Behavior

Hey @zsxsoft, I'm here to help you with any bugs, questions, or becoming a contributor! 😊

To address the performance issue related to Service creation and NetworkPolicy updates in Kube-OVN, you can consider the following optimizations:

Increase Probe Interval: Adjusting the default probe interval for large clusters can help reduce the load and improve performance. This change has been implemented in a recent update ^[1].
Optimize Protocol Checks: The protocol check in NetworkPolicy updates has been streamlined to reduce latency, which is part of the recent optimizations ^[1].
Review Pull Request #1167: This pull request specifically addresses the O(n^2) complexity issue by optimizing how Services interact with NetworkPolicies. It suggests that since Pods already handle UpdateNp requests upon creation, the additional processing during Service creation might be unnecessary. Reviewing the changes in this pull request could provide insights into further optimizations.
Consider Caching Mechanisms: Implementing caching mechanisms could help reduce the need to re-evaluate all NetworkPolicies when a Service is created, thus improving performance.

These steps aim to mitigate the high latency and potential blocking of operations in large clusters.
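The caching suggestion above could look roughly like the following Go sketch. Everything here is hypothetical (`MatchCache`, `Get`, `Invalidate` are not kube-ovn APIs), and it assumes the per-namespace match result only changes when a Pod or NetworkPolicy in that namespace changes. It only saves the matching work; it does not by itself reduce the number of NetworkPolicies that get re-queued.

```go
package main

import "sync"

// MatchCache is a hypothetical per-namespace cache of the NetworkPolicy keys
// that match Pods in that namespace. It is an illustration, not kube-ovn code.
type MatchCache struct {
	mu      sync.Mutex
	entries map[string][]string // namespace -> matching policy keys
}

// NewMatchCache returns an empty cache.
func NewMatchCache() *MatchCache {
	return &MatchCache{entries: map[string][]string{}}
}

// Get returns the cached keys for namespace and calls compute only on a miss.
// The lock is held while compute runs, which keeps the sketch simple at the
// cost of serializing concurrent misses.
func (c *MatchCache) Get(namespace string, compute func() []string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if keys, ok := c.entries[namespace]; ok {
		return keys
	}
	keys := compute()
	c.entries[namespace] = keys
	return keys
}

// Invalidate drops the entry for namespace. The assumption is that Pod and
// NetworkPolicy event handlers would call this when either kind of object
// changes in that namespace.
func (c *MatchCache) Invalidate(namespace string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, namespace)
}
```

A Service handler would then call `Get(namespace, compute)` with the expensive matching function as `compute`, so repeated Service events in an unchanged namespace reuse the earlier result.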
