Open zsxsoft opened 2 hours ago
Hey @zsxsoft, I'm here to help you with any bugs, questions, or becoming a contributor! 😊
To address the performance issue related to Service creation and NetworkPolicy updates in Kube-OVN, you can consider the following optimizations:
Increase Probe Interval: Adjusting the default probe interval for large clusters can help reduce the load and improve performance. This change has been implemented in a recent update [1].
Optimize Protocol Checks: The protocol check in NetworkPolicy updates has been streamlined to reduce latency, which is part of the recent optimizations [1].
Review Pull Request #1167: This pull request specifically addresses the O(n^2) complexity issue by optimizing how Services interact with NetworkPolicies. It suggests that since Pods already handle UpdateNp requests upon creation, the additional processing during Service creation might be unnecessary. Reviewing the changes in this pull request could provide insights into further optimizations.
Consider Caching Mechanisms: Implementing caching mechanisms could help reduce the need to re-evaluate all NetworkPolicies when a Service is created, thus improving performance.
These steps aim to mitigate the high latency and potential blocking of operations in large clusters.
Kube-OVN Version
v1.12.26
Kubernetes Version
v1.27.4
Operation-system/Kernel Version
TencentOS Server 4.0 6.6.6-2401.0.1.tl4.4.x86_64
Description
I have a cluster with ~300 Pods and ~100 NetworkPolicies. I've noticed that every time I create a Service, a significant number of UpdateNp logs are added to kube-ovn-controller.log, and at the same time the Dashboard shows the Work Queue Latency reaching about 1 minute. Then I checked the code: https://github.com/kubeovn/kube-ovn/blob/v1.12.26/pkg/controller/network_policy.go#L855-L878
The above code seems to indicate that whenever a Service is created, all Pods within the corresponding Namespace are retrieved and then matched against all NetworkPolicies to build an 'UpdateNp' queue. This not only results in O(n^2) time complexity but, in my cluster, is also equivalent to updating all NetworkPolicies.
https://github.com/kubeovn/kube-ovn/pull/1167
Since Pods already match and respond to UpdateNp requests when they are created, and no additional operations are performed on NetworkPolicies during the creation of a Service, I don't quite understand the purpose here. Moreover, in large-scale clusters, creating/deleting Services should be a relatively common operation. In this case, I'm afraid that all requests for creating/deleting Pods will be blocked after the creation of a Service.
Is there any solution?
Steps To Reproduce
Current Behavior
/
Expected Behavior
/