Closed: mikonse closed this issue 1 week ago
@mikonse Could you attach your test script? It might help us reproduce the problem.
Hi, it seems that we have the same problem after migrating from openshift-sdn to the ovn-kubernetes CNI plugin, on OKD 4.7.0-0.okd-2021-08-22-163618. If we remove all network policies from the namespace, everything goes smoothly. I attached the 3 network policies with which we face the problem:
@mikonse could you please share your script with @jomeier? Maybe it will help to detect the root cause.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-ingress
  namespace: performance-test-ovn
spec:
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              network.openshift.io/policy-group: ingress
  podSelector: {}
  policyTypes:
    - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-monitoring
  namespace: performance-test-ovn
spec:
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              network.openshift.io/policy-group: monitoring
  podSelector: {}
  policyTypes:
    - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: performance-test-ovn
spec:
  ingress:
    - from:
        - podSelector: {}
  podSelector: {}
  policyTypes:
    - Ingress
```
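For context on why removing the policies changes behaviour: NetworkPolicies are additive, so the three manifests above combine into "allow ingress from the ingress policy-group namespaces, the monitoring policy-group namespaces, and the pod's own namespace; deny all other ingress to every pod in the namespace". A minimal sketch of that union, with the policy dicts mirroring the attached YAML (the helper function is illustrative, not part of any Kubernetes API):

```python
def allowed_ingress_sources(policies):
    """Collect every ingress 'from' peer granted by a list of NetworkPolicy dicts.

    NetworkPolicies are additive: traffic is allowed if *any* policy selecting
    the pod allows it, so the effective allow-list is the union of all peers.
    """
    peers = []
    for pol in policies:
        for rule in pol.get("spec", {}).get("ingress", []):
            peers.extend(rule.get("from", []))
    return peers

# The three policies attached above, reduced to the fields that matter here.
policies = [
    {"metadata": {"name": "allow-from-openshift-ingress"},
     "spec": {"ingress": [{"from": [{"namespaceSelector": {"matchLabels":
         {"network.openshift.io/policy-group": "ingress"}}}]}]}},
    {"metadata": {"name": "allow-from-openshift-monitoring"},
     "spec": {"ingress": [{"from": [{"namespaceSelector": {"matchLabels":
         {"network.openshift.io/policy-group": "monitoring"}}}]}]}},
    {"metadata": {"name": "allow-same-namespace"},
     "spec": {"ingress": [{"from": [{"podSelector": {}}]}]}},
]

print(len(allowed_ingress_sources(policies)))  # 3 allowed peer selectors
```

Deleting all three policies removes the implicit default-deny along with the allow rules, which is why traffic flows freely again afterwards.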
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
Hi there, when migrating our okd 4.5 cluster from openshift-sdn to ovn-kubernetes we are running into a couple of problems. One of them is that we are seeing varying network latencies in our cluster. More specifically, we encountered the problem with our installed operators: querying the kubernetes/openshift API sometimes has such a large delay that some of the operators crash due to connection timeouts on their API connections.
I wrote a small poller container that checks the latency to the kube API every few seconds and got the following results: most of the time the latency is normal at around 5-10 ms, but every couple of minutes or so it increases to 3-5 s. On occasion the latency even climbs above 10 s, which is where some of the operators run into timeouts on their API connections. Here is a small log:
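A minimal sketch of such a poller (the original script is not attached here; the endpoint URL, interval, and TLS handling below are illustrative assumptions only):

```python
import ssl
import time
import urllib.request

def measure_latency(url: str, timeout: float = 15.0) -> float:
    """Time one HTTP(S) round trip to `url`, in wall-clock seconds."""
    # Assumption: in-cluster the apiserver cert may be self-signed from the
    # client's point of view, so verification is disabled for this probe.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    try:
        urllib.request.urlopen(url, timeout=timeout, context=ctx).read()
    except Exception:
        pass  # count errors/timeouts at whatever time they surfaced
    return time.monotonic() - start

# Example in-cluster loop (hypothetical apiserver service VIP and interval):
#   while True:
#       print(f"{measure_latency('https://172.30.0.1:443/healthz'):.3f}s")
#       time.sleep(5)
```

Even an unauthenticated request works for this purpose, since a 401/403 response still measures the network round trip rather than the auth outcome.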
When deploying our cluster with the exact same settings, only swapping ovn-kubernetes for the openshift-sdn network provider, everything works fine and the latency stays constant at the 5-10 ms mark.
So far the only irregularity I could find is a very frequent occurrence of
in the openshift apiserver logs, which to me indicates that something outside both the apiserver and the API client is forcibly closing the connections. As far as I could see, the ovn and ovs logs looked normal. I am currently also testing the latency with custom deployments and will update this issue as soon as I have some results.
Any ideas/pointers on how to debug this? Has anybody experienced similar issues?
Edit: Small update: when trying to reproduce the issue with self-deployed pods and measuring latency between them, the latency was constant at the normal 5-10 ms. I tested with a simple nginx pod, queried both while running on the same node and on a different one. It seems the network problems only arise when querying the openshift API through its hardcoded service IP.