Could you share the network policy you use? It would help us dive into this.
A database pod and a pod created by a cronjob on another worker node can't connect without a sleep command when a network policy exists.
So they eventually are able to connect, right?
Kinda feels like there's an unexpectedly long delay in applying the network policies. Can you spot anything suspicious in the kube-router logs? kube-router is the one enforcing the policies in this case.
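In case it helps, one way to grab those logs (a sketch only, assuming kube-router runs as the stock kube-router DaemonSet in kube-system, which is the default k0s layout):

$ kubectl logs -n kube-system daemonset/kube-router --since=24h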
The network policy is included in this file: https://github.com/k0sproject/k0s/files/9775557/complete-stack-example.txt. It contains the complete YAML needed to reproduce this issue.
Copied and pasted from that file:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-network-policy
  namespace: test
spec:
  egress:
    - to:
        - podSelector: {}
    - ports:
        - port: 53
          protocol: TCP
        - port: 53
          protocol: UDP
      to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
  ingress:
    - from:
        - podSelector: {}
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
thanks. somehow my 👀 missed it previously :)
I can see these errors in the kube-router logs, but these messages were generated some hours before the cronjob failed:
E1013 06:26:38.176086 1 network_policy_controller.go:283] Failed to cleanup stale ipsets: failed to delete ipset KUBE-DST-CGZWTH6XDOELKCBP due to ipset v7.15: Set cannot be destroyed: it is in use by a kernel component
E1013 06:30:32.324135 1 network_policy_controller.go:283] Failed to cleanup stale ipsets: failed to delete ipset KUBE-DST-V76DHIHZ37XZJSAY due to ipset v7.15: Set cannot be destroyed: it is in use by a kernel component
E1013 10:42:51.389132 1 network_policy_controller.go:283] Failed to cleanup stale ipsets: failed to delete ipset KUBE-DST-WVVK5VS3NEKJJHZF due to ipset v7.15: Set cannot be destroyed: it is in use by a kernel component
E1013 10:55:28.246109 1 network_policy_controller.go:283] Failed to cleanup stale ipsets: failed to delete ipset KUBE-DST-WVVK5VS3NEKJJHZF due to ipset v7.15: Set cannot be destroyed: it is in use by a kernel component
E1013 11:14:50.063221 1 network_policy_controller.go:283] Failed to cleanup stale ipsets: failed to delete ipset KUBE-DST-H2JNHIQM3YF2AZ3L due to ipset v7.15: Set cannot be destroyed: it is in use by a kernel component
There are no kube-router log entries from the time the cronjob fails.
The issue is marked as stale since no activity has been recorded in 30 days
A Kubernetes cronjob cannot access a database service on another worker node when a network policy is used. It works fine if I run a sleep 3 command first or if all pods are running on the same worker node.
@jnummelin @jsalgado78, actually some delay here is expected and fairly common in most NetworkPolicy implementations; it happens in literally every implementation that I know in detail. This is by design, and I don't think it can be fixed because of how NetworkPolicies work. The flow looks like this:
Pod created in the API
│
▼
Pod Scheduled
│
▼
Actual pod creation in the node
│
▼
CNI Call
│
▼
CNI Plugin gets the call and assigns an IP address
│
├─► Pod definition is updated ─► SDN agents watching the API apply the rules in their destination.
│
└─► Pod network plumbing is actually created and the container runtime finishes starting the container
If we assume the constraint that the network is one big distributed switch instead of having a central switch taking care of everything (and you don't want that, for performance and cost reasons), the only option would be to block the network plumbing until every destination that applies rules for it is configured. This has two massive problems:
So I don't think this is a real issue. I would agree if we were talking about a big delay, but 3 seconds seems fairly reasonable to me. If we were talking about a much higher value, OK, fair enough, but if it's just 3 seconds I think the application should be able to handle it (e.g. with a small retry loop, sketched below).
IMO this is acceptable behavior. What do you think @jnummelin ?
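Purely as an illustration of what "handling it in the application" could look like (not from the original thread; the image and service name are placeholders), the cronjob container could retry instead of relying on a fixed sleep:

      containers:
        - name: mariadb-check
          image: mariadb:10.9
          command:
            - /bin/sh
            - -c
            # retry a few times so a short policy-programming delay doesn't fail the job
            - |
              for i in 1 2 3 4 5; do
                mysqladmin ping -h mariadb-service && exit 0
                echo "database not reachable yet, retry $i"
                sleep 2
              done
              exit 1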
It seems strange, because I've only detected this issue in recent k0s versions, maybe starting from 1.24.2. It worked fine in previous k0s versions, and I can't reproduce the issue in clusters running Mirantis Kubernetes Engine 3.4 (Kubernetes 1.20.x).
Are you 100% certain that the issue can be reproduced in 1.24.2 and not in 1.24.6, both using the same OS version and hardware specs (to the extent possible in VMs)? I don't see any significant change that could trigger this:
docker.io/cloudnativelabs/kube-router:v1.5.1
iptables_version = 1.8.7
pkg/component/controller/kuberouter.go
I'm not saying it's impossible to have a regression, but we certainly need to isolate it. Could you please provision a 1.24.2 cluster, try to reproduce it, and if it doesn't happen, upgrade to 1.24.6 and see if it happens then? Just upgrade k0s; don't upgrade the kernel or any OS package.
I've just tested several versions of k0s, from 1.23 to 1.25.4 (using the default CNI provider, kube-router). It fails in all k0s versions when a pod is created by a cronjob and a pre-existing network policy applies, but it works fine when the pod is created by a cronjob without a pre-existing network policy.
It works fine in Mirantis Kubernetes Engine 3.4 (MKE uses Calico).
It also works fine in k0s when the CNI provider is Calico, so it looks like a kube-router issue.
Well the fact that there isn't a regression is good.
The policies are applied, just not fast enough, which means that Calico will eventually hit the same issue at a certain scale. In fact, every NetworkPolicy implementation that I know of behaves this way.
Now, to be honest, kube-router applies the network policies in a pretty naive way that is simply slow compared to other implementations. There are a fair number of optimizations that could be made in the code (I'm saying it can be done, not that it's easy).
Now the questions are:
I've used a workaround: an init container in the cronjob. This init container resolves the database service name in a loop before the containers that connect to the database are launched. A single execution of nslookup in the init container is enough.
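For illustration, a minimal sketch of that kind of init container (the image, Service, and namespace names here are placeholders, not the reporter's actual manifest):

      initContainers:
        - name: wait-for-dns
          image: busybox:1.36
          command:
            - /bin/sh
            - -c
            # loop until the database Service name resolves; one successful lookup is enough
            - |
              until nslookup mariadb-service.test.svc.cluster.local; do
                echo "waiting for DNS"
                sleep 1
              done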
The issue is marked as stale since no activity has been recorded in 30 days
@jsalgado78 I don't think there's anything we can do about this on the k0s side. kube-router has a somewhat naive way of operating the iptables rules for NPC, which is known on their side too. There's an issue on the kube-router side to improve the handling of the NPC rules, see https://github.com/cloudnativelabs/kube-router/issues/1372
I'm closing this in favour of tracking the kube-router NPC work upstream, as there's nothing (known) that we can do on the k0s side.
Platform
Version
v1.24.6+k0s.0
Sysinfo
`k0s sysinfo`
What happened?
A Kubernetes cronjob cannot access a database service on another worker node when a network policy is used. It works fine if I run a sleep 3 command first or if all pods are running on the same worker node.
I create a database pod and a database service, a network policy that only allows internal namespace traffic, and a cronjob in the same namespace to test that a connection to the database works, but the pods launched by the cronjob fail without a sleep command. It appears a delay is needed because not all the required iptables rules have been created at that moment.
Pods launched by the cronjob work fine if no network policy is used or if all pods are running on the same worker node.
This is a YAML file to reproduce it: complete-stack-example.txt. I've tried this YAML on minikube (simulating 2 nodes) with Kubernetes 1.24.6 and it works fine without a delay in the cronjob, but it fails on three k0s clusters.
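For reference, a trimmed sketch of the kind of cronjob container with the sleep workaround described above (the image and service name are illustrative; the actual manifest is in complete-stack-example.txt):

      containers:
        - name: mariadb-cronjob
          image: mariadb:10.9
          command:
            - /bin/sh
            - -c
            # the leading sleep gives the network policy rules time to be programmed
            - sleep 3 && mysqladmin ping -h mariadb-service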
Steps to reproduce
$ kubectl logs mariadb-cronjob-27760917-tszxw -n test
mysqld is alive
$ kubectl get cj mariadb-cronjob -n test -o yaml | grep -A2 containers:
      containers:
Expected behavior
Communication with the database service from cronjob pods should work without a delay.
Actual behavior
A database pod and a pod created by a cronjob on another worker node can't connect without a sleep command when a network policy exists.
Screenshots and logs
The iptables and firewalld services are disabled on all cluster nodes, but the iptables kernel modules are loaded:
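For example, this state can be checked on a node with commands like these (illustrative, not the reporter's original output):

$ systemctl is-enabled firewalld
$ lsmod | grep -E 'ip_tables|iptable_filter|nf_tables'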
Additional context
No response