netobserv / network-observability-operator

An OpenShift / Kubernetes operator for network observability
Apache License 2.0

Pods fail to start with annotation "kubernetes.io/egress-bandwidth" #791

Open paulben opened 3 days ago

paulben commented 3 days ago

Pods with the annotation kubernetes.io/egress-bandwidth: 10M fail to start when Network Observability Operator 1.6.2 is installed. The pod events show:

...failed to create pod network sandbox k8s_php-sample-6cfff549d-7fvw5_mywebapp_88fa15ea-5251-4931-99f0-9c021f2f34a9_0(ebbdf6643f2ad7cf4b6cd0c82f7008db13219987206fb54d46355865b6e7aeda): error adding pod mywebapp_php-sample-6cfff549d-7fvw5 to CNI network "multus-cni-network"...
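For context on what the annotation requests, here is a minimal Python sketch. It assumes the value is parsed as a Kubernetes resource quantity (decimal suffixes such as k and M, binary suffixes such as Ki and Mi, rounded up to an integer) and is interpreted as bits per second, so 10M would mean 10 Mbit/s. The helper parse_bandwidth is hypothetical and is not part of netobserv, Kubernetes, or any CNI plugin.

```python
# Hypothetical helper (not from any real project): convert a bandwidth
# annotation value such as "10M" into bits per second, ASSUMING Kubernetes
# resource-quantity suffixes and rounding fractional results up.
import math
import re
from decimal import Decimal

# Decimal (SI) and binary multipliers accepted by this sketch.
_SUFFIXES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
}

# A non-negative number (optionally with a fraction) followed by a suffix.
_QUANTITY = re.compile(r"^([0-9]+(?:\.[0-9]+)?)([A-Za-z]*)$")


def parse_bandwidth(value: str) -> int:
    """Return the requested rate in bits per second."""
    match = _QUANTITY.match(value.strip())
    if match is None:
        raise ValueError(f"not a quantity: {value!r}")
    number, suffix = match.groups()
    if suffix not in _SUFFIXES:
        raise ValueError(f"unsupported suffix: {suffix!r}")
    # Decimal avoids float error; math.ceil rounds any fraction up.
    return math.ceil(Decimal(number) * _SUFFIXES[suffix])


if __name__ == "__main__":
    print(parse_bandwidth("10M"))
```

Under these assumptions, parse_bandwidth("10M") yields 10000000, while a binary suffix such as "1Mi" yields 1048576.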

This raises the question: are there OS requirements for the nodes?

The failure occurs on OpenShift 4.14.34 with AMD64 nodes running:

sh-4.4# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.10 (Ootpa)
sh-4.4# uname -a
Linux kube-cotssgfw0jdq7e85d7sg-lsprototype-default-000002a3 4.18.0-513.24.1.el8_9.x86_64 #1 SMP Thu Mar 14 14:20:09 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4# 

The failure does not occur on OpenShift 4.14.27 with nodes running:

sh-4.4# cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.6 (Ootpa)
sh-4.4# uname -a
Linux worker0.paul-network-metrics.cp.fyre.ibm.com 5.14.0-284.66.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Mon May 6 14:51:27 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4#

jotak commented 2 days ago

Hi @paulben,

Thanks for reporting this issue. Do you know which CNI implements this rate-limiting annotation? Is it Calico? I'm asking because we have already been made aware of a limitation when a similar annotation was used with Calico alongside netobserv: there is a conflict involving the eBPF programs. As far as I can tell, the program loaded by netobserv should support chaining with other BPF programs, but that might not be the case for the other program that is loaded. If my suspicion is right, we might also need to ask the upstream maintainers for help.
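The idea of chaining can be illustrated with a toy model. Assuming several programs are attached as TC classifiers in direct-action mode on the same hook, a program lets later filters run by returning TC_ACT_UNSPEC, while returning TC_ACT_OK or TC_ACT_SHOT ends classification for that packet. The Python below is a simulation only, not kernel or netobserv code. The program names (observer, terminal_observer, dropper) are hypothetical, and the sketch makes no claim about which programs netobserv or Calico actually load.

```python
# Toy simulation of a TC hook running several direct-action classifiers.
# Return-code values mirror linux/pkt_cls.h; program names are hypothetical.
from typing import Callable, List, Tuple

TC_ACT_UNSPEC = -1  # no verdict: continue with the next filter
TC_ACT_OK = 0       # accept the packet and stop evaluating filters
TC_ACT_SHOT = 2     # drop the packet and stop evaluating filters

Program = Callable[[bytes], int]
Filter = Tuple[int, str, Program]  # (priority, name, program)


def run_hook(filters: List[Filter], packet: bytes) -> Tuple[int, List[str]]:
    """Run filters in ascending priority order.

    Returns the final verdict and the names of the programs that ran.
    """
    ran: List[str] = []
    for _prio, name, prog in sorted(filters, key=lambda f: f[0]):
        ran.append(name)
        verdict = prog(packet)
        if verdict != TC_ACT_UNSPEC:
            # A definitive verdict ends the chain; later filters are skipped.
            return verdict, ran
    # No filter gave a verdict; this model treats that as "let it through".
    return TC_ACT_OK, ran


def observer(packet: bytes) -> int:
    # Chaining-friendly monitor: records nothing here, defers the verdict.
    return TC_ACT_UNSPEC


def terminal_observer(packet: bytes) -> int:
    # Monitor that ends classification, hiding every later filter.
    return TC_ACT_OK


def dropper(packet: bytes) -> int:
    # Stands in for any later program that enforces a policy.
    return TC_ACT_SHOT


if __name__ == "__main__":
    print(run_hook([(1, "observer", observer), (2, "dropper", dropper)], b""))
    print(run_hook([(1, "terminal_observer", terminal_observer),
                    (2, "dropper", dropper)], b""))
```

In this model, only the first chain reaches dropper. A program that ends classification early silently prevents later programs on the same hook from running.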

cc @msherif1234: we need to determine whether we should open an issue upstream in containernetworking.

jotak commented 2 days ago

@paulben, do the two clusters you mention have a similar network configuration with respect to CNIs and Multus?

paulben commented 2 days ago

@jotak, on the failing cluster:

$ oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
Calico

On the "working" cluster:

$ oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
OVNKubernetes

I'm not sure how to get more details on the CNI/Multus configuration. Can you advise?