What happened:
I have a pod with one main container and three sidecars. When network loss is applied to the custom sidecar container and the destination host, all network connectivity in the pod is lost.
What you expected to happen:
I expect the traffic loss to affect only connections from the targeted sidecar container to my otel-collector Kubernetes service.
How to reproduce it (as minimally and precisely as possible):
These are the env values for this experiment:
I expected that the connection from the otel-agent container to dev-collector.otel-collector.svc.cluster.local would be dropped, while all other connections from all pods to any endpoint would keep working. However, while this experiment is running, every connection from all pods is dropped, which causes the readiness probe to fail.
When I investigated how this experiment works, I found that these commands are applied:
sudo nsenter -t 561580 -n tc qdisc replace dev eth0 root handle 1: prio
sudo nsenter -t 561580 -n tc qdisc replace dev eth0 parent 1:3 netem loss 100
sudo nsenter -t 561580 -n tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst 10.0.30.140 flowid 1:3
Judging by these commands, only traffic to 10.0.30.140 should be dropped, which would be correct.
But in the actual experiment, every connection leaving the pod is dropped. For example, the sidecar running the proxysql container cannot connect to the database.
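To compare the rules that are actually installed with the commands above, a read-only inspection along these lines might help. This is only a sketch: the inspect_tc function and the DRY_RUN switch are hypothetical names introduced here, not part of Litmus, and the PID and interface (561580, eth0) are taken from the commands quoted above. It uses only the standard tc "qdisc show" and "filter show" subcommands.

```shell
#!/bin/sh
# Sketch: print (or run) read-only tc commands that list the qdiscs and
# filters inside the network namespace of a given process.
# Assumptions: inspect_tc and DRY_RUN are hypothetical helpers for this
# example; the PID is the target container's process ID on the node.
# Note: containers in a pod share one network namespace, so the output
# describes the whole pod, not a single container.

inspect_tc() {
    pid="$1"
    dev="${2:-eth0}"
    # "-s qdisc show" lists qdiscs with packet/drop counters;
    # "filter show" lists the filters attached to the root qdisc.
    for sub in "-s qdisc show dev $dev" "filter show dev $dev"; do
        cmd="sudo nsenter -t $pid -n tc $sub"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only print the command so it can be reviewed first.
            echo "$cmd"
        else
            # Word splitting is intended here: $cmd holds a full command line.
            $cmd
        fi
    done
}
```

Calling inspect_tc 561580 eth0 prints the two commands; setting DRY_RUN=0 runs them on the node while the experiment is active. The per-qdisc drop counters may indicate whether traffic other than that to 10.0.30.140 is reaching the netem band.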
Anything else we need to know?:
I run this experiment on an AKS cluster.
Kubernetes version: 1.29.2
Litmus Helm targetRevision: 3.8.0
The manifest of the experiment is attached.
network-loss.zip