litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0

Network chaos could be orphaned if the helper pods are force killed before grace period #2359

Open hochuenw-dd opened 3 years ago

hochuenw-dd commented 3 years ago

What happened: Injected chaos can be orphaned in the cluster if the network helper pods are killed before they complete gracefully. Internally, the network experiment runs tc twice: once at the beginning to inject the latency and once at the end to revert it. If the helper pod is force-killed for any reason before the second call, the effect of the first tc command persists in the cluster until the affected pod is restarted.
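
To illustrate the inject-then-revert pattern described above, here is a minimal Python sketch. It is not the Litmus helper code: the function names, the `eth0` interface, and the exact `tc` arguments are assumptions chosen for illustration. It shows why a graceful stop can still revert the change while a force kill (SIGKILL) cannot, since no cleanup code gets a chance to run.

```python
import signal
import subprocess
import time


def inject_cmd(iface, latency_ms):
    """Build a tc command that adds netem latency (assumed form, not Litmus's exact call)."""
    return ["tc", "qdisc", "replace", "dev", iface, "root",
            "netem", "delay", f"{latency_ms}ms"]


def revert_cmd(iface):
    """Build a tc command that deletes the root qdisc, undoing the injection."""
    return ["tc", "qdisc", "del", "dev", iface, "root"]


class Terminated(Exception):
    """Raised by the SIGTERM handler so that pending `finally` blocks run."""


def install_sigterm_handler():
    """Turn SIGTERM into an exception. SIGKILL cannot be caught this way."""
    def _on_sigterm(signum, frame):
        raise Terminated()
    signal.signal(signal.SIGTERM, _on_sigterm)


def run_chaos(iface, latency_ms, duration_s,
              run=subprocess.check_call, sleep=time.sleep):
    """Inject latency, wait for the chaos duration, then always try to revert."""
    run(inject_cmd(iface, latency_ms))
    try:
        sleep(duration_s)
    finally:
        # Reached on normal completion, Ctrl-C, or SIGTERM (with the handler
        # installed). Never reached on SIGKILL, which is the orphaning case.
        run(revert_cmd(iface))
```

A helper would call `install_sigterm_handler()` once and then `run_chaos("eth0", 2000, 60)`. Because the revert lives in the same process as the injection, anything that kills that process outright leaves the qdisc in place, so a fix would need cleanup that runs outside the helper.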

What you expected to happen: After the chaos duration, the injected chaos should always be reverted.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?: The container runtime is Docker.

ksatchit commented 3 years ago

Hey @hochuenw-dd. Thank you for highlighting this requirement. It is definitely an important one and will be prioritized.