litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0

Inherit the tolerations spec in helper pods, as is currently done for nodeSelector #3461

Status: open. erekgit opened this issue 2 years ago.

erekgit commented 2 years ago

"Erik Anderson, Feb 18th at 3:00 PM: Hello all, is there a way to make not only the chaos-operator but also the helper pods orchestrated by the chaos-engine inherit custom specs? We have a unique use case where both nodeSelector and tolerations need to be customized, and we need a way for the orchestrated helper pods, not just the experiment's chaos-engine, to inherit them.

Erik Anderson, 4 days ago: Would it be this? .spec.experiments[].spec.components.nodeSelector (https://docs.litmuschaos.io/docs/concepts/chaos-engine/)

karthik.s_active_acc, 1 day ago: Hi @Erik Anderson. The helper pods are launched on specific node(s) determined either (a) by where the chaos target app resides (derived), as in pod network/stress chaos, or (b) by the node label or node name provided as an env input, as in node stress chaos. So we would not explicitly need the nodeSelector to be provided or inherited. But I do see the need for tolerations, in case the node where the target app resides, or the node identified by the provided label/name, is tainted. Could you please open a feature request/issue for this? cc: @shubham chaudhary @Udit Gaurav"
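To illustrate why tolerations matter in this case: Kubernetes only lets a pod onto a node carrying a NoSchedule or NoExecute taint if the pod has a matching toleration, while PreferNoSchedule taints are soft. The Go sketch below mirrors that matching rule using small local stand-in types rather than the real k8s.io/api/core/v1 structs, so it runs without dependencies. It models only the scheduler's taint check, not how Litmus actually places helper pods, and the taint key and value are made-up placeholders.

```go
package main

import "fmt"

// Taint is a local stand-in for the key/value/effect fields of a node taint.
type Taint struct {
	Key, Value, Effect string
}

// Toleration is a local stand-in for a pod toleration.
// An empty Operator is treated as "Equal", as in Kubernetes.
type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates reports whether a single toleration matches a single taint.
func tolerates(t Toleration, taint Taint) bool {
	// An empty effect on the toleration matches any taint effect.
	if t.Effect != "" && t.Effect != taint.Effect {
		return false
	}
	// An empty key on the toleration matches any taint key.
	if t.Key != "" && t.Key != taint.Key {
		return false
	}
	switch t.Operator {
	case "", "Equal":
		return t.Value == taint.Value
	case "Exists":
		return true
	default:
		return false
	}
}

// schedulable reports whether a pod with the given tolerations may be placed
// on a node with the given taints. PreferNoSchedule taints are soft, so only
// NoSchedule and NoExecute taints must be tolerated.
func schedulable(tols []Toleration, taints []Taint) bool {
	for _, taint := range taints {
		if taint.Effect == "PreferNoSchedule" {
			continue
		}
		matched := false
		for _, t := range tols {
			if tolerates(t, taint) {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical taint on the node hosting the chaos target app.
	nodeTaints := []Taint{{Key: "dedicated", Value: "chaos-target", Effect: "NoSchedule"}}
	withToleration := []Toleration{{Key: "dedicated", Operator: "Equal", Value: "chaos-target", Effect: "NoSchedule"}}

	fmt.Println("pod with matching toleration:", schedulable(withToleration, nodeTaints)) // prints "... true"
	fmt.Println("helper pod without tolerations:", schedulable(nil, nodeTaints))         // prints "... false"
}
```

In this model, a helper pod that does not inherit the experiment's tolerations fails the check on exactly the tainted nodes where the target app lives, which is the gap described above.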

@uditgaurav @ispeakc0de

erekgit commented 2 years ago

@uditgaurav @ispeakc0de @ksatchit Update:

If we specify both of the following within the ChaosEngine experiment definition:

.spec.components.runner.tolerations
.spec.components.runner.nodeSelector

and

.spec.experiments[].spec.components.tolerations
.spec.experiments[].spec.components.nodeSelector

then the expected inheritance works from the chaos runner to the helper pods. Is this the expected behaviour? Is this issue still needed?

Or should we look further into inheritance from the chaos-operator to the runner?
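The inheritance being discussed could look roughly like the following hypothetical Go sketch, in which scheduling fields set under .spec.experiments[].spec.components are copied into the helper pod spec. The Components and HelperPodSpec types and the inheritScheduling function are illustrative stand-ins, not litmus-go's actual code. Keeping any nodeSelector keys the helper already has (for example, one derived from the target node) is a design assumption.

```go
package main

import "fmt"

// Toleration is a local stand-in for the Kubernetes toleration type.
type Toleration struct {
	Key, Operator, Value, Effect string
}

// Components is a hypothetical model of the scheduling fields set under
// .spec.experiments[].spec.components in a ChaosEngine.
type Components struct {
	NodeSelector map[string]string
	Tolerations  []Toleration
}

// HelperPodSpec is a hypothetical subset of a helper pod's spec.
type HelperPodSpec struct {
	NodeSelector map[string]string
	Tolerations  []Toleration
}

// inheritScheduling copies nodeSelector entries and tolerations from the
// experiment components into the helper spec. Keys the helper already has
// are kept, and entries are copied so the caller's map is never shared.
func inheritScheduling(helper *HelperPodSpec, c Components) {
	if len(c.NodeSelector) > 0 && helper.NodeSelector == nil {
		helper.NodeSelector = make(map[string]string, len(c.NodeSelector))
	}
	for k, v := range c.NodeSelector {
		if _, exists := helper.NodeSelector[k]; !exists {
			helper.NodeSelector[k] = v
		}
	}
	// Toleration values are plain structs, so append copies them.
	helper.Tolerations = append(helper.Tolerations, c.Tolerations...)
}

func main() {
	comps := Components{
		NodeSelector: map[string]string{"pool": "chaos"}, // hypothetical label
		Tolerations:  []Toleration{{Key: "dedicated", Operator: "Exists", Effect: "NoSchedule"}},
	}
	// Hypothetical selector already derived from the target node.
	helper := HelperPodSpec{NodeSelector: map[string]string{"kubernetes.io/hostname": "node-1"}}

	inheritScheduling(&helper, comps)
	fmt.Println(helper.NodeSelector)     // prints map[kubernetes.io/hostname:node-1 pool:chaos]
	fmt.Println(len(helper.Tolerations)) // prints 1
}
```

Letting the helper's own keys win means a user-supplied nodeSelector could never move a helper pod off the node the experiment derived for it. This matches the point above that nodeSelector is less of a concern than tolerations.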

yogeshboddu commented 1 year ago

Team, any update regarding the issue?

erekgit commented 1 year ago

We've implemented a custom workaround for now by hardcoding the PodSpecs, but it would be great to see this become a dynamically configurable option going forward.

The hardcoding happens within the litmus-go container/repo: we hardcode our options, rebuild the image with a custom tag, and use it where needed.

yogeshboddu commented 1 year ago

The runner and experiment pods are created with the tolerations, but the helper pods do not inherit them.

pamle117 commented 1 year ago

We need this as well. When can we expect a fix?

pamle117 commented 1 year ago

@erekgit Can you please elaborate on your workaround? We're stuck due to this bug. TIA

erekgit commented 1 year ago

@pamle117

For example, for the Network Loss experiment we had to update the helper PodSpec to hardcode our custom NodeSelector and Tolerations:

https://github.com/litmuschaos/litmus-go/blob/d4f9826ea98547d1700b8b12ce9d3b0b0249b6b0/chaoslib/litmus/network-chaos/lib/network-chaos.go#L211C1-L261

Just add them in, rebuild this container, and tag it for your own use. Then override the LIB environment variable in the experiment definition (and, I believe, in the experiment YAML) so the helper spins up with your custom configuration.
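A rough sketch of the hardcode-and-rebuild workaround, written in Go because litmus-go is a Go project. It does not reproduce the linked network-chaos.go code. The HelperPodSpec type, the applyHardcodedScheduling function, and the node-pool/dedicated values are placeholders. The JSON tags follow Kubernetes field names so the printed output shows what the helper's scheduling fields would look like.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Toleration uses the same JSON field names as a Kubernetes toleration.
type Toleration struct {
	Key      string `json:"key,omitempty"`
	Operator string `json:"operator,omitempty"`
	Value    string `json:"value,omitempty"`
	Effect   string `json:"effect,omitempty"`
}

// HelperPodSpec is a hypothetical subset of a helper pod spec; only the
// scheduling fields are modeled.
type HelperPodSpec struct {
	NodeSelector map[string]string `json:"nodeSelector,omitempty"`
	Tolerations  []Toleration      `json:"tolerations,omitempty"`
}

// Values baked into the custom build. These are placeholders, not Litmus
// defaults; replace them with your cluster's labels and taints.
var (
	customNodeSelector = map[string]string{"node-pool": "chaos"}
	customTolerations  = []Toleration{{Key: "dedicated", Operator: "Equal", Value: "chaos", Effect: "NoSchedule"}}
)

// applyHardcodedScheduling overwrites the helper's scheduling fields with
// copies of the baked-in values, so later edits to the spec cannot modify
// the package-level defaults.
func applyHardcodedScheduling(spec *HelperPodSpec) {
	spec.NodeSelector = make(map[string]string, len(customNodeSelector))
	for k, v := range customNodeSelector {
		spec.NodeSelector[k] = v
	}
	spec.Tolerations = append([]Toleration(nil), customTolerations...)
}

func main() {
	var spec HelperPodSpec
	applyHardcodedScheduling(&spec)

	out, err := json.Marshal(spec)
	if err != nil {
		panic(err)
	}
	// prints {"nodeSelector":{"node-pool":"chaos"},"tolerations":[{"key":"dedicated","operator":"Equal","value":"chaos","effect":"NoSchedule"}]}
	fmt.Println(string(out))
}
```

Because the values are compiled in, every change means another image rebuild and retag. This is the limitation that makes a dynamically configurable option attractive.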

pamle117 commented 1 year ago

@erekgit My target app runs on Windows nodes in a K8s cluster, and all my Litmus components run on Linux VMs in the same cluster. Since the chaos pod is spun up on the Windows node, I have not been able to use it. Did you try spinning up the chaos pod on a node other than the target application's node?

erekgit commented 1 year ago

Our nodes are all Linux.