litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0
4.44k stars, 697 forks

Chaos experiment fault status "RUNNING" skipped #4837

Open ParkChangSun opened 3 months ago

ParkChangSun commented 3 months ago

What happened:

I ran a simple chaos experiment with the 'pod-delete' fault. Once the fault reaches the "INITIALIZED" status, it never moves to "RUNNING"; it goes directly to "COMPLETED".

When the subscriber receives an Argo Workflow change, all nodes are updated. However, the subscriber is not notified of changes to the ChaosEngine CR or the ChaosResult CR. The phase of a fault node is derived from the ChaosResult CR if it exists, so the phase is only refreshed on the next Argo Workflow change, which does not occur until the experiment has finished. The ChaosResult CR is null when the chaos-runner pod is initialized, so the fault node stays in the "initialized" state until then.
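The behavior described above can be reproduced with a small simulation. This is only a sketch of the reasoning, not the subscriber's actual code: `ChaosResult`, `faultPhase`, and `observedPhases` are hypothetical names, and the ChaosResult CR is reduced to a single phase string. The timeline lists the state of the ChaosResult over time, while the subscriber re-evaluates the fault node only at the indices where an Argo Workflow change arrives.

```go
package main

import "fmt"

// ChaosResult is a hypothetical, trimmed-down stand-in for the status
// of a ChaosResult CR. It is not the real Litmus type.
type ChaosResult struct {
	Phase string
}

// faultPhase mirrors the rule described in the issue: the fault node's
// phase comes from the ChaosResult if it exists; otherwise the node is
// reported as "Initialized".
func faultPhase(result *ChaosResult) string {
	if result == nil {
		return "Initialized"
	}
	return result.Phase
}

// observedPhases returns the phases a subscriber would report if it only
// re-evaluates the fault node at the given timeline indices, i.e. at the
// moments an Argo Workflow change is delivered.
func observedPhases(timeline []*ChaosResult, workflowEvents []int) []string {
	phases := make([]string, 0, len(workflowEvents))
	for _, i := range workflowEvents {
		phases = append(phases, faultPhase(timeline[i]))
	}
	return phases
}

func main() {
	// State of the ChaosResult over time (assumed for illustration):
	// absent, then running, then completed.
	timeline := []*ChaosResult{
		nil,
		{Phase: "Running"},
		{Phase: "Completed"},
	}

	// Workflow changes arrive only at the start and the end of the fault.
	fmt.Println(observedPhases(timeline, []int{0, 2})) // [Initialized Completed]

	// If ChaosResult changes also triggered an update, "Running" would appear.
	fmt.Println(observedPhases(timeline, []int{0, 1, 2})) // [Initialized Running Completed]
}
```

The second call only illustrates what the expected output would look like if the intermediate change were observed; it does not prescribe how the subscriber should be changed.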

What you expected to happen:

Once initialization is done, the fault should display the "RUNNING" state.

Where can this issue be corrected? (optional)

How to reproduce it (as minimally and precisely as possible):

Start a simple chaos experiment with the 'pod-delete' fault.

Anything else we need to know?:

I will take this issue.

Fault node stays "Initialized" until "Completed": [Screenshot_20240818_145958] [Screenshot_20240818_150013]

When the fault node completed and the Argo Workflow CR changed: [Screenshot_20240908_144708]