litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0

ChaosEngine Event ChaosInjected not getting Reflected in GetExperimentRun API #4906

Open nateftv opened 1 month ago

nateftv commented 1 month ago

What happened: The ChaosEngine event ChaosInjected is not reflected in the GetExperimentRun API. The netem (network-latency) experiment generates the ChaosInject event when the fault is injected. However, with additional local logging I do not see this event being sent by the SendWorkflowUpdates function, and the phase and message are not reflected in the GetExperimentRun API either. The ChaosEngine executionData excerpt below contains nothing about ChaosInject, even though the fault has actually been injected.

(Truncated)

{\"name\":\"pod-network-loss-1kj\",\"phase\":\"initialized\",\"message\":\"\",\"startedAt\":\"1726772307\",\"finishedAt\":\"\",\"children\":null,\"type\":\"ChaosEngine\",\"chaosData\":{\"engineUID\":\"5467912e-c942-49ec-8754-3fceb552242e\",\"engineContext\":\"\",\"engineName\":\"pod-network-loss-1kjktwcs\",\"namespace\":\"chaos-test-namespace\",\"experimentName\":\"pod-network-loss\",\"experimentStatus\":\"initialized\",\"lastUpdatedAt\":\"1726772335\",\"experimentVerdict\":\"N/A\",\"experimentPod\":\"Yet to be launched\",\"runnerPod\":\"pod-network-loss-1kjktwcs-runner\",\"probeSuccessPercentage\":\"0\",\"failStep\":\"\",\"chaosResult\":null}}},\"updatedBy\":\"YWRtaW4\"}"

What you expected to happen: After the ChaosInjected event is emitted, the executionData returned by GetExperimentRun for the ChaosEngine node should reflect the message and phase accordingly. For example, the phase and message should at least look like this:

(Truncated)

{\"name\":\"pod-network-loss-1kj\",\"phase\":\"ChaosInject\",\"message\":\"Injected pod-network-loss-experiment chaos on application pods\",\"startedAt\":\"1726772307\",\"finishedAt\":\"\",\"children\":null,\"type\":\"ChaosEngine\",\"chaosData\":{\"engineUID\":\"5467912e-c942-49ec-8754-3fceb552242e\",\"engineContext\":\"\",\"engineName\":\"pod-network-loss-1kjktwcs\",\"namespace\":\"chaos-test-namespace\",\"experimentName\":\"pod-network-loss\",\"experimentStatus\":\"initialized\",\"lastUpdatedAt\":\"1726772335\",\"experimentVerdict\":\"N/A\",\"experimentPod\":\"Yet to be launched\",\"runnerPod\":\"pod-network-loss-1kjktwcs-runner\",\"probeSuccessPercentage\":\"0\",\"failStep\":\"\",\"chaosResult\":null}}},\"updatedBy\":\"YWRtaW4\"}"
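To make the difference between the two excerpts concrete, here is a minimal Go sketch that decodes a single ChaosEngine node and checks its phase. It assumes the node has already been unescaped and cut out of executionData. The `node` struct and the `chaosInjected` helper are hypothetical names for illustration and are not part of the Litmus codebase. The JSON literals keep only a few of the fields shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// node mirrors a subset of the fields visible in the excerpts above.
// Hypothetical type for illustration; not taken from the Litmus source.
type node struct {
	Name    string `json:"name"`
	Phase   string `json:"phase"`
	Message string `json:"message"`
	Type    string `json:"type"`
}

// chaosInjected reports whether raw is a ChaosEngine node whose phase
// is "ChaosInject", the value this issue expects to see.
func chaosInjected(raw string) (bool, error) {
	var n node
	if err := json.Unmarshal([]byte(raw), &n); err != nil {
		return false, err
	}
	return n.Type == "ChaosEngine" && n.Phase == "ChaosInject", nil
}

func main() {
	// Shortened versions of the "what happened" and "expected" excerpts.
	current := `{"name":"pod-network-loss-1kj","phase":"initialized","message":"","type":"ChaosEngine"}`
	expected := `{"name":"pod-network-loss-1kj","phase":"ChaosInject","message":"Injected pod-network-loss-experiment chaos on application pods","type":"ChaosEngine"}`

	for _, raw := range []string{current, expected} {
		ok, err := chaosInjected(raw)
		fmt.Println("chaos injected:", ok, "error:", err)
	}
}
```

A check like this only becomes useful once the API actually propagates the phase. With the behaviour reported here it would keep returning false for the first excerpt.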

Where can this issue be corrected? (optional)

How to reproduce it (as minimally and precisely as possible): I can reproduce this on v3.9 and v3.10 by launching a simple network-loss or network-latency experiment and querying the GetExperimentRun API after the fault has actually been injected (that is, while the helper pod is running).

Anything else we need to know?: experimentStatus also does not seem to be consistent. After the fault is injected, experimentStatus is sometimes Initialized, sometimes Running, and sometimes empty (the empty case occurs when sleeping 1s after install-chaos-fault; I am not sure how that is related).

GayatriVasudevan commented 3 weeks ago

+1 We are having the same issue. We would like to notify users of a chaos experiment when the status changes to "chaos injected", but we are unable to do so because this information is not available via GraphQL.