litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a Cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0

ChaosCenter portal doesn't display experiments' Chaos Results #3847

Open dsanchez31 opened 1 year ago

dsanchez31 commented 1 year ago

What happened: I installed Litmuschaos from the litmus-helm repo (v2.15.1) at cluster scope on a k8s cluster (v1.24.4), with ChaosHub on branch v2.14.x. I ran generic/pod_delete in another namespace. The experiment works, but most of the time I can't view Chaos Results and Reliability Details in ChaosCenter (Overall RR is always 0% and Experiments Passed is 0/1), while "kubectl describe chaosresult -n litmus" displays the expected results.

What you expected to happen: I would like to see Chaos Results and Reliability Details in ChaosCenter, as they are stored in the chaosresult k8s objects.

How to reproduce it (as minimally and precisely as possible): Install Litmuschaos v2.15.1 and run some experiments (generic/pod_delete for example).
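
To compare what the cluster holds with what ChaosCenter shows, the verdicts stored in the chaosresult objects can be listed with a short script. This is a minimal sketch, not part of Litmus: the function names are hypothetical, and the field path status.experimentStatus.verdict is an assumption about the ChaosResult layout that should be checked against the output of "kubectl get chaosresult -n litmus -o json" on your cluster.

```python
import json
import subprocess


def fetch_chaosresults(namespace="litmus"):
    """Return ChaosResult objects via kubectl (needs kubectl and cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "chaosresult", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["items"]


def summarize(items):
    """Map each ChaosResult name to its verdict and count the passed ones."""
    verdicts = {}
    for item in items:
        # Assumed field path; a missing field is reported as "Unknown".
        status = item.get("status", {}).get("experimentStatus", {})
        verdicts[item["metadata"]["name"]] = status.get("verdict", "Unknown")
    passed = sum(1 for v in verdicts.values() if v == "Pass")
    return verdicts, passed


# Hand-written sample in the assumed shape, so the sketch runs without a cluster.
sample = [{
    "metadata": {"name": "pod-delete-xgpb62cz-pod-delete", "namespace": "litmus"},
    "status": {"experimentStatus": {"verdict": "Pass"}},
}]
verdicts, passed = summarize(sample)
print(verdicts, f"passed {passed}/{len(verdicts)}")
```

On a real cluster, summarize(fetch_chaosresults("litmus")) gives the count to compare with the "Experiments Passed" figure in the UI.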

Thank you for your help

dsanchez31 commented 1 year ago

Here are the logs of the subscriber. When requesting from the UI the logs of an experiment with missing results:

time="2022-12-06T14:18:34Z" level=info msg="Log Request: {\"clusterID\":\"a4516330-82da-4b2b-8b51-f723bcea11a1\",\"workflowRunID\":\"539e9c8d-7f95-4930-b982-75a3ba43e3a1\",\"podName\":\"user-requests-simuator-delete-probes-1670334001-683934398\",\"podNamespace\":\"litmus\",\"podType\":\"ChaosEngine\",\"expPod\":\"\",\"runnerPod\":\"\",\"chaosNamespace\":\"\"}"

time="2022-12-06T14:18:34Z" level=error msg="Failed to get experiment pod  logs, err: resource name may not be empty"

time="2022-12-06T14:18:34Z" level=error msg="Failed to get runner pod  logs, err: resource name may not be empty"

time="2022-12-06T14:18:34Z" level=info msg="Response from the server: {\"data\":{\"podLog\":\"LOGS SENT SUCCESSFULLY\"}}"

When requesting from the UI the logs of an experiment with valid results displayed in the UI:

time="2022-12-06T14:18:48Z" level=info msg="Log Request: {\"clusterID\":\"a4516330-82da-4b2b-8b51-f723bcea11a1\",\"workflowRunID\":\"55469e9c-dbd0-4d14-be17-a4a28a755a97\",\"podName\":\"user-requests-simulation-delete-1670333600-850579342\",\"podNamespace\":\"litmus\",\"podType\":\"ChaosEngine\",\"expPod\":\"\",\"runnerPod\":\"\",\"chaosNamespace\":\"\"}"

time="2022-12-06T14:18:48Z" level=error msg="Failed to get experiment pod  logs, err: resource name may not be empty"

time="2022-12-06T14:18:48Z" level=error msg="Failed to get runner pod  logs, err: resource name may not be empty"

time="2022-12-06T14:18:48Z" level=info msg="Response from the server: {\"data\":{\"podLog\":\"LOG REQUEST CANCELLED\"}}"

time="2022-12-06T14:18:48Z" level=info msg="Log Request: {\"clusterID\":\"a4516330-82da-4b2b-8b51-f723bcea11a1\",\"workflowRunID\":\"55469e9c-dbd0-4d14-be17-a4a28a755a97\",\"podName\":\"user-requests-simulation-delete-1670333600-850579342\",\"podNamespace\":\"litmus\",\"podType\":\"ChaosEngine\",\"expPod\":\"pod-delete-z3ualy-rprbv\",\"runnerPod\":\"pod-delete-kwjmfp69-runner\",\"chaosNamespace\":\"litmus\"}"

time="2022-12-06T14:18:48Z" level=info msg="Response from the server: {\"data\":{\"podLog\":\"LOGS SENT SUCCESSFULLY\"}}"
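
The difference between the two cases is easier to see once the "Log Request" payload is decoded. The following is a minimal sketch with hypothetical helper names. It assumes the msg value only uses backslash-quote style escapes, so it can be decoded as a JSON string. The two sample lines are copied from the logs above.

```python
import json
import re

MSG_RE = re.compile(r'msg="((?:[^"\\]|\\.)*)"')
PREFIX = "Log Request: "


def parse_log_request(line):
    """Return the Log Request payload as a dict, or None for other lines."""
    match = MSG_RE.search(line)
    if not match:
        return None
    # Assumption: the msg value is escaped like a JSON string, so decode it as one.
    message = json.loads('"' + match.group(1) + '"')
    if not message.startswith(PREFIX):
        return None
    return json.loads(message[len(PREFIX):])


def missing_pod_names(request):
    """List which of expPod / runnerPod are empty in a request."""
    return [key for key in ("expPod", "runnerPod") if not request.get(key)]


missing_results_line = (
    r'time="2022-12-06T14:18:34Z" level=info msg="Log Request: '
    r'{\"clusterID\":\"a4516330-82da-4b2b-8b51-f723bcea11a1\",'
    r'\"workflowRunID\":\"539e9c8d-7f95-4930-b982-75a3ba43e3a1\",'
    r'\"podName\":\"user-requests-simuator-delete-probes-1670334001-683934398\",'
    r'\"podNamespace\":\"litmus\",\"podType\":\"ChaosEngine\",'
    r'\"expPod\":\"\",\"runnerPod\":\"\",\"chaosNamespace\":\"\"}"'
)
valid_results_line = (
    r'time="2022-12-06T14:18:48Z" level=info msg="Log Request: '
    r'{\"clusterID\":\"a4516330-82da-4b2b-8b51-f723bcea11a1\",'
    r'\"workflowRunID\":\"55469e9c-dbd0-4d14-be17-a4a28a755a97\",'
    r'\"podName\":\"user-requests-simulation-delete-1670333600-850579342\",'
    r'\"podNamespace\":\"litmus\",\"podType\":\"ChaosEngine\",'
    r'\"expPod\":\"pod-delete-z3ualy-rprbv\",'
    r'\"runnerPod\":\"pod-delete-kwjmfp69-runner\",\"chaosNamespace\":\"litmus\"}"'
)

for line in (missing_results_line, valid_results_line):
    request = parse_log_request(line)
    print(request["workflowRunID"], "empty fields:", missing_pod_names(request))
```

Note that the run with valid results also starts with a request whose expPod and runnerPod are empty (the one answered with "LOG REQUEST CANCELLED"), so empty names in a single request are not conclusive on their own. What differs in these logs is that no follow-up request with the names filled in appears for the run with missing results.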

Here are the last logs of chaos-exporter, which indicate valid chaosresults that are not displayed in the UI:


time="2022-12-06T13:40:11Z" level=info msg="The chaos metrics are as follows" EndTime=1.670333954e+09 StartTime=1.67033405e+09 ProbeSuccessPercentage=100 ChaosInjectTime=1670333928 ResultNamespace=litmus ResultVerdict=Pass TotalDuration=0 ResultName=pod-delete-xgpb62cz-pod-delete AwaitedExperiments=0 PassedExperiments=1 FailedExperiments=0

time="2022-12-06T13:40:12Z" level=info msg="The chaos metrics are as follows" ResultVerdict=Pass EndTime=1.670333954e+09 ChaosInjectTime=1670333928 FailedExperiments=0 TotalDuration=0 PassedExperiments=1 ResultNamespace=litmus AwaitedExperiments=0 ProbeSuccessPercentage=100 ResultName=pod-delete-xgpb62cz-pod-delete StartTime=1.67033405e+09

time="2022-12-06T13:40:13Z" level=info msg="The chaos metrics are as follows" ResultVerdict=Pass AwaitedExperiments=0 PassedExperiments=1 EndTime=1.670333954e+09 TotalDuration=0 ResultNamespace=litmus StartTime=1.67033405e+09 ResultName=pod-delete-xgpb62cz-pod-delete FailedExperiments=0 ProbeSuccessPercentage=100 ChaosInjectTime=1670333928

time="2022-12-06T13:40:14Z" level=info msg="[Wait]: Hold on, no active chaosengine found ... "
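
The chaos-exporter lines are key=value pairs in varying order, which makes them hard to compare by eye. A minimal sketch, with a hypothetical helper name, that splits one of the lines above into fields and converts the timestamps:

```python
import shlex
from datetime import datetime, timezone


def parse_logfmt(line):
    """Split a key=value log line into a dict of key -> string value."""
    fields = {}
    for token in shlex.split(line):  # shlex keeps quoted values together
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


line = (
    'time="2022-12-06T13:40:11Z" level=info msg="The chaos metrics are as follows" '
    'EndTime=1.670333954e+09 StartTime=1.67033405e+09 ProbeSuccessPercentage=100 '
    'ChaosInjectTime=1670333928 ResultNamespace=litmus ResultVerdict=Pass '
    'TotalDuration=0 ResultName=pod-delete-xgpb62cz-pod-delete '
    'AwaitedExperiments=0 PassedExperiments=1 FailedExperiments=0'
)

fields = parse_logfmt(line)
start = float(fields["StartTime"])
end = float(fields["EndTime"])

print(fields["ResultName"], fields["ResultVerdict"])
print("start:", datetime.fromtimestamp(start, tz=timezone.utc).isoformat())
print("end:  ", datetime.fromtimestamp(end, tz=timezone.utc).isoformat())
print("end - start (s):", end - start)
```

In this sample the printed EndTime is earlier than the printed StartTime and TotalDuration is 0. Whether that is related to the missing results in the UI is not established in this thread.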

dsanchez31 commented 1 year ago

Hi, The issue seems to occur when the k8s cluster is overloaded...

amityt commented 1 year ago

Hi @dsanchez31, we are unable to reproduce this issue. We installed litmus-helm repo v2.15.1 as you mentioned.

> Hi, The issue seems to occur when the k8s cluster is overloaded...

Could you please share your cluster resources? We will try to create the same environment and test on it.

swarna1101 commented 1 year ago

  1. Check the versions of Litmus Chaos and ChaosHub: Make sure that you are using compatible versions of Litmus Chaos and ChaosHub. The version of Litmus Chaos and ChaosHub should match, or at least be close in version number.
  2. Check the status of the ChaosHub deployment: You can check the status of the ChaosHub deployment by running the following command: kubectl get pods -n litmus
  3. Check the connectivity between Litmus Chaos and ChaosHub: Make sure that Litmus Chaos and ChaosHub can communicate with each other. You can check the logs for the Litmus Chaos and ChaosHub pods to see if there are any connectivity issues.
  4. Check the configuration of the ChaosHub deployment: Make sure that the ChaosHub deployment is configured correctly. You can check the values in the chaos-hub-values.yaml file to ensure that the configuration is correct.

pgmrey commented 1 year ago

Hi, this is still happening when using custom-defined manifests (it works when using the ChaosCenter UI). Did you find a solution?

Version: beta3, scope: namespaced