Chaos Experiments can not run using Litmus ChaosCenter v3.0.0

litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a Cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes is at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q

https://litmuschaos.io

Apache License 2.0


Chaos Experiments can not run using Litmus ChaosCenter v3.0.0 #4246

Open jitendrapal-ngr opened 1 year ago

jitendrapal-ngr commented 1 year ago

What happened: I want to run a chaos experiment from the Litmus ChaosCenter UI v3.0.0, but the run stays in the Queued status and never changes. It behaves like a deadlock: the status never progresses and never recovers.

What you expected to happen: The plan/experiment gets executed.

Where can this issue be corrected? (optional) I am not sure where this should be fixed, but I get the error on my local machine using minikube.

How to reproduce it (as minimally and precisely as possible): I am running Kali Linux under WSL2 on Windows 11, with minikube on top of that using the docker driver. In this setup the experiment cannot be run.

Anything else we need to know?: The experiments do run through the CLI, just not from the Litmus ChaosCenter UI v3.0.0. All the setup has been completed.

Please let me know if anybody has experienced the same. It looks like a bug to me that needs to be fixed.

jitendrapal-ngr commented 1 year ago

Update: these are the subscriber pod logs that track the workflow stage of the experiment:

level=error msg="Error on processing request" error="error performing infra operation: Workflow.argoproj.io \"demo-pod-delete-1698390162553\" is invalid: metadata.labels: Invalid value: \"{{workflow.parameters.appNamespace}}_kube-proxy\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')"

The status now changes to Pending after hardcoding the values, but it is still stuck.
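To illustrate why the API server rejects the workflow, here is a minimal sketch that checks a label value against the regex quoted in the error message. The function name `is_valid_label_value` is hypothetical; the 63-character limit is the standard Kubernetes limit for label values.

```python
import re

# Pattern copied from the validation error in the subscriber log.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_LENGTH = 63  # Kubernetes limit for label values


def is_valid_label_value(value: str) -> bool:
    """Return True if value would pass Kubernetes label-value validation."""
    if len(value) > MAX_LABEL_LENGTH:
        return False
    return LABEL_VALUE_RE.fullmatch(value) is not None


if __name__ == "__main__":
    candidates = (
        "{{workflow.parameters.appNamespace}}_kube-proxy",  # unresolved template
        "default_kube-proxy",  # hardcoded value
    )
    for candidate in candidates:
        print(candidate, "->", is_valid_label_value(candidate))
```

The unresolved placeholder starts with `{`, which is not alphanumeric, so the first candidate fails validation while the hardcoded value passes.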

dwdraju commented 1 year ago

I am in the same boat and have to apply the YAML manually after changing the value of metadata.labels.subject.

jitendrapal-ngr commented 1 year ago

I am in the same boat and have to apply the YAML manually after changing the value of metadata.labels.subject.

Hi @dwdraju, I already tried the same, but it did not help: the status only changed from Queued to Pending.

I removed the variable references and hardcoded the values. There are no longer any errors in the subscriber logs and the workflow gets created, but it is stuck in the Pending state.
I even tried removing metadata.labels.subject, but that did not work either.

Can you please expand your answer and explain how to troubleshoot this, or how to run the experiment by modifying the YAML?

dwdraju commented 1 year ago

Sure. I downloaded the manifest YAML file, replaced {{workflow.parameters.appNamespace}} in the subject label with a random string, and applied the file with kubectl. The experiment executed as expected, but I have to do this for every experiment. @jitendrapal-ngr
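The replacement described above could be scripted roughly as follows. This is only a sketch: the function name `hardcode_subject_label` and the manifest excerpt are hypothetical, and only the subject label value is taken from the error reported in this thread. It uses plain string handling rather than a YAML parser so that it has no dependencies.

```python
PLACEHOLDER = "{{workflow.parameters.appNamespace}}"


def hardcode_subject_label(manifest_text: str, replacement: str) -> str:
    """Replace the unresolved placeholder, but only on `subject:` label lines."""
    lines = []
    for line in manifest_text.splitlines(keepends=True):
        if line.lstrip().startswith("subject:"):
            line = line.replace(PLACEHOLDER, replacement)
        lines.append(line)
    return "".join(lines)


# Hypothetical excerpt of a downloaded manifest (assumption, not the real file).
SAMPLE_MANIFEST = """\
kind: Workflow
metadata:
  labels:
    subject: "{{workflow.parameters.appNamespace}}_kube-proxy"
"""

if __name__ == "__main__":
    print(hardcode_subject_label(SAMPLE_MANIFEST, "default"), end="")
```

The patched text can then be written to a file and applied with `kubectl apply -f <file>`, as in the manual workaround.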

jitendrapal-ngr commented 1 year ago

Hi all, update: I found that this is a bug in Litmus ChaosCenter v3.0.0. The issue occurs because:

It cannot parse {{workflow.parameters.appNamespace}} for subject in metadata.
In addition, some fields cannot be passed dynamically from the probe config to the main experiment config, and leaving those fields at their defaults causes the experiments to fail.
There are more issues of this type, and they vary between experiments.

To run this experiment successfully, I had to replace {{workflow.parameters.appNamespace}} in the subject label with a random string and update appNamespaces inside arguments to targetNamespace, which is set to kube-proxy by default even after it is updated from the UI.
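Both manual changes can be sketched together on a manifest that has already been loaded into a Python dict. This assumes the usual Argo Workflow layout, where parameters live under `spec.arguments.parameters` as `name`/`value` pairs, and it assumes that the argument mentioned above is the `appNamespace` workflow parameter. The function name `resolve_labels` and the sample data are hypothetical.

```python
import re
from typing import Any, Dict

# Matches placeholders such as {{workflow.parameters.appNamespace}}.
PARAM_RE = re.compile(r"\{\{workflow\.parameters\.([A-Za-z0-9_-]+)\}\}")


def resolve_labels(workflow: Dict[str, Any], overrides: Dict[str, str]) -> Dict[str, Any]:
    """Apply parameter overrides, then resolve placeholders in metadata.labels.

    The workflow dict is modified in place and also returned.
    """
    params = workflow.get("spec", {}).get("arguments", {}).get("parameters", [])
    values: Dict[str, str] = {}
    for param in params:
        if param["name"] in overrides:
            param["value"] = overrides[param["name"]]
        values[param["name"]] = str(param.get("value", ""))

    labels = workflow.get("metadata", {}).get("labels", {})
    for key in list(labels):
        # Unknown parameters are left untouched rather than blanked out.
        labels[key] = PARAM_RE.sub(
            lambda m: values.get(m.group(1), m.group(0)), str(labels[key])
        )
    return workflow


if __name__ == "__main__":
    # Hypothetical manifest fragment (assumption, not the real template).
    workflow = {
        "metadata": {"labels": {"subject": "{{workflow.parameters.appNamespace}}_kube-proxy"}},
        "spec": {"arguments": {"parameters": [{"name": "appNamespace", "value": "kube-proxy"}]}},
    }
    resolve_labels(workflow, {"appNamespace": "default"})
    print(workflow["metadata"]["labels"]["subject"])
```

Resolving the label from the same parameter value keeps the label and the target namespace consistent, instead of hardcoding an unrelated random string.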

jitendrapal-ngr commented 1 year ago

@here @litmuschaos Important! I would like to ask the litmuschaos team to look into this issue and provide a fix, because it is a point of failure that blocks running any experiment.

vanshBhatia-A4k9 commented 1 year ago

@neelanjan00 this seems to be a genuine bug, can you PTAL

dwdraju commented 11 months ago

This PR fixes the issue related to the workflow label: https://github.com/litmuschaos/chaos-charts/pull/618

rksubash commented 9 months ago

I lost quite some time with version 3.0, and it is frustrating that the issue is still there and all the experiments stay queued.

francoischenu commented 9 months ago

Hello,

I had the same issue as everyone else.

@rksubash please find my workaround below.

After some investigation, the issue seems to be related to the template part of Litmus (as we can see in the PR from @dwdraju).

I'm assuming you are trying to run a pod delete or CPU/memory hog scenario from the template list.

Instead, you can start from a blank canvas and add your experiment manually.

This works perfectly for me.

I hope this helps while waiting for the PR to be merged.

Best Regards

StyleTang commented 8 months ago

Hi team, same issue here, +1. It seems a lot of new users are impacted by this. Can anyone help? Thanks!

neelanjan00 commented 8 months ago

Hi folks thanks for reporting this issue. We're still looking into it.

juwatanabe commented 4 months ago

Seeing the same issue. It seems like this project doesn't have much support. This bug was reported 8 months ago.

dvdklnr commented 4 months ago

Same here, is there any hope? I tried and retried multiple times but I am still stuck.

dwdraju commented 4 months ago

@juwatanabe @dvdklnr, did you try with the changes in this PR? https://github.com/litmuschaos/chaos-charts/pull/618

dvdklnr commented 4 months ago

@juwatanabe @dvdklnr, did you try with the changes in this PR? litmuschaos/chaos-charts#618

Thank you, I manually edited the workflow and got through this particular stumbling block (now fighting the app's probe details)

juwatanabe commented 4 months ago

@juwatanabe @dvdklnr, did you try with the changes in this PR? litmuschaos/chaos-charts#618

I ended up not using the templating function as a workaround. Overall, it seems like Litmus has major UI test gaps around basic functionality, which gives me pause about choosing it as a chaos framework. There are kernels of good things here, but the stability is not meeting a certain bar.

RaulButuc commented 2 months ago

Hi folks thanks for reporting this issue. We're still looking into it.

How much longer will you be looking into it? It's been a long time since this was reported!

bingwei-hong-partior commented 1 month ago

This issue practically renders running experiments via the GUI useless without manual changes. Are there any updates on whether this can be fixed?

Baalekshan commented 1 month ago

Hi all, PR https://github.com/litmuschaos/litmus/pull/4856 fixes the issue. You can now delete your queued experiments or rerun them.