Hello, I have encountered the same issue.
Can you try the steps described here: https://kubernetes.slack.com/archives/CNXNB0ZTN/p1697011377377159
With the release of Litmus 3.0.0, attaching probes to an experiment is a mandatory step. Thanks for reporting the missing information in the docs; we will add it here soon: https://docs.litmuschaos.io/docs/concepts/probes.
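For example, an inline probe attached to a fault in the ChaosEngine spec can look roughly like the sketch below (the probe name, URL, and timing values are placeholders, and the exact field types, e.g. integer vs. duration string for probeTimeout, depend on the CRD version installed):

```yaml
# Illustrative sketch only - names, URL, and values are placeholders.
spec:
  experiments:
    - name: pod-delete
      spec:
        probe:
          - name: check-frontend-availability
            type: httpProbe
            mode: Continuous
            httpProbe/inputs:
              url: http://frontend.app.svc.cluster.local:8080/health
              method:
                get:
                  criteria: "=="
                  responseCode: "200"
            runProperties:
              probeTimeout: 5
              interval: 2
              retry: 1
```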
Thanks!
@vanshBhatia-A4k9 There can be valid cases where someone might not want to have any probes but wants to monitor the experiment manually. What is the rationale behind making it mandatory? Is there any RFC about it?
Also, as an end user, I find the error message very unhelpful. Can we improve it? It doesn't clearly communicate that at least one probe is needed.
To add: even after I add a probe, the experiments start failing because the CRD requires an integer value for the fields spec.experiments[0].spec.probe[0].runProperties.probeTimeout and spec.experiments[0].spec.probe[0].runProperties.interval, whereas the UI won't let you save those values without an 's' suffix.
The error message from the pod log is:
Error Creating Resource : ChaosEngine.litmuschaos.io "run-chaosab123" is invalid: [spec.experiments[0].spec.probe[0].runProperties.probeTimeout: Invalid value: "string": spec.experiments[0].spec.probe[0].runProperties.probeTimeout in body must be of type integer: "string", spec.experiments[0].spec.probe[0].runProperties.interval: Invalid value: "string": spec.experiments[0].spec.probe[0].runProperties.interval in body must be of type integer: "string"]
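For anyone comparing, the two shapes in question look roughly like this (the numbers are placeholders; which form validates depends on the ChaosEngine CRD version installed):

```yaml
# Placeholder values; which form validates depends on the installed ChaosEngine CRD.
# Form the CRD in this report accepts (plain integers):
runProperties:
  probeTimeout: 5
  interval: 2
  retry: 1

# Form the 3.0.0 UI enforces (duration strings with an "s" suffix),
# which the CRD above rejects with the type error quoted earlier:
# runProperties:
#   probeTimeout: 5s
#   interval: 2s
#   retry: 1
```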
I tried the experiment in a fresh cluster. I mention this to indicate that there were no residual CRDs in the cluster. According to this PR (https://github.com/litmuschaos/litmus-docs/pull/244), old CRDs can be a potential reason for this error, but in my case the cluster was fresh.
However, I made manual changes to the experiment YAML to set the annotations field to null, i.e. annotations: instead of annotations: {}. I found the solution in @Nageshbansal's Slack comment. But the issue with such a workaround is that the annotations field might not always be null; we might need to fill it up with other info, and in such cases we will encounter the above-mentioned issue again.
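For reference, the difference between the two forms is roughly this (shown under metadata purely as an example; the exact location depends on where the generated experiment YAML sets the field):

```yaml
# As present in the generated experiment YAML - triggers the issue for me:
metadata:
  annotations: {}

# Manual workaround from the Slack comment - leave the value empty/null:
metadata:
  annotations:
```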
This is a huge blocker for me as well; in beta 8 and below we just defined the probe in stream and everything worked.
The error message definitely needs improvement - we will take that feedback cc: @Saranya-jena
On why the probe is mandatory:
With 3.0, the resilience score is based purely on probe success/failure. We no longer associate success/failure with the faults and experiments themselves; they only show execution status (queued/running/completed/stopped, etc.). This was based on user feedback, the gist of which is that fault injection/experiment execution is just an action, and resilience should be measured purely on what is "validated".
We are looking to project the resilience score (RS) as the main outcome of an experiment (it is the main actionable entity for many users, who decide what steps to take based on its value), and the RS in turn depends on the existence of probes. This led us to the current flow, which mandates probes.
Having said that, what we need in the current circumstances is support for default or "system" probes that are auto-configured for faults, without users having to explicitly create them, thereby ensuring that no "additional" action/input is required from users while creating experiments. We can add this to the short-term roadmap.
Sounds good, thanks for the explanation @ksatchit
What happened:
When trying to create an experiment without probes, we see the following error.
It's not clear in the docs whether probes are mandatory for every fault. Also, the error message is not very friendly.
What you expected to happen:
We should be able to inject faults without configuring probes. Always having to configure probes sets a high bar of adoption for people starting out on their chaos journey. Also, depending on the hypothesis, teams can choose not to have any automated checks and instead monitor the experiment manually by simply watching their dashboards. If something goes wrong, they can halt the experiment.
How to reproduce it (as minimally and precisely as possible):
Try creating an experiment without any probes in v3.0.0. Here is the manifest for quick reference:
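As a purely illustrative stand-in for such a manifest (all names, namespaces, and labels below are placeholders, not taken from the original report), a minimal ChaosEngine with no probe list looks roughly like this:

```yaml
# Illustrative stand-in only - names, namespaces, and labels are placeholders.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: run-chaos-example
  namespace: litmus
spec:
  engineState: active
  appinfo:
    appns: app
    applabel: app=frontend
    appkind: deployment
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: "30"
        # No probe list here - this is the shape that fails on v3.0.0 per this report.
```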