Open sebay opened 9 months ago
@Jonsy13 PTAL
I'm seeing this issue as well. I'm wondering if I should just go back to version 2.x rather than fight 3.0 behavior.
Hi @sebay @dvdklnr,
Thanks for trying ChaosCenter 3.x. A few things changed in 3.x:
install-chaos-experiments is renamed to install-chaos-faults.
The provided manifest contains 3 steps that are free-flow steps, not chaos fault steps:
- - name: check-ocp-prometheus-is-up
    template: check-ocp-prometheus-is-up
    arguments: {}
- - name: generate-traffic
    template: generate-traffic
    arguments: {}
- - name: stop-traffic
    template: stop-traffic
    arguments: {}
Since these are not chaos fault steps, they cannot be used in 3.x. If you remove them and try the same experiment in 3.x, it will work; I tested this with your experiment.
On the error description: yes, it could be more informative. @hrishavjha can check that!
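As a rough sketch of the change described above, the step list might go from the first form to the second. The 2.x list here is reconstructed only from the step names mentioned in this thread, and `my-fault` is a hypothetical placeholder for whatever chaos fault step the original experiment runs:

```yaml
# Illustrative 2.x-style step list (my-fault is a hypothetical placeholder)
steps_2x:
  - - name: install-chaos-experiments
      template: install-chaos-experiments
  - - name: check-ocp-prometheus-is-up   # free-flow step
      template: check-ocp-prometheus-is-up
      arguments: {}
  - - name: generate-traffic             # free-flow step
      template: generate-traffic
      arguments: {}
  - - name: my-fault
      template: my-fault
  - - name: stop-traffic                 # free-flow step
      template: stop-traffic
      arguments: {}

# Same list after renaming the install step and dropping the free-flow steps
steps_3x:
  - - name: install-chaos-faults
      template: install-chaos-faults
  - - name: my-fault
      template: my-fault
```

The keys `steps_2x` and `steps_3x` exist only to show both versions side by side; in a real Workflow both would be the `steps` list of the entrypoint template. The matching entries under `templates` would presumably need the same rename and removals.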
I tried converting the previous workflow manifest to a version with a probe annotation, and now I'm getting a "failure to unmarshal chaosengine" error when trying to run the experiment.
Speaking of probe annotations: I can't find these k8s probe objects. Is there a way to directly edit the probe?
Here is the latest manifest:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: gke-az-rollout-health-check
  namespace: litmus
  labels:
    workflow_name: "gke-az-rollout-health-check"
    subject: "gke-az-rollout-health-check_litmus"
  annotations:
    categories: "gcp,availability"
    definition: "gke-az-chaos-rollout-health"
    vendor: "CNCF"
spec:
  arguments:
    parameters:
      - name: TOTAL_CHAOS_DURATION
        value: "300"
      - name: GCP_PROJECT_ID
        value: "company-service-staging"
      - name: REGION
        value: "us-west1"
      - name: ZONE
        value: "us-west1-c"
      - name: ROLLOUT_NAMESPACE
        value: "company-staging-ns"
      - name: ROLLOUT_NAME
        value: "atlantis"
  entrypoint: gke-az-chaos
  serviceAccountName: argo-chaos
  templates:
    - name: gke-az-chaos
      steps:
        - - name: install-chaos-faults
            template: install-chaos-faults
        - - name: gcp-az-chaos
            template: gcp-az-chaos
        - - name: revert-chaos
            template: revert-chaos
    - name: install-chaos-faults
      inputs:
        artifacts:
          - name: gcp-az-chaos
            path: /tmp/gcp-az-chaos.yaml
            raw:
              data: |
                apiVersion: litmuschaos.io/v1alpha1
                kind: ChaosExperiment
                metadata:
                  name: gcp-az-chaos
                  namespace: {{workflow.namespace}}
                spec:
                  definition:
                    scope: cluster
                    permissions:
                      - apiGroups: [""]
                        resources: ["pods"]
                        verbs: ["get","list","patch","update"]
                    image: "litmuschaos/go-runner:latest"
                    args:
                      - -c
                      - ./experiments -name gcp-az-chaos
                    command:
                      - /bin/bash
                    env:
                      - name: TOTAL_CHAOS_DURATION
                        value: '{{workflow.parameters.TOTAL_CHAOS_DURATION}}'
                      - name: CLOUD_PROVIDER
                        value: 'gcp'
                      - name: PROJECT_ID
                        value: '{{workflow.parameters.GCP_PROJECT_ID}}'
                      - name: REGION
                        value: '{{workflow.parameters.REGION}}'
                      - name: ZONE
                        value: '{{workflow.parameters.ZONE}}'
                      - name: ZONE_SELECTION
                        value: 'single_zone'
                    labels:
                      name: gcp-az-chaos
      container:
        image: litmuschaos/k8s:latest
        command: [sh, -c]
        args:
          ["kubectl apply -f /tmp/gcp-az-chaos.yaml -n {{workflow.namespace}}"]
    - name: gcp-az-chaos
      inputs:
        artifacts:
          - name: gcp-az-chaos
            path: /tmp/chaosengine-gcp-az-chaos.yaml
            raw:
              data: |
                apiVersion: litmuschaos.io/v1alpha1
                kind: ChaosEngine
                metadata:
                  namespace: {{workflow.namespace}}
                  labels:
                    context: "{{workflow.parameters.appNamespace}}_kube-proxy"
                    workflow_run_id: "{{ workflow.uid }}"
                    workflow_name: gcp-az-chaos
                  annotations:
                    probeRef: '[{"name":"atlantis-health","mode":"SOT"}]'
                  generateName: gcp-az-chaos
                spec:
                  engineState: 'active'
                  annotationCheck: 'false'
                  appinfo:
                    appns: '{{workflow.parameters.ROLLOUT_NAMESPACE}}'
                    applabel: 'app={{workflow.parameters.ROLLOUT_NAME}}'
                    appkind: 'rollout'
                  chaosServiceAccount: argo-chaos
                  experiments:
                    - name: gcp-az-chaos
                      spec:
                        components:
                          env:
                            - name: TOTAL_CHAOS_DURATION
                              value: "60"
                            - name: CHAOS_INTERVAL
                              value: "10"
                            - name: FORCE
                              value: "false"
      container:
        image: litmuschaos/litmus-checker:latest
        args:
          - -file=/tmp/chaosengine-gcp-az-chaos.yaml
          - -saveName=/tmp/engine-name
    - name: revert-chaos
      container:
        image: litmuschaos/k8s:latest
        command: [sh, -c]
        args:
          - "kubectl delete chaosengine -l 'workflow_run_id={{workflow.uid}}' -n {{workflow.namespace}}"
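One detail in the manifest above that may be worth checking: the ChaosEngine `context` label references `{{workflow.parameters.appNamespace}}`, but no `appNamespace` parameter is declared under `spec.arguments.parameters`. This may or may not be related to the unmarshal error. A minimal sketch of declaring it, assuming (and this is only a guess) that the rollout namespace is the intended value:

```yaml
spec:
  arguments:
    parameters:
      # ...existing parameters unchanged...
      - name: appNamespace            # name taken from the label reference in the ChaosEngine
        value: "company-staging-ns"   # assumption: same value as ROLLOUT_NAMESPACE
```

Alternatively, the label could reference `{{workflow.parameters.ROLLOUT_NAMESPACE}}` directly, which is already declared.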
edit: updated markup
What happened: When I try to create a new experiment from an existing 2.14.0 YAML, I get an error with no detail other than "errorInYamlDescription". I am also unable to edit or run the experiment.
What you expected to happen: It should be possible to import an existing 2.14.0 experiment.
Where can this issue be corrected? (optional)
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: