litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a Cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes is at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0
4.43k stars 693 forks source link

Basic container-kill is not working with YAML created through UI in 3.4.0 #4482

Open sebay opened 8 months ago

sebay commented 8 months ago

What happened: Trying to get a basic container kill to work without success with Yaml created from 3.4.0 UI. Trying to save a fixed Yaml will not let me save it.

Logs: {"mainLogs":"W0304 02:03:16.394831 1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.\n2024/03/04 02:03:16 Error Creating Resource : ChaosEngine.litmuschaos.io 'container-kill-mcilxqr5' is invalid: [spec.experiments[0].spec.probe[0].runProperties.interval: Invalid value: 'string': spec.experiments[0].spec.probe[0].runProperties.interval in body must be of type integer: 'string', spec.experiments[0].spec.probe[0].runProperties.probePollingInterval: Invalid value: 'string': spec.experiments[0].spec.probe[0].runProperties.probePollingInterval in body must be of type integer: 'string', spec.experiments[0].spec.probe[0].runProperties.probeTimeout: Invalid value: 'string': spec.experiments[0].spec.probe[0].runProperties.probeTimeout in body must be of type integer: 'string']\n"}

What you expected to happen: It should kill container Where can this issue be corrected? (optional)

UI or backend or both. How to reproduce it (as minimally and precisely as possible): Workflow generated from UI using cmd probe

kind: Workflow
apiVersion: argoproj.io/v1alpha1
metadata:
  name: test-kill
  namespace: application
  labels:
    infra_id: b15c80b3-311a-4e1e-b071-42201aa4f765
    revision_id: 365a2fac-b4c7-48e9-ae22-b5af0701d678
    workflow_id: 7db33825-6073-478d-b109-7edb45551cdb
    workflows.argoproj.io/controller-instanceid: b15c80b3-311a-4e1e-b071-42201aa4f765
spec:
  templates:
    - name: test-kill
      inputs: {}
      outputs: {}
      metadata: {}
      steps:
        - - name: install-chaos-faults
            template: install-chaos-faults
            arguments: {}
        - - name: container-kill-mci
            template: container-kill-mci
            arguments: {}
        - - name: cleanup-chaos-resources
            template: cleanup-chaos-resources
            arguments: {}
    - name: install-chaos-faults
      inputs:
        artifacts:
          - name: container-kill-mci
            path: /tmp/container-kill-mci.yaml
            raw:
              data: >
                apiVersion: litmuschaos.io/v1alpha1

                description:
                  message: |
                    Kills a container belonging to an application pod 
                kind: ChaosExperiment

                metadata:
                  name: container-kill
                  labels:
                    name: container-kill
                    app.kubernetes.io/part-of: litmus
                    app.kubernetes.io/component: chaosexperiment
                    app.kubernetes.io/version: ci
                spec:
                  definition:
                    scope: Namespaced
                    permissions:
                      - apiGroups:
                          - ""
                        resources:
                          - pods
                        verbs:
                          - create
                          - delete
                          - get
                          - list
                          - patch
                          - update
                          - deletecollection
                      - apiGroups:
                          - ""
                        resources:
                          - events
                        verbs:
                          - create
                          - get
                          - list
                          - patch
                          - update
                      - apiGroups:
                          - ""
                        resources:
                          - configmaps
                        verbs:
                          - get
                          - list
                      - apiGroups:
                          - ""
                        resources:
                          - pods/log
                        verbs:
                          - get
                          - list
                          - watch
                      - apiGroups:
                          - ""
                        resources:
                          - pods/exec
                        verbs:
                          - get
                          - list
                          - create
                      - apiGroups:
                          - apps
                        resources:
                          - deployments
                          - statefulsets
                          - replicasets
                          - daemonsets
                        verbs:
                          - list
                          - get
                      - apiGroups:
                          - apps.openshift.io
                        resources:
                          - deploymentconfigs
                        verbs:
                          - list
                          - get
                      - apiGroups:
                          - ""
                        resources:
                          - replicationcontrollers
                        verbs:
                          - get
                          - list
                      - apiGroups:
                          - argoproj.io
                        resources:
                          - rollouts
                        verbs:
                          - list
                          - get
                      - apiGroups:
                          - batch
                        resources:
                          - jobs
                        verbs:
                          - create
                          - list
                          - get
                          - delete
                          - deletecollection
                      - apiGroups:
                          - litmuschaos.io
                        resources:
                          - chaosengines
                          - chaosexperiments
                          - chaosresults
                        verbs:
                          - create
                          - list
                          - get
                          - patch
                          - update
                          - delete
                    image: xxx/litmuschaos/go-runner:latest
                    imagePullPolicy: Always
                    args:
                      - -c
                      - ./experiments -name container-kill
                    command:
                      - /bin/bash
                    env:
                      - name: TARGET_CONTAINER
                        value: ""
                      - name: RAMP_TIME
                        value: ""
                      - name: TARGET_PODS
                        value: ""
                      - name: CHAOS_INTERVAL
                        value: "10"
                      - name: SIGNAL
                        value: SIGKILL
                      - name: SOCKET_PATH
                        value: /run/containerd/containerd.sock
                      - name: CONTAINER_RUNTIME
                        value: containerd
                      - name: TOTAL_CHAOS_DURATION
                        value: "20"
                      - name: PODS_AFFECTED_PERC
                        value: ""
                      - name: NODE_LABEL
                        value: ""
                      - name: DEFAULT_HEALTH_CHECK
                        value: "false"
                      - name: LIB_IMAGE
                        value: xxx/litmuschaos/go-runner:latest
                      - name: SEQUENCE
                        value: parallel
                    labels:
                      name: container-kill
                      app.kubernetes.io/part-of: litmus
                      app.kubernetes.io/component: experiment-job
                      app.kubernetes.io/runtime-api-usage: "true"
                      app.kubernetes.io/version: ci
      outputs: {}
      metadata: {}
      container:
        name: ""
        image: xxx/litmuschaos/k8s:2.11.0
        command:
          - sh
          - -c
        args:
          - kubectl apply -f /tmp/ -n {{workflow.parameters.adminModeNamespace}}
            && sleep 30
        resources: {}
    - name: cleanup-chaos-resources
      inputs: {}
      outputs: {}
      metadata: {}
      container:
        name: ""
        image: xxx/litmuschaos/k8s:2.11.0
        command:
          - sh
          - -c
        args:
          - kubectl delete chaosengine -l workflow_run_id={{workflow.uid}} -n
            {{workflow.parameters.adminModeNamespace}}
        resources: {}
    - name: container-kill-mci
      inputs:
        artifacts:
          - name: container-kill-mci
            path: /tmp/chaosengine-container-kill-mci.yaml
            raw:
              data: >
                apiVersion: litmuschaos.io/v1alpha1

                kind: ChaosEngine

                metadata:
                  namespace: "{{workflow.parameters.adminModeNamespace}}"
                  labels:
                    workflow_run_id: "{{ workflow.uid }}"
                    workflow_name: test-kill
                  annotations:
                    probeRef: '[{"name":"cmd","mode":"OnChaos"}]'
                  generateName: container-kill-mci
                spec:
                  engineState: active
                  appinfo:
                    appns: application
                    applabel: app.kubernetes.io/managed-by=solace-pubsubplus-operator
                    appkind: statefulset
                  chaosServiceAccount: litmus-admin
                  experiments:
                    - name: container-kill
                      spec:
                        components:
                          env:
                            - name: TARGET_CONTAINER
                              value: ""
                            - name: RAMP_TIME
                              value: ""
                            - name: TARGET_PODS
                              value: ""
                            - name: CHAOS_INTERVAL
                              value: "10"
                            - name: SIGNAL
                              value: SIGKILL
                            - name: SOCKET_PATH
                              value: /run/containerd/containerd.sock
                            - name: CONTAINER_RUNTIME
                              value: containerd
                            - name: TOTAL_CHAOS_DURATION
                              value: "20"
                            - name: PODS_AFFECTED_PERC
                              value: ""
                            - name: NODE_LABEL
                              value: ""
                            - name: DEFAULT_HEALTH_CHECK
                              value: "false"
                            - name: LIB_IMAGE
                              value: xxx/litmuschaos/go-runner:latest
                            - name: SEQUENCE
                              value: parallel
      outputs: {}
      metadata:
        labels:
          weight: "10"
      container:
        name: ""
        image: xxx/litmuschaos/litmus-checker:2.11.0
        args:
          - -file=/tmp/chaosengine-container-kill-mci.yaml
          - -saveName=/tmp/engine-name
        resources: {}
  entrypoint: test-kill
  arguments:
    parameters:
      - name: adminModeNamespace
        value: application
  serviceAccountName: argo-chaos
  podGC:
    strategy: OnWorkflowCompletion
  securityContext:
    runAsUser: 1000
    runAsNonRoot: true
status: {}

Anything else we need to know?:

sebay commented 8 months ago

Seems it is a 3.4.0 UI wrong instruction which is misleading. When enabling chaos it asked to apply: https://raw.githubusercontent.com/litmuschaos/litmus/master/mkdocs/docs/3.0.0-beta10/litmus-portal-crds-3.0.0-beta10.yml This is old CRD. It should ask to apply something like: https://github.com/litmuschaos/litmus/blob/f86aad1328bcc580c97cc1ab57ee217880e7fb08/mkdocs/docs/3.4.0/litmus-portal-crds-3.4.0.yml

SarthakJain26 commented 8 months ago

@Saranya-jena @hrishavjha please take a look at this