litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0

Chaos Injection Failed: Target Pods Not Found #3328

Closed cryslam closed 2 years ago

cryslam commented 2 years ago

Question

The chaos injection fails with the error: target pods not found. How are the target pods referenced? I'm targeting SelfAgent using the generic/container-kill experiment from ChaosHub, so I'm assuming the YAML configuration is valid. Additionally, all base LitmusChaos pods are running with no errors in pod events or logs. Would someone be able to provide any pointers?

LitmusChaos version: 2.2.0
Kubernetes version: v1.20.6

ChaosResult:

metadata:
  name: container-killklwfm-container-kill
  namespace: litmus
  uid: 254845c1-6956-4c51-ab16-254ce0c89ed7
  resourceVersion: "167959435"
  generation: 2
  creationTimestamp: 2021-11-05T02:16:29Z
  labels:
    app.kubernetes.io/component: experiment-job
    app.kubernetes.io/part-of: litmus
    app.kubernetes.io/runtime-api-usage: "true"
    app.kubernetes.io/version: latest
    chaosUID: 2b0d8f19-f815-41a6-a9bd-8f1e4b32bdfb
    controller-uid: 694d4bf6-1785-4c6e-9042-b750237799f3
    job-name: container-kill-54tlee
    name: container-kill
  managedFields:
    - manager: experiments
      operation: Update
      apiVersion: litmuschaos.io/v1alpha1
      time: 2021-11-05T02:16:29Z
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:labels:
            .: {}
            f:app.kubernetes.io/component: {}
            f:app.kubernetes.io/part-of: {}
            f:app.kubernetes.io/runtime-api-usage: {}
            f:app.kubernetes.io/version: {}
            f:chaosUID: {}
            f:controller-uid: {}
            f:job-name: {}
            f:name: {}
        f:spec:
          .: {}
          f:engine: {}
          f:experiment: {}
        f:status:
          .: {}
          f:experimentStatus: {}
          f:history: {}
spec:
  engine: container-killklwfm
  experiment: container-kill
status:
  experimentStatus:
    phase: Completed
    verdict: Fail
    failStep: failed in chaos injection phase
    probeSuccessPercentage: "0"
  history:
    passedRuns: 0
    failedRuns: 1
    stoppedRuns: 0

ChaosRunner Pod Logs:

kubectl logs container-killklwfm-runner -n litmus
W1105 02:16:24.254902       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2021-11-05T02:16:24Z" level=info msg="Experiments details are as follows" appKind=deployment Service Account Name=litmus-admin Engine Namespace=litmus Experiments List="[container-kill]" Engine Name=container-killklwfm appLabels="app=nginx" appNs=default
time="2021-11-05T02:16:24Z" level=info msg="Getting the ENV Variables"
time="2021-11-05T02:16:24Z" level=info msg="Preparing to run Chaos Experiment: container-kill"
time="2021-11-05T02:16:24Z" level=info msg="Started Chaos Experiment Name: container-kill, with Job Name: container-kill-54tlee"
time="2021-11-05T02:16:45Z" level=info msg="Chaos Pod Completed, Experiment Name: container-kill, with Job Name: container-kill-54tlee"
time="2021-11-05T02:16:47Z" level=info msg="Chaos Engine has been updated with result, Experiment Name: container-kill"
time="2021-11-05T02:16:47Z" level=info msg="[skip]: skipping the job deletion as jobCleanUpPolicy is set to {retain}"

Experiment Pod Logs:

kubectl logs container-kill-54tlee-v9xjp -n litmus
W1105 02:16:27.787445       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2021-11-05T02:16:27Z" level=info msg="Experiment Name: container-kill"
time="2021-11-05T02:16:27Z" level=info msg="[PreReq]: Getting the ENV for the container-kill experiment"
time="2021-11-05T02:16:27Z" level=info msg="[PreReq]: Updating the chaos result of container-kill experiment (SOT)"
time="2021-11-05T02:16:29Z" level=info msg="The application information is as follows" Namespace=default Label="app=nginx" Target Container= Chaos Duration=20 Container Runtime=docker
time="2021-11-05T02:16:29Z" level=info msg="[Status]: Verify that the AUT (Application Under Test) is running (pre-chaos)"
time="2021-11-05T02:16:29Z" level=info msg="[Status]: Checking whether application containers are in ready state"
time="2021-11-05T02:16:31Z" level=info msg="[Status]: Checking whether application pods are in running state"
time="2021-11-05T02:16:37Z" level=error msg="Chaos injection failed, err: No target pod found"

uditgaurav commented 2 years ago

Hi @lamc, you can provide the target pod details in the appinfo section of the ChaosEngine spec while preparing the workflow; please refer to the experiment docs for an example. By default the experiment looks for pods in namespace: default with label: app=nginx and appkind: deployment (also visible in the logs), which you can replace with your own values.
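For illustration, a minimal sketch of what the appinfo section of a ChaosEngine could look like when it points at your own application. The engine name, application namespace, and label below are hypothetical placeholders, not values from this issue; the service account and engine namespace are the ones shown in the runner logs above.

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: my-app-container-kill        # hypothetical engine name
  namespace: litmus                  # engine namespace, as in the logs above
spec:
  engineState: active
  chaosServiceAccount: litmus-admin  # service account shown in the runner logs
  appinfo:
    appns: my-namespace              # hypothetical: namespace where your app runs
    applabel: app=my-app             # hypothetical: label carried by the target pods
    appkind: deployment              # kind of workload that owns those pods
  experiments:
    - name: container-kill
```

Before running the experiment, it may help to confirm that the selector actually matches something, for example with kubectl get pods -n my-namespace -l app=my-app. If that returns no pods, the experiment will most likely report that no target pod was found.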

cryslam commented 2 years ago

Thank you @uditgaurav for your quick response and the link, as I couldn't find it referenced in docs.litmuschaos.io :)

Ah okay, understood. I didn't realize the templated workflows deploy apps such as Sock Shop, Bank of Anthos, etc. in order to run the chaos workflows/experiments, and that if we want to create our own custom workflow or use experiments from ChaosHub, we need to reference applications that are already deployed or add their deployment to the workflow.

VeeraballiGopal commented 2 years ago

W0201 03:53:20.965634 1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2022/02/01 03:53:21
ChaosEngine Name : container-killhmklq
2022/02/01 03:53:21 Created Resource Details: {container-killhmklq litmuschaos.io v1alpha1 ChaosEngine litmus }
W0201 03:53:21.022260 1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2022/02/01 03:53:21 Starting Chaos Checker in 1min
2022/02/01 03:54:21 Checking if Engine Completed or Stopped
2022/02/01 03:54:21 [*] ENGINE COMPLETED

W0201 03:53:25.804673 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
time='2022-02-01T03:53:25Z' level=info msg='Experiment Name: container-kill'
time='2022-02-01T03:53:25Z' level=info msg='[PreReq]: Getting the ENV for the container-kill experiment'
time='2022-02-01T03:53:27Z' level=info msg='[PreReq]: Updating the chaos result of container-kill experiment (SOT)'
time='2022-02-01T03:53:29Z' level=info msg='The application information is as follows' Container Runtime=crio Namespace=test Label='app=nginx' Target Container= Chaos Duration=20
time='2022-02-01T03:53:29Z' level=info msg='[Status]: Verify that the AUT (Application Under Test) is running (pre-chaos)'
time='2022-02-01T03:53:29Z' level=info msg='[Status]: Checking whether application containers are in ready state'
time='2022-02-01T03:53:29Z' level=info msg='[Status]: The Container status are as follows' Pod=nginx-6799fc88d8-6q6rr Readiness=true container=nginx
time='2022-02-01T03:53:29Z' level=info msg='[Status]: The Container status are as follows' container=nginx Pod=nginx-deployment-66b6c48dd5-nwps2 Readiness=true
time='2022-02-01T03:53:29Z' level=info msg='[Status]: The Container status are as follows' Readiness=true container=nginx Pod=nginx-deployment-66b6c48dd5-rrrh8
time='2022-02-01T03:53:31Z' level=info msg='[Status]: Checking whether application pods are in running state'
time='2022-02-01T03:53:31Z' level=info msg='[Status]: The status of Pods are as follows' Pod=nginx-6799fc88d8-6q6rr Status=Running
time='2022-02-01T03:53:31Z' level=info msg='[Status]: The status of Pods are as follows' Pod=nginx-deployment-66b6c48dd5-nwps2 Status=Running
time='2022-02-01T03:53:31Z' level=info msg='[Status]: The status of Pods are as follows' Pod=nginx-deployment-66b6c48dd5-rrrh8 Status=Running
time='2022-02-01T03:53:33Z' level=info msg='[Chaos]:Number of pods targeted: 1'
time='2022-02-01T03:53:33Z' level=info msg='Target pods list for chaos, [nginx-6799fc88d8-6q6rr]'
time='2022-02-01T03:53:33Z' level=info msg='[Info]: Details of application under chaos injection' ContainerName=nginx PodName=nginx-6799fc88d8-6q6rr NodeName=ip-10-0-1-100.ap-northeast-2.compute.internal
time='2022-02-01T03:53:33Z' level=info msg='[Status]: Checking the status of the helper pods'
time='2022-02-01T03:53:37Z' level=info msg='container-kill-helper-igsbfm helper pod is in Running state'
time='2022-02-01T03:53:39Z' level=info msg='[Wait]: waiting till the completion of the helper pod'
time='2022-02-01T03:53:39Z' level=info msg='helper pod status: Failed'
time='2022-02-01T03:53:39Z' level=info msg='[Status]: The running status of Pods are as follows' Pod=container-kill-helper-igsbfm Status=Failed
time='2022-02-01T03:53:44Z' level=error msg='Chaos injection failed, err: helper pod failed'

W0201 03:53:22.267759 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
time='2022-02-01T03:53:22Z' level=info msg='Experiments details are as follows' Experiments List='[container-kill]' Engine Name=container-killhmklq appLabels='app=nginx' appNs=test appKind=deployment Service Account Name=litmus-admin Engine Namespace=litmus
time='2022-02-01T03:53:22Z' level=info msg='Getting the ENV Variables'
time='2022-02-01T03:53:22Z' level=info msg='Preparing to run Chaos Experiment: container-kill'
time='2022-02-01T03:53:22Z' level=info msg='Started Chaos Experiment Name: container-kill, with Job Name: container-kill-9rz7kb'
time='2022-02-01T03:53:54Z' level=info msg='Chaos Pod Completed, Experiment Name: container-kill, with Job Name: container-kill-9rz7kb'
time='2022-02-01T03:53:57Z' level=info msg='Chaos Engine has been updated with result, Experiment Name: container-kill'
time='2022-02-01T03:53:57Z' level=info msg='[skip]: skipping the job deletion as jobCleanUpPolicy is set to {}'

VeeraballiGopal commented 2 years ago

May I know the solution for the helper pod issue?