litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0

k8s Probe not starting up #3941

Open KojoRising opened 1 year ago

KojoRising commented 1 year ago

What happened: I'm trying to run a vanilla k8sProbe. I've tried the simplest tests possible (checking for the presence or absence of a pod, etc.), as well as the "create" k8sProbe from the example docs. The ChaosEngine YAML and the error logs are below:

CHAOSENGINE TEMPLATE

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  namespace: {{inputs.parameters.chaos_namespace}}
  generateName: pod-cpu-engine-
  labels:
    executionId: {{inputs.parameters.execution_id}}
spec:
  appinfo:
    appns: {{inputs.parameters.app_namespace}}
    applabel: {{inputs.parameters.target_label}}
    appkind: {{inputs.parameters.app_kind}}
  engineState: active
  chaosServiceAccount: {{inputs.parameters.chaos_service_account}}
  components:
    runner:
      nodeSelector:
        agentpool: compute
  experiments:
    - name: pod-cpu-hog
      spec:
        probe:
        - name: "small-interactive-ads-1"
          type: "k8sProbe"
          k8sProbe/inputs:
            group: ""
            version: "v1"
            resource: "small-interactive-ads-1"
            namespace: "scenario-ns"
            fieldSelector: ""
            operation: "present"
            labelSelector: ""
          mode: "OnChaos"
          runProperties:
            probeTimeout: 5
            interval: 5
            retry: 10
        components:
          env:
            - name: CPU_CORES
              value: "{{inputs.parameters.cpu_cores}}"
            - name: TOTAL_CHAOS_DURATION
              value: "{{inputs.parameters.duration_seconds}}"
            - name: CPU_LOAD
              value: "{{inputs.parameters.cpu_load}}"
            - name: CONTAINER_RUNTIME
              value: "{{inputs.parameters.container_runtime}}"
            - name: SOCKET_PATH
              value: "{{inputs.parameters.socket_path}}"
            - name: PODS_AFFECTED_PERC
              value: "100"
        nodeSelector:
          agentpool: compute

ERROR LOGS

time="2023-04-06T16:06:31Z" level=info msg="[Status]: The Container status are as follows" Readiness=true container=small-interactive-ads Pod=small-interactive-ads-1
time="2023-04-06T16:06:33Z" level=info msg="[Status]: Checking whether application pods are in running state"
time="2023-04-06T16:06:33Z" level=info msg="[Status]: The status of Pods are as follows" Status=Running Pod=small-interactive-ads-1
time="2023-04-06T16:06:37Z" level=info msg="[Info]: The chaos tunables are:" CPU Core=0 CPU Load Percentage=10 Sequence=parallel PodsAffectedPerc=100
time="2023-04-06T16:06:37Z" level=info msg="[Chaos]:Number of pods targeted: 1"
time="2023-04-06T16:06:37Z" level=info msg="Target pods list for chaos, [small-interactive-ads-1]"
time="2023-04-06T16:06:39Z" level=info msg="[Probe]: The k8s probe information is as follows" Mode=OnChaos Phase=DuringChaos Name=small-interactive-ads-1 Inputs="{ v1 small-interactive-ads-1  scenario-ns   present}" Run Properties="{5 5 10 0 0 0 false}"
time="2023-04-06T16:06:39Z" level=error msg="the small-interactive-ads-1 k8s probe has Failed, err: the server could not find the requested resource"
time="2023-04-06T16:06:39Z" level=error msg="the small-interactive-ads-1 k8s probe has Failed, err: {\"errorCode\":\"K8S_PROBE_ERROR\",\"reason\":\"unable to list the resources with matching selector, err: the server could not find the requested resource\",\"target\":\"{name: small-interactive-ads-1}\"}"
time="2023-04-06T16:06:39Z" level=error msg="the small-interactive-ads-1 k8s probe has Failed, err: the server could not find the requested resource"
time="2023-04-06T16:06:39Z" level=error msg="the small-interactive-ads-1 k8s probe has Failed, err: {\"errorCode\":\"K8S_PROBE_ERROR\",\"reason\":\"unable to list the resources with matching selector, err: the server could not find the requested resource\",\"target\":\"{name: small-interactive-ads-1}\"}"
time="2023-04-06T16:06:39Z" level=error msg="the small-interactive-ads-1 k8s probe has Failed, err: the server could not find the requested resource"
time="2023-04-06T16:06:39Z" level=error msg="the small-interactive-ads-1 k8s probe has Failed, err: {\"errorCode\":\"K8S_PROBE_ERROR\",\"reason\":\"unable to list the resources with matching selector, err: the server could not find the requested resource\",\"target\":\"{name: small-interactive-ads-1}\"}"
time="2023-04-06T16:06:39Z" level=error msg="the small-interactive-ads-1 k8s probe has Failed, err: the server could not find the requested resource"
time="2023-04-06T16:06:39Z" level=error msg="the small-interactive-ads-1 k8s probe has Failed, err: {\"errorCode\":\"K8S_PROBE_ERROR\",\"reason\":\"unable to list the resources with matching selector, err: the server could not find the requested resource\",\"target\":\"{name: small-interactive-ads-1}\"}"
time="2023-04-06T16:06:39Z" level=error msg="the small-interactive-ads-1 k8s probe has Failed, err: the server could not find the requested resource"
time="2023-04-06T16:06:39Z" level=error msg="the small-interactive-ads-1 k8s probe has Failed, err: {\"errorCode\":\"K8S_PROBE_ERROR\",\"reason\":\"unable to list the resources with matching selector, err: the server could not find the requested resource\",\"target\":\"{name: small-interactive-ads-1}\"}"

What you expected to happen:

I just want to get a k8sProbe working without failure. As the logs below show, the ChaosEngine correctly sees the small-interactive-ads-1 pod, but the k8sProbe within the same ChaosEngine is unable to find that pod.

time="2023-04-06T16:06:33Z" level=info msg="[Status]: The status of Pods are as follows" Status=Running Pod=small-interactive-ads-1
time="2023-04-06T16:06:37Z" level=info msg="[Info]: The chaos tunables are:" CPU Core=0 CPU Load Percentage=10 Sequence=parallel PodsAffectedPerc=100
time="2023-04-06T16:06:37Z" level=info msg="[Chaos]:Number of pods targeted: 1"
time="2023-04-06T16:06:37Z" level=info msg="Target pods list for chaos, [small-interactive-ads-1]"

Where can this issue be corrected? (optional)

How to reproduce it (as minimally and precisely as possible): Run the above ChaosEngine (the actual scenario will vary with the rest of the Litmus setup).

Anything else we need to know?:

oumkale commented 1 year ago

Hi @KojoRising, thanks for trying out the probe.

It seems the labelSelector is empty.

Please take a look at this sample: https://github.com/litmuschaos/chaos-charts/blob/e4f5d12eb076608047dad8c3b09a51303f91feb5/workflows/sock-shop-promProbe/workflow.yaml#L375
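
For reference, here is a minimal sketch of a k8sProbe that checks for one specific pod. It rests on an assumption that is not confirmed in this thread: that `resource` takes the plural resource type (here `pods`) rather than an object name, and that the object itself is picked out through `fieldSelector` or `labelSelector`. If that assumption holds, the `resource: "small-interactive-ads-1"` value in the template above could account for "the server could not find the requested resource". The pod name, namespace, and run properties are copied from the issue; the probe name and the commented-out label are hypothetical.

```yaml
probe:
  - name: "check-ads-pod-present"        # hypothetical probe name
    type: "k8sProbe"
    k8sProbe/inputs:
      group: ""                          # core API group
      version: "v1"
      resource: "pods"                   # assumption: resource type, not the pod name
      namespace: "scenario-ns"
      # Select the pod by name; values taken from the issue above.
      fieldSelector: "metadata.name=small-interactive-ads-1"
      # Alternatively, select by label (hypothetical label, adjust to the real pod labels):
      # labelSelector: "app=small-interactive-ads"
      labelSelector: ""
      operation: "present"
    mode: "OnChaos"
    runProperties:
      probeTimeout: 5
      interval: 5
      retry: 10
```

Whether to use `fieldSelector`, `labelSelector`, or both depends on how the target pods are named and labelled; selecting by label is likely more robust when pod names change between runs.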