litmuschaos / litmus-go


Pod http latency experiment does not have any effect - no latency observed in target pod. #719

Open michal-poliwka opened 5 days ago


What happened: I'm running the pod-http-latency experiment against port 443. Here is the ChaosEngine YAML:

kubectl get chaosengine pod-http-latencyb8l59 -o yaml -n cluster-operators
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  annotations:
    probeRef: '[{"name":"null","mode":"SOT"}]'
  creationTimestamp: "2024-10-31T12:45:51Z"
  finalizers:
  - chaosengine.litmuschaos.io/finalizer
  generateName: pod-http-latency
  generation: 20
  labels:
    sidecar.istio.io/inject: "false"
    workflow_name: http-getter-pod-http-latency-30a983
    workflow_run_id: f058a607-f572-47a3-9812-c62f5974651a
  name: pod-http-latencyb8l59
  namespace: cluster-operators
  resourceVersion: "3125390221"
  uid: 6b233531-c385-4aee-83a0-bb2acdb8bf1c
spec:
  appinfo:
    appkind: deployment
    applabel: app.kubernetes.io/name=http-getter
    appns: cluster-operators
  chaosServiceAccount: litmus-admin
  components:
    runner:
      nodeSelector:
        kubernetes.io/os: linux
      resources:
        limits:
          cpu: 25m
          memory: 256Mi
        requests:
          cpu: 10m
          memory: 64Mi
      runnerLabels:
        sidecar.istio.io/inject: "false"
  engineState: active
  experiments:
  - name: pod-http-latency
    spec:
      components:
        env:
        - name: TARGET_CONTAINER
          value: http-getter
        - name: TOXICITY
          value: "100"
        - name: LIB_IMAGE
          value: litmuschaos/go-runner:3.8.0
        - name: NETWORK_INTERFACE
          value: eth0
        - name: TC_IMAGE
          value: gaiadocker/iproute2
        - name: LATENCY
          value: "2000"
        - name: TOTAL_CHAOS_DURATION
          value: "1800"
        - name: RAMP_TIME
        - name: PODS_AFFECTED_PERC
          value: "100"
        - name: DEFAULT_HEALTH_CHECK
          value: "false"
        - name: TARGET_PODS
        - name: NODE_LABEL
        - name: CONTAINER_RUNTIME
          value: containerd
        - name: TARGET_SERVICE_PORT
          value: "443"
        - name: DESTINATION_HOSTS
        - name: SOCKET_PATH
          value: /run/containerd/containerd.sock
        - name: SEQUENCE
          value: parallel
        - name: PROXY_PORT
          value: "20000"
        nodeSelector:
          kubernetes.io/os: linux
        resources:
          limits:
            cpu: 25m
            memory: 256Mi
          requests:
            cpu: 10m
            memory: 64Mi
        statusCheckTimeouts: {}
status:
  engineStatus: initialized
  experiments:
  - experimentPod: pod-http-latency-8wl934-gm4dw
    lastUpdateTime: "2024-10-31T12:47:52Z"
    name: pod-http-latency
    runner: pod-http-latencyb8l59-runner
    status: Running
    verdict: Awaited

Looking at the helper pods, they targeted the correct containers and ran all commands successfully:

kubectl -n cluster-operators logs pod-http-latency-helper-nds9b
time="2024-10-31T12:46:16Z" level=info msg="Helper Name: http-chaos"
time="2024-10-31T12:46:16Z" level=info msg="[PreReq]: Getting the ENV variables"
time="2024-10-31T12:46:16Z" level=info msg="Container ID: b76c0a1002c282322d26ef749e8b4dc32d425c93a42ce6012923db76eaa3d9f0"
time="2024-10-31T12:46:18Z" level=info msg="[Info]: Container ID=b76c0a1002c282322d26ef749e8b4dc32d425c93a42ce6012923db76eaa3d9f0 has process PID=1125537"
time="2024-10-31T12:46:18Z" level=info msg="[Chaos]: Starting proxy server"
time="2024-10-31T12:46:21Z" level=info msg="[Info]: Proxy started successfully"
time="2024-10-31T12:46:21Z" level=info msg="[Chaos]: Adding IPtables ruleset"
time="2024-10-31T12:46:22Z" level=info msg="[Info]: IP rule set added successfully"
time="2024-10-31T12:46:22Z" level=info msg="successfully injected chaos on target: {name: http-getter-7c95c7cd56-8cn45, namespace: cluster-operators, container: http-getter}"
time="2024-10-31T12:46:35Z" level=info msg="[Chaos]: Waiting for 1800s"

kubectl -n cluster-operators logs pod-http-latency-helper-q288g
time="2024-10-31T12:46:15Z" level=info msg="Helper Name: http-chaos"
time="2024-10-31T12:46:16Z" level=info msg="[PreReq]: Getting the ENV variables"
time="2024-10-31T12:46:16Z" level=info msg="Container ID: 334b2e45c8d9bc2fe81126f5045f749bda09f92f185af78fa7b339f2809297ea"
time="2024-10-31T12:46:18Z" level=info msg="[Info]: Container ID=334b2e45c8d9bc2fe81126f5045f749bda09f92f185af78fa7b339f2809297ea has process PID=3329061"
time="2024-10-31T12:46:18Z" level=info msg="[Chaos]: Starting proxy server"
time="2024-10-31T12:46:21Z" level=info msg="[Info]: Proxy started successfully"
time="2024-10-31T12:46:21Z" level=info msg="[Chaos]: Adding IPtables ruleset"
time="2024-10-31T12:46:22Z" level=info msg="[Info]: IP rule set added successfully"
time="2024-10-31T12:46:22Z" level=info msg="successfully injected chaos on target: {name: http-getter-7c95c7cd56-wbnrc, namespace: cluster-operators, container: http-getter}"
time="2024-10-31T12:46:35Z" level=info msg="[Chaos]: Waiting for 1800s"

My pod just makes HTTP requests in a loop; here is its code:

package main

import (
    "fmt"
    "net/http"
    "time"
)

func main() {
    for {
        start := time.Now()
        response, err := http.Get("https://example.com/")
        duration := time.Since(start)

        if err != nil {
            fmt.Printf("%v %v\n\n", time.Now().Format(time.RFC1123), err.Error())
        } else {
            fmt.Printf("%v example.com status code: %v. Duration %v\n\n", time.Now().Format(time.RFC1123), response.StatusCode, duration)
            response.Body.Close() // close the body so connections are not leaked
        }

        start = time.Now()
        response, err = http.Get("http://endless.horse/")
        duration = time.Since(start)

        if err != nil {
            fmt.Printf("%v %v\n\n", time.Now().Format(time.RFC1123), err.Error())
        } else {
            fmt.Printf("%v endless.horse status code: %v. Duration %v\n\n", time.Now().Format(time.RFC1123), response.StatusCode, duration)
            response.Body.Close() // close the body so connections are not leaked
        }
        time.Sleep(10 * time.Second)
    }
}

I would expect to observe longer response times (above 2 seconds), but nothing like that happens:

kubectl -n cluster-operators logs http-getter-7c95c7cd56-8cn45 --since=45s
Thu, 31 Oct 2024 12:52:47 UTC example.com status code: 200. Duration 13.527294ms

Thu, 31 Oct 2024 12:52:47 UTC endless.horse status code: 200. Duration 335.341592ms

Thu, 31 Oct 2024 12:52:57 UTC example.com status code: 200. Duration 14.206385ms

Thu, 31 Oct 2024 12:52:57 UTC endless.horse status code: 200. Duration 101.305447ms

Thu, 31 Oct 2024 12:53:07 UTC example.com status code: 200. Duration 13.992599ms

Thu, 31 Oct 2024 12:53:07 UTC endless.horse status code: 200. Duration 164.544507ms

Thu, 31 Oct 2024 12:53:17 UTC example.com status code: 200. Duration 14.968083ms

Thu, 31 Oct 2024 12:53:17 UTC endless.horse status code: 200. Duration 181.037902ms

Invoking curl from inside the pod to https://example.com:443 also returns a response in under 50 ms.

What you expected to happen: Invoking curl from inside the pod to https://example.com:443 returns the response only after more than 2000 ms.

Where can this issue be corrected? (optional)

How to reproduce it (as minimally and precisely as possible): The reproduction scenario is described in the first section.

Anything else we need to know?: