NorbertGajda8 opened this issue 7 months ago
DPP's field inbound[].health.ready reflects the state of the pod's status.containerStatuses[].ready. If the readiness probe is failing, it doesn't necessarily mean status.containerStatuses[].ready is going to be permanently false:
failureThreshold: After a probe fails failureThreshold times in a row, Kubernetes considers that the overall check has failed: the container is not ready/healthy/live. For the case of a startup or liveness probe, if at least failureThreshold probes have failed, Kubernetes treats the container as unhealthy and triggers a restart for that specific container. The kubelet honors the setting of terminationGracePeriodSeconds for that container. For a failed readiness probe, the kubelet continues running the container that failed checks, and also continues to run more probes; because the check failed, the kubelet sets the Ready condition on the Pod to false.
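To make those thresholds concrete, here is how they map onto Pod spec fields. The numbers mirror the readiness probe on the service container in the pod description below; the rest is a minimal sketch:

```yaml
# Sketch of the readiness probe fields discussed above; values mirror
# the "service" container in the pod description further down.
readinessProbe:
  httpGet:
    path: /8080/ready      # rewritten by Kuma virtual probes (kuma.io/virtual-probes-port: 9000)
    port: 9000
  initialDelaySeconds: 15
  timeoutSeconds: 1
  periodSeconds: 10        # kubelet keeps probing at this interval even after failures
  successThreshold: 3      # 3 consecutive successes flip Ready back to true
  failureThreshold: 2      # 2 consecutive failures set Ready to false; readiness failures never restart the container
```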
From the line you've sent:
Normal UpdatedKumaDataplane 10m (x5 over 11m) k8s.kuma.io/dataplane-generator Updated Kuma Dataplane: xxxxxxx-v1-7d6d8f87c8-vq8cd
I'd assume that DPP's inbound[].health.ready was changed 5 times over the last 11m and probably was false for some period of time.
Could you please check the DPP resource to confirm if this is the case?
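For reference, the field can be inspected with kubectl get dataplane manufacturing-v1-7d6d8f87c8-vq8cd -n backend -o yaml (names taken from the pod description below). A hedged sketch of the shape to look for, assuming the usual generated Dataplane layout:

```yaml
# Abridged Dataplane resource; only the relevant part is shown.
# The port value is an assumption for illustration.
spec:
  networking:
    inbound:
    - port: 8080
      health:
        ready: false   # should mirror status.containerStatuses[].ready
```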
What we did: we deployed our pod with a readiness probe that always fails, as an experiment, so that status.containerStatuses[].ready would be permanently false.
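For anyone reproducing this, a minimal sketch of such an always-failing probe; the /ready endpoint returning 503 is an assumption, and any handler that never succeeds will do:

```yaml
# Sketch: a readiness probe that can never succeed, so the container's
# Ready flag should stay false once failureThreshold is reached.
readinessProbe:
  httpGet:
    path: /ready   # assumed endpoint wired to always return 503
    port: 8080
  periodSeconds: 10
  failureThreshold: 2
```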
Here is the redacted pod description, recorded at the same time as the DPP state above:
Name: manufacturing-v1-7d6d8f87c8-vq8cd
Namespace: backend
Priority: 0
Service Account: manufacturing-sa
Node: ****
Start Time: Wed, 14 Feb 2024 16:20:58 +0000
Labels: app.kubernetes.io/instance=
app.kubernetes.io/name=manufacturing
app.kubernetes.io/version=v1
azure.workload.identity/use=true
pod-template-hash=7d6d8f87c8
Annotations: fluentbit.io/exclude: true
kuma.io/builtin-dns: enabled
kuma.io/builtin-dns-port: 15053
kuma.io/envoy-admin-port: 9901
kuma.io/mesh: default
kuma.io/sidecar-injected: true
kuma.io/sidecar-uid: 5678
kuma.io/transparent-proxying: enabled
kuma.io/transparent-proxying-ebpf: disabled
kuma.io/transparent-proxying-inbound-port: 15006
kuma.io/transparent-proxying-inbound-v6-port: 15010
kuma.io/transparent-proxying-outbound-port: 15001
kuma.io/virtual-probes: enabled
kuma.io/virtual-probes-port: 9000
Status: Running
IP: 10.25.12.143
IPs:
IP: 10.25.12.143
Controlled By: ReplicaSet/manufacturing-v1-7d6d8f87c8
Init Containers:
kuma-init:
Container ID: containerd://7d83d3cb4ac9fd5030a8d6546db2ec4e131ecf188df6bbe0e1349db1e441e53e
Image: docker.io/kumahq/kuma-init:2.3.3
Image ID: docker.io/kumahq/kuma-init@sha256:8a866382b5bc55e8a44630721c9e78822064781638c8bc84604d62e7f267d9c7
Port: <none>
Host Port: <none>
Command:
/usr/bin/kumactl
install
transparent-proxy
Args:
--redirect-outbound-port
15001
--redirect-inbound=true
--redirect-inbound-port
15006
--redirect-inbound-port-v6
15010
--kuma-dp-uid
5678
--exclude-inbound-ports
--exclude-outbound-ports
--verbose
--skip-resolv-conf
--redirect-all-dns-traffic
--redirect-dns-port
15053
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 14 Feb 2024 16:20:59 +0000
Finished: Wed, 14 Feb 2024 16:21:00 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 50M
Requests:
cpu: 20m
memory: 20M
Environment:
****
Mounts:
****
Containers:
kuma-sidecar:
Container ID: containerd://7d944df47e0c5c8cf097f808c7bfe8573ccf994023a6e3ce7b46cf813d9c4f94
Image: docker.io/kumahq/kuma-dp:2.3.3
Image ID: docker.io/kumahq/kuma-dp@sha256:5740645f2bf9db8a0050e0069ee7734e4ae4dfa8401259d755065542555f798e
Port: <none>
Host Port: <none>
Args:
run
--log-level=info
--concurrency=2
State: Running
Started: Wed, 14 Feb 2024 16:21:00 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 512Mi
Requests:
cpu: 50m
memory: 64Mi
Liveness: http-get http://:9901/ready delay=60s timeout=3s period=5s #success=1 #failure=12
Readiness: http-get http://:9901/ready delay=1s timeout=3s period=5s #success=1 #failure=12
Environment:
****
INSTANCE_IP: (v1:status.podIP)
KUMA_CONTROL_PLANE_CA_CERT: ****
KUMA_CONTROL_PLANE_URL: https://kuma-control-plane.kuma:5678
KUMA_DATAPLANE_DRAIN_TIME: 30s
KUMA_DATAPLANE_MESH: default
KUMA_DATAPLANE_NAME: $(POD_NAME).$(POD_NAMESPACE)
KUMA_DATAPLANE_RUNTIME_TOKEN_PATH: /var/run/secrets/kubernetes.io/serviceaccount/token
KUMA_DNS_CORE_DNS_BINARY_PATH: coredns
KUMA_DNS_CORE_DNS_EMPTY_PORT: 15054
KUMA_DNS_CORE_DNS_PORT: 15053
KUMA_DNS_ENABLED: true
KUMA_DNS_ENVOY_DNS_PORT: 15055
****
Mounts:
****
opa:
Container ID: containerd://57070eafea7af684400cfffd73a65c541f9ea40fdb972b54b63170c45752b82e
Image: openpolicyagent/opa:0.44.0-envoy-2
Image ID: docker.io/openpolicyagent/opa@sha256:9e86f6f44bdf0ba51695f2092745d2f3ec361dbec9c5655d209fea4ad69f2df8
Port: <none>
Host Port: <none>
Args:
run
--server
--addr=http://localhost:8181
--diagnostic-addr=0.0.0.0:8282
--ignore=.*
--config-file=****
State: Running
Started: Wed, 14 Feb 2024 16:21:00 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 128Mi
Requests:
cpu: 30m
memory: 128Mi
Liveness: http-get http://:9000/8282/health%3Fplugins delay=5s timeout=1s period=15s #success=1 #failure=3
Readiness: http-get http://:9000/8282/health%3Fplugins delay=5s timeout=1s period=15s #success=1 #failure=3
Environment:
****
Mounts:
****
service:
Container ID: containerd://802c3b67548ccc5985594b6c925d812d1c093603a07332a6bfc978c7c031103c
Image: ****
Image ID: ****
Port: <none>
Host Port: <none>
State: Running
Started: Wed, 14 Feb 2024 16:21:01 +0000
Ready: False
Restart Count: 0
Limits:
cpu: 300m
memory: 400Mi
Requests:
cpu: 30m
memory: 128Mi
Liveness: tcp-socket :8080 delay=30s timeout=5s period=20s #success=1 #failure=3
Readiness: http-get http://:9000/8080/ready delay=15s timeout=1s period=10s #success=3 #failure=2
Startup: http-get http://:9000/8080/health/startup delay=0s timeout=5s period=15s #success=1 #failure=5
Environment:
****
Mounts:
****
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
****
kuma-sidecar-tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned backend/manufacturing-v1-7d6d8f87c8-vq8cd to aks-system-23022122-vmss0000fe
Normal Pulled 11m kubelet Container image "docker.io/kumahq/kuma-init:2.3.3" already present on machine
Normal Created 11m kubelet Created container kuma-init
Normal Started 11m kubelet Started container kuma-init
Normal CreatedKumaDataplane 11m k8s.kuma.io/dataplane-generator Created Kuma Dataplane: manufacturing-v1-7d6d8f87c8-vq8cd
Normal Started 11m kubelet Started container kuma-sidecar
Normal Created 11m kubelet Created container kuma-sidecar
Normal Pulled 11m kubelet Container image "docker.io/kumahq/kuma-dp:2.3.3" already present on machine
Normal Pulled 11m kubelet Container image "openpolicyagent/opa:0.44.0-envoy-2" already present on machine
Normal Created 11m kubelet Created container opa
Normal Started 11m kubelet Started container opa
Normal Pulling 11m kubelet Pulling image "****"
Normal Pulled 11m kubelet Successfully pulled image "****" in 342.113176ms (342.118376ms including waiting)
Normal Created 11m kubelet Created container service
Normal Started 11m kubelet Started container service
Warning Unhealthy 11m (x2 over 11m) kubelet Readiness probe failed: Get "http://10.25.12.143:9901/ready": dial tcp 10.25.12.143:9901: connect: connection refused
Normal UpdatedKumaDataplane 10m (x5 over 11m) k8s.kuma.io/dataplane-generator Updated Kuma Dataplane: manufacturing-v1-7d6d8f87c8-vq8cd
Warning Unhealthy 62s (x67 over 10m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
So the status.containerStatuses[].ready was going to be permanently false.
IIUC how probes work, it won't be permanently false even if the readiness probe always fails on purpose. kubelet gives it a chance once in a while to see if the issue resolved itself.
kubelet gives it a chance once in a while to see if the issue resolved itself
But does this mean that traffic is expected to be routed to a 2/3 failing pod while another pod is still alive and healthy? I would expect all traffic to go to the healthy, working pod rather than the one reporting not ready.
To reiterate:
Are these expected behaviours?
But does this mean that traffic is expected to be routed to a 2/3 failing pod
No, when you see a 2/3 failing pod it means status.containerStatuses[].ready: false, and at this moment Kuma should not route traffic to it. But if I understand correctly, 2/3 switches back to 3/3 from time to time, doesn't it?
BTW, does your Pod have containerPort specified? Without containerPort, Kuma doesn't know which container is the app container, and maybe doesn't take its status into account in this case.
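For illustration, a hedged sketch of what declaring it looks like on the app container; port 8080 is assumed from the liveness probe above:

```yaml
# Sketch: declaring containerPort on the app container so Kuma can tell
# which container is the application and track its readiness.
containers:
- name: service
  ports:
  - containerPort: 8080   # assumed app port, matching the tcp-socket liveness probe
```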
But if I understand correctly, 2/3 switches back to 3/3 from time to time, doesn't it?
No, it does not switch back to 3/3, or at least I don't perceive it from looking at the watch -n 0.1 "kubectl get pods -n backend" output. Is there a log that would show me if it ever switched to 3/3?
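For reference, the per-container flag can be checked directly instead of eyeballing the READY column, e.g. with kubectl get pod manufacturing-v1-7d6d8f87c8-vq8cd -n backend -o yaml. A sketch of the relevant status shape; the timestamp is illustrative:

```yaml
# Abridged pod status: Ready stays "False" while the probe keeps failing,
# and lastTransitionTime records when it last flipped, if it ever did.
status:
  conditions:
  - type: Ready
    status: "False"
    lastTransitionTime: "2024-02-14T16:21:30Z"   # illustrative
  containerStatuses:
  - name: service
    ready: false
```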
containerPort is not set; we will try that, thanks. I will report back on Monday, probably.
We tried adding the containerPort while keeping the always-failing readiness probe, and things started working as expected. We no longer got traffic on the unhealthy pod, so that's good. We will add this to our other services too.
Although, we did not find any of this in the documentation. Can you please give a link to this topic, or to where containerPort is described?
Thanks for the help.
Triage: we want to improve docs with this info
What happened?
Summary
kuma-sidecar is running properly, but our app container is failing, and Kuma routes traffic to this pod even though the container's health probe (readiness probe) is failing.
Tested on Kuma v2.3.3 and 2.5.2, same behavior.
Container configuration: health probes
Pod events:
Warning Unhealthy 11m (x2 over 11m) kubelet Readiness probe failed: Get "http://10.25.12.143:9901/ready": dial tcp 10.25.12.143:9901: connect: connection refused
Normal UpdatedKumaDataplane 10m (x5 over 11m) k8s.kuma.io/dataplane-generator Updated Kuma Dataplane: xxxxxxx-v1-7d6d8f87c8-vq8cd
Warning Unhealthy 62s (x67 over 10m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
Deployed pod's container configs:
Liveness: tcp-socket :8080 delay=30s timeout=5s period=20s #success=1 #failure=3
Readiness: http-get http://:9000/8080/ready delay=15s timeout=1s period=10s #success=3 #failure=2
Startup: http-get http://:9000/8080/health/startup delay=0s timeout=5s period=15s #success=1 #failure=5
Kuma applied config (health.ready is true, and no events triggered)
If you need any more configuration parameters, I will gladly provide them.