Open vkatabat opened 1 year ago
/sig k8s-infra
@vkatabat Hi, I tried to reproduce it, but it doesn't reproduce and seems to work well. I used this manifest; does it reproduce in your environment? If you see anything odd, please point it out:)
probe-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: ready1
spec:
  containers:
  - name: readiness
    image: busybox
    command: ["top"]
    readinessProbe:
      exec:
        command:
        - /mnt/probe.sh
      initialDelaySeconds: 3
      failureThreshold: 1
      successThreshold: 1
      periodSeconds: 30
    volumeMounts:
    - name: probe-sh
      mountPath: "/mnt"
      readOnly: false
  volumes:
  - name: probe-sh
    configMap:
      name: probe-sh
      defaultMode: 0777
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: probe-sh
data:
  probe.sh: |
    #!/bin/sh
    echo Probe at $(date) > /tmp/probe.txt
    exit 1
Apply this manifest.
$ kubectl apply -f probe-pod.yaml
pod/ready1 created
configmap/probe-sh created
After that, the probe has failed:
$ kubectl describe po ready1
Name: ready1
Namespace: default
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 77s default-scheduler Successfully assigned default/ready1 to kind-worker
Normal Pulling 76s kubelet Pulling image "busybox"
Normal Pulled 75s kubelet Successfully pulled image "busybox" in 1.401711187s (1.401724144s including waiting)
Normal Created 75s kubelet Created container readiness
Normal Started 75s kubelet Started container readiness
Warning Unhealthy 16s (x3 over 72s) kubelet Readiness probe failed:
Checking the file, only one record has been added:
$ kubectl exec -it ready1 -- /bin/sh
/ # tail /tmp/probe.txt
Probe at Mon Jun 26 08:40:01 UTC 2023
My environment is k8s v1.27.3.
$ kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.27.3
Kustomize Version: v5.0.1
Server Version: v1.27.3
Thanks.
Hi @mochizuki875 I think you used ">" instead of append ">>" in /mnt/probe.sh. You can see in the logs below that it got scheduled twice only 5 seconds apart: 12:48:21 & 12:48:26.
Log in my environment:
Below it got executed twice within 1 sec.
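The difference matters for the reproduction: `>` truncates the target file on every probe run, so only the last record survives, while `>>` appends. A quick illustration:

```shell
#!/bin/sh
# '>' truncates on every write; '>>' appends.
log=$(mktemp)

echo "first"  >  "$log"   # file holds only "first"
echo "second" >  "$log"   # truncated again: file holds only "second"
wc -l < "$log"            # prints 1

echo "third"  >> "$log"   # appended: file holds "second" and "third"
wc -l < "$log"            # prints 2

rm -f "$log"
```

That is why /tmp/probe.txt in the earlier attempt only ever showed a single record even though the probe had already run several times.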
@vkatabat Oops! Sorry, it's a mistake and thank you for your comment. I was able to reproduce it as you described and will do some research.
It seems to be intentional from reading the code. The ReadinessProbe will be executed by two triggers:

1. periodSeconds (which is a ReadinessProbe field): the probeTicker.C channel receives a value each time periodSeconds has passed, driven by a time.Ticker.
2. Pod sync: the w.manualTriggerCh channel receives a value when UpdatePod() is called for a Pod sync while the ReadinessProbe has not yet succeeded (as @SataQiu said).

ref: https://github.com/kubernetes/kubernetes/issues/96731 https://github.com/kubernetes/kubernetes/pull/98376
/remove-sig k8s-infra
/sig node
/kind bug
Thank you @mochizuki875 for your explanation.
@vkatabat Yeah:) I've created PR related to it (#119089).
Hi @mochizuki875, can the probe be exposed to an environment variable, or is there an API that can be called from the probe handler, to know which case (periodSeconds or manualTrigger) caused the probe to be invoked?
That way pods could implement logic they want to execute only when invoked via the periodSeconds timer.
@vkatabat Thank you for your comment.
From my investigation, no such environment variable or API seems to exist.
The probe is triggered by both periodSeconds and manualTrigger only while the Pod is not ready, and only by periodSeconds once the Pod becomes ready.
According to #96731, I understand that triggering the probe by manualTrigger is meant to make the Pod ready more efficiently.
Does this pose any problems for you?
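One way to see this from inside the container is a probe script that fails its first few invocations and records a timestamp on each run: while the pod is not ready, the log shows gaps shorter than periodSeconds (the manual trigger); once it becomes ready, the gaps settle at periodSeconds. A minimal sketch, where the file paths and the failure count of 3 are just illustrative choices:

```shell
#!/bin/sh
# Illustrative probe: fail the first 3 invocations, then succeed,
# logging a timestamp on each run so trigger intervals can be inspected.
# The STATE/LOG paths are assumptions for the sketch.

STATE=${STATE:-/tmp/probe.count}
LOG=${LOG:-/tmp/probe.log}

probe() {
    count=0
    [ -s "$STATE" ] && count=$(cat "$STATE")
    count=$((count + 1))
    echo "$count" > "$STATE"
    echo "Probe #$count at $(date)" >> "$LOG"
    # Not ready for the first 3 runs, ready afterwards.
    [ "$count" -gt 3 ]
}
```

In the actual pod the script would end with a bare `probe` call so its exit status becomes the probe result; `tail /tmp/probe.log` then shows the first few entries arriving in quick succession before the interval widens to periodSeconds.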
Hi @mochizuki875, manualTrigger firing the probe early may cause thrashing when applications want to deterministically control when traffic is allowed to pods. We see some applications use readiness probe failure to take the application out of service. If the stats they check show a momentary recovery, the manualTrigger can cause the application to be declared up, putting more load on it.
It could be too late to wait for the periodSeconds probe to fail again and shut the application down.
If some API existed so the probe could know the context in which it was called, I think we could circumvent this thrashing problem. Basically, as per the k8s documentation, I think users are of the opinion that readiness probes are triggered periodically based on periodSeconds.
@vkatabat
Could you resolve this problem using startupProbe and livenessProbe?
readinessProbe checks whether the container is ready to respond to requests and controls whether the Pod IP address is added to or removed from the endpoints of Services.
So in my opinion, it's not for controlling the shutdown of the application or its interval.
cf) https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#types-of-probe
FYI:
Basically as per k8s documentation i think users are of the opinion that readiness probes are triggered periodically based on periodSeconds.
I was confused too before investigating this issue:-(
https://github.com/kubernetes/kubernetes/pull/118409 ^ This looks related.
Hi @mochizuki875 I think livenessProbe was evaluated, but as it restarts the container it may not suffice. As readinessProbe removes the pod endpoint from the Service, it feels right to deterministically control opening the pod up for traffic while resource usage is within limits, or shutting traffic off when resource usage is above the threshold.
@vkatabat
Our understandings of what readinessProbe is seem to be the same.
Sorry, I may not fully understand what you are saying, so could you please give me a more concrete use case of what you want to do with readinessProbe?
Hi @mochizuki875 Our use case is to stop the pod from exhausting resources, so we have put a health check in the script that checks resource usage: if CPU or memory usage rises to the configured threshold, the script returns status 1, otherwise 0.
We use the readiness probe for this check, and whenever the readiness probe fails, the pod is removed from the Service IPs. This prevents the Service from putting load on the pod. Once the readiness probe succeeds, i.e. CPU and memory usage are below the threshold, the pod IP is added back to the Service.
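That kind of check could be sketched roughly like this; the cgroup v2 path and the 1 GiB threshold are assumptions for the example, not details from your setup:

```shell
#!/bin/sh
# Illustrative readiness check: report not-ready (exit 1) when memory
# usage crosses a threshold. Path and threshold are example values.

# Pure comparison: is usage ($1, bytes) above threshold ($2, bytes)?
mem_over_threshold() {
    [ "$1" -gt "$2" ]
}

check_readiness() {
    usage_file=${1:-/sys/fs/cgroup/memory.current}  # cgroup v2 layout
    threshold=${2:-1073741824}                      # 1 GiB, illustrative
    if [ -r "$usage_file" ] && mem_over_threshold "$(cat "$usage_file")" "$threshold"; then
        echo "memory usage above threshold, reporting not ready"
        return 1
    fi
    return 0
}
```

The actual probe script would simply end with `check_readiness` so its status becomes the probe result (on cgroup v1 the usage file is /sys/fs/cgroup/memory/memory.usage_in_bytes instead). With a purely periodic trigger this behaves as you expect; as discussed above, a Pod sync can also run it earlier while the pod is not ready.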
/triage accepted
/priority backlog
@vkatabat Thank you for your explanation. However, I don't have a good idea for your use case, so I've asked for advice in the Slack channel: https://kubernetes.slack.com/archives/C0BP8PW9G/p1689906432111149
I'm here because I noticed that with an initialDelaySeconds of 0, the readinessProbe is checked twice on startup in rapid succession, without waiting the period seconds. I think this is caused by the same behaviour (the manual trigger?) that this issue is about. I actually like that this is how it works (it works out well in my case to get the Pod ready quickly), but I found it wasn't documented anywhere I could find.
I went digging because I have mentioned it in this writing, https://adoptingerlang.org/docs/production/kubernetes/#readiness-liveness-and-startup-probes
Despite failing and periodSeconds being 10 it likely won’t take 10 seconds for the Pod to be ready. With an initialDelaySeconds of 0 the readinessProbe is checked immediately during startup and does not wait the configured periodSeconds to check a second time. Because the service does little on startup it will pass the second check. If it doesn’t it will then wait the periodSeconds until it is checked again and the Pod becomes ready.
But without it being officially documented I was worried it could change and I should add a note that this is undefined behaviour.
It sounds like from this issue that it is not necessarily considered a bug, so may not change, but also not defined behaviour?
Also just curious if someone could explain what the flow is, I couldn't understand how the code results in this happening aside from the fact there is this separate trigger from the period.
I agree that having that hidden manual trigger makes this unpredictable and difficult to reason about. Even if we add a log for that as in https://github.com/kubernetes/kubernetes/pull/119089/files, one will need to increase the log level to see it...
And as was noted, this makes the documentation obsolete; periodSeconds, and especially failureThreshold, no longer make sense.
Maybe the fix that was merged as part of https://github.com/kubernetes/kubernetes/pull/98376 in regarding startup probes is sufficient for the use case that motivated the addition of this manual trigger for readiness probes.
Or maybe, as the manual trigger is meant to bring the Pod to a ready state quickly, manual-trigger probe failures should not count against failureThreshold, so the params remain meaningful. (This is not the best solution.)
The manual trigger is meant to make the Pod ready faster. We may need to adjust the periodic timer after the manual trigger. The logic there can definitely use some polishing.
What happened?
The pod YAML file below was deployed. From the probe logs we see the probe is being scheduled more than once within the 30-second periodSeconds duration configured for the readinessProbe. At times it gets scheduled twice within 5 seconds even though periodSeconds is set to 30.
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: ready1
spec:
  containers:
/ # cat /tmp/probe.sh
#!/bin/sh
echo "Probe at $(date)" >> /tmp/probe.log
exit 1
/ # tail -f /tmp/probe.log
Probe at Fri Jun 16 07:16:16 UTC 2023
Probe at Fri Jun 16 07:16:35 UTC 2023
Probe at Fri Jun 16 07:17:05 UTC 2023
Probe at Fri Jun 16 07:17:19 UTC 2023
Probe at Fri Jun 16 07:17:35 UTC 2023

What did you expect to happen?

Expected the readinessProbe to be scheduled accurately per periodSeconds.
How can we reproduce it (as minimally and precisely as possible)?
The YAML file above can be created in k8s 1.27, with probe content that logs the output of the date command.

Anything else we need to know?
When the probe was successful, scheduling was accurate.
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)