Kanshiroron opened this issue 5 years ago
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
/remove-lifecycle stale
/priority backlog
Yeah, would just like to add that this is really confusing :| I'm specifically interested in this: what happens if the pod is ready before the end of the delay? Considering both delays here: `--horizontal-pod-autoscaler-cpu-initialization-period` and `--horizontal-pod-autoscaler-initial-readiness-delay`.
Might be because I'm not a native English speaker, but the paragraph seems contradictory as well?
For example:
Due to technical constraints, the HorizontalPodAutoscaler controller cannot exactly determine the first time a pod becomes ready when determining whether to set aside certain CPU metrics. Instead, it considers a Pod "not yet ready" if it's unready and transitioned to unready within a short, configurable window of time since it started. This value is configured with the --horizontal-pod-autoscaler-initial-readiness-delay flag, and its default is 30 seconds
Ok, I can kind of get that, although it's not very clear what happens in this scenario:
Technically, that's what it says in the documentation! It says it's "not yet ready" only if it's unready... so I should assume that if the pod is ready, even briefly, it will be considered ready, which would cause all kinds of absurd scenarios, like the one above.
This does not make a lot of sense to me, so I assume the HPA waits until `--horizontal-pod-autoscaler-initial-readiness-delay` has elapsed before it considers a pod ready, even if Kubernetes considers it ready before that. But that should have been explicit in the documentation.
Ok, moving on.
Once a pod has become ready, it considers any transition to ready to be the first if it occurred within a longer, configurable time since it started. This value is configured with the --horizontal-pod-autoscaler-cpu-initialization-period flag, and its default is 5 minutes.
So, this says that any transition to ready will be the first if it occurs before `--horizontal-pod-autoscaler-cpu-initialization-period` ends. First question: what does it mean to be the first? I couldn't find what the importance of being the first transition to ready is. Second, what happens if the pod never transitions to ready before `--horizontal-pod-autoscaler-cpu-initialization-period` expires? Say it takes 5 minutes and 1 second to become ready? To me this clearly states that to the HPA the pod never becomes ready. :thinking:
I've tried searching on Google and on the Kubernetes Slack group and found no definitive answer to how these parameters work, although it seems many people believe `--horizontal-pod-autoscaler-cpu-initialization-period` sets a wait time for new pods and prevents them from being scaled until this time passes (although I myself am not convinced). I'll see if I can run some tests in my clusters to at least get some ideas.
Ok, summing up my comments in the form of questions:

- `--horizontal-pod-autoscaler-initial-readiness-delay`?
- `--horizontal-pod-autoscaler-cpu-initialization-period`?
- Both? Does it also matter when the readinessProbe returns successfully?

/sig autoscaling
I was trying to find the same information. The relevant code for this is here if it helps anyone: https://github.com/kubernetes/kubernetes/blob/30c9f097ca4a26dab9085832e006f09cb2993dda/pkg/controller/podautoscaler/replica_calculator.go#L392
We are also trying to figure out this issue. Would be happy to know if there are answers to the above questions.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with `/reopen`.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
/reopen
/lifecycle frozen
/language en
The doc needs some improvement with help from SIG Autoscaling. We have got quite a few votes for improving the HPA docs.
@tengqm: Reopened this issue.
@Dafnafrank Do you have any update on this issue? We're facing a similar issue with the HPA and would like to see if there is any solution for it.
Waiting for the same. Kindly make the documentation clear.
/retitle Unclear definition of the --horizontal-pod-autoscaler-initial-readiness-delay flag
It looks like the text from https://horizontal-pod-autoscaler.readthedocs.io/en/latest/user-guide/initial-readiness-delay/ could be useful to draw from.
/triage accepted
this is still an issue. please rewrite the whole paragraph and explain in more detail. thanks
This is how I understand it: it's the period after pod start during which readiness changes will be treated as initial readiness. This is in case the Pod goes in and out of the unready state. The code adds this delay to the pod's `startTime` and doesn't start looking at the readiness state until after `startTime + initialReadinessDelay`.
Actually, if you look at the code, wouldn't the initial-readiness-delay flag be used only if we are outside of the `cpuInitializationPeriod`? This code and documentation are very confusing.
I’m struggling a bit to parse: https://github.com/kubernetes/kubernetes/blob/30c9f097ca4a26dab9085832e006f09cb2993dda/pkg/controller/podautoscaler/replica_calculator.go#L392 mainly because I don’t think I fully get what After() is doing
This is what I understand (does it sound reasonably correct @kubernetes/sig-autoscaling-bugs @kubernetes/sig-autoscaling-misc @kubernetes/sig-scalability ?):
First check if the Pod has been acknowledged by the kubelet or has a Ready PodCondition. If yes, it’s added to the ready Pod count. If no:
Check if the value of startTime + cpuInitializationPeriod is in the future. If yes, that means the Pod is still initializing. In that case, wait until the initialization period is done and then ignore the Pod if it is still not Ready OR if the CPU metric wasn’t collected since the last time the Status changed.
If the value of `startTime + cpuInitializationPeriod` has already passed, ignore the Pod if it isn't currently in the Ready status and it has not been Ready since the readiness delay period ended.
For the questions in this comment:

- Counts as Ready as long as the Pod is still ready after the end of the initial readiness delay.
- Counts as ready. The CPU init period provides a window of time after the Pod Start Time in which the Pod has a chance to become ready.
- I don't think there's an importance to the word "first". I think when the HPA loops and checks for the Ready state, if the Pod was Ready during the CPU init period, that's all that matters.
- I'm not sure.
/assign
Hey @shannonxtreme! Can you share an update on this issue? Are you still willing to work on it?
@shannonxtreme I don't see any updates, so I'm unassigning you. Please feel free to self-assign if you come back here again and are still willing to work on it 🙂
/unassign @shannonxtreme
For 4 years there haven't been any updates on that paragraph. 😢 I wish the documentation provided some concrete examples.
/retitle Unclear definition of the --horizontal-pod-autoscaler-initial-readiness-delay flag
Contributions are welcome.
It's been 4 years; I've googled a lot, and no answer demystified this flag until I saw this question, which led me here.
In order to understand this flag, it could help to read and understand the original PR: https://github.com/kubernetes/kubernetes/pull/68068
I have just read through the source code, and I'm going to post what I think it does here. (This is essentially just a rephrasing of posts above, but maybe if enough people describe it in their own words, one of those will make sense to whoever is reading this in the future.)
First, neither of these settings has any effect on non-CPU metrics. For non-CPU metrics, the behavior seems to be that all Running pods are included in the calculation, regardless of Readiness.
For CPU metrics, during the `cpuInitializationPeriod` after pod start, a pod is included in metrics calculations if:

- That pod is currently Ready
- AND its most recent metrics sample only covers the time during which it was Ready. (You don't use an old sample from back when it was still Unready.)

After the `cpuInitializationPeriod`, a pod is included in metrics calculations if:

- That pod is currently Ready
- OR it was ever Ready in the past at some time after the `initialReadinessDelay`.

So:

- No matter what these settings are, if you have a pod that is reporting Ready, and it has a metric sample from the time that it is Ready, that pod will be included in the scaling calculation, no matter how early into its startup it becomes Ready. These settings cannot be used to require it to have been Ready for a certain amount of time before being used. Instead, you have to configure the pod to not be Ready until the startup high-CPU phase is over (for example via `initialDelaySeconds` in the Readiness Probe).
- Only `cpuInitializationPeriod` prevents old, Unready metrics samples from being used. So no matter how Readiness is configured, you still want `cpuInitializationPeriod` to be long enough to cover that startup phase.
- `initialReadinessDelay` has no effect whatsoever if your pod never switches from Ready back to Unready. So it seems like it's only meaningful for a pod that is liable to flip-flop inconsistently between Ready and Unready during startup. (Possibly because of the high CPU usage?)

Roughly speaking, the behavior during `cpuInitializationPeriod` is actually what I'd expected the behavior to always be: only Ready pods matter in scaling. But that would be bad: if high CPU usage causes your pod to become Unready, it would never scale up (because all pods with high usage would be Unready and thus not included in the calculation).
This is the source code in question:
```go
// Pod still within possible initialisation period.
if pod.Status.StartTime.Add(cpuInitializationPeriod).After(time.Now()) {
	// Ignore sample if pod is unready or one window of metric wasn't collected since last state transition.
	unready = condition.Status == v1.ConditionFalse || metric.Timestamp.Before(condition.LastTransitionTime.Time.Add(metric.Window))
} else {
	// Ignore metric if pod is unready and it has never been ready.
	unready = condition.Status == v1.ConditionFalse && pod.Status.StartTime.Add(delayOfInitialReadinessStatus).After(condition.LastTransitionTime.Time)
}
```
I appreciate the discussion here, and after reading through the source code myself, I'd like to offer my take on it. Much like you've done, I believe that rephrasing complex concepts in our own terms can be helpful for others trying to grasp this in the future. Here's how I understand `--horizontal-pod-autoscaler-cpu-initialization-period` and `--horizontal-pod-autoscaler-initial-readiness-delay`, based on reading the source code:

`--horizontal-pod-autoscaler-cpu-initialization-period` (`cpuInitializationPeriod`):
Throughout this interval, CPU is only collected under stringent conditions: the pod must be ready and the most recent measurement must be complete. To put it another way, during this period, the CPU variability of pods that are not in a ready state will not affect the HPA's scaling.

`--horizontal-pod-autoscaler-initial-readiness-delay`:
Following the end of the `cpuInitializationPeriod`, this delay permits CPU metrics collection under the more relaxed condition that "the pod was previously in a ready state." To put it another way, during this period, any CPU variability prior to the pod's initial readiness will not affect HPA scaling.
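For reference, both flags are set on the kube-controller-manager, not on the HPA object itself. A minimal sketch of the invocation, with the values shown being the documented defaults and all other required flags elided:

```shell
# Illustrative kube-controller-manager flags (values shown are the defaults).
kube-controller-manager \
  --horizontal-pod-autoscaler-cpu-initialization-period=5m0s \
  --horizontal-pod-autoscaler-initial-readiness-delay=30s
```

On managed clusters these controller-manager flags are typically not user-configurable, which is another reason the readiness-probe approach discussed above is often the more practical lever.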
Hello, in the Horizontal Pod Autoscaler documentation, the `--horizontal-pod-autoscaler-initial-readiness-delay` flag has an unclear definition, which makes comprehension very difficult:
https://github.com/kubernetes/website/blob/master/content/en/docs/tasks/run-application/horizontal-pod-autoscale.md
Thank you for clarifying.