GautamSinghania opened this issue 1 year ago (status: Open)
This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/sig autoscaling
An alternative we were thinking of is removing tolerance altogether. I'm curious: is tolerance at all beneficial for you?
Removing tolerance might be useful to us (honestly, we will have to test that). But I imagine having tolerance would be good in general.
I also need this feature. Some services are very sensitive to delay, and using this feature can reduce the number of scaling operations. For example, we need the ability to scale up when CPU is above 60%, scale down when it is below 40%, and do nothing between 40% and 60%.
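For what it's worth, a band like that maps directly onto the per-direction tolerances this issue asks for: with a 50% utilization target, a scale-up tolerance of 0.2 gives 50% × 1.2 = 60% before scaling up, and a scale-down tolerance of 0.2 gives 50% × 0.8 = 40% before scaling down, i.e. no action between 40% and 60%.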
+1
For anyone following this thread from before, I have updated the requirement to have separate custom tolerance levels for scale up and scale down. I believe this will be a minimal and impactful change.
@GautamSinghania could you explain the rationale behind having a different tolerance for scaling up and scaling down? Is your use case for differing tolerance levels solvable by tuning behavior controls?
@pbetkier I feel that different tolerances are a general ask and should be doable. In my instance, it derives from the fact that pods take a long time to come up. Hence, I want to have a higher tolerance for scale down and lower tolerance for scale up. This helps me control the average values and scale up/down tolerances separately.
Tuning behavior controls would be a roundabout way to do this, but it can be done.
I feel that if the ask is too large or convoluted on the back end, we could reduce it to a simple custom tolerance control. However, if different tolerances are doable, it would be a great offering.
In our use case we use external scalers, and the granularity of control is not there without being able to set this tolerance value on a case-by-case basis.
A simple example, a cron-based scaler:
100 replicas between 1-2PM
108 replicas between 2-3PM
The external metric returns either 100 or 108 depending on the time of day, but the scale-up never happens because of the default 10% tolerance setting.
Here's an example:
currentReplicas = 100
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
...
metrics:
- type: External
  external:
    metric:
      name: my-cron
      selector:
        matchLabels:
          cron-scale: "true"
    target:
      averageValue: "1"
      type: AverageValue
During the first cron schedule, pods scale up to 100; the ratio is 1/1 (100/100), so desiredReplicas = 100:
status:
  currentMetrics:
  - external:
      current:
        averageValue: "1"
        value: "0"
      metric:
        name: my-cron
        selector:
          matchLabels:
            cron-scale: "true"
Then, during the second cron schedule, desiredReplicas is still 100 because 1080m/1 = 1.08, which is within the 10% tolerance, so a scale-up is never triggered:
status:
  currentMetrics:
  - external:
      current:
        averageValue: "1080m"
        value: "0"
      metric:
        name: my-cron
        selector:
          matchLabels:
            cron-scale: "true"
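For anyone wondering where the skip comes from, here is a minimal sketch of the documented tolerance check, assuming the published formula desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with scaling skipped when the ratio is within the tolerance of 1.0. This is illustrative only, not the controller's actual code:

package main

import (
	"fmt"
	"math"
)

// desiredReplicas mirrors the documented HPA formula: scaling is skipped when
// the usage ratio (current metric / target metric) is within `tolerance` of 1.0;
// otherwise desired = ceil(currentReplicas * usageRatio).
func desiredReplicas(current int32, usageRatio, tolerance float64) int32 {
	if math.Abs(1.0-usageRatio) <= tolerance {
		return current // within the dead zone: keep the current replica count
	}
	return int32(math.Ceil(float64(current) * usageRatio))
}

func main() {
	// Second cron window above: averageValue 1080m against a target of 1 => ratio 1.08.
	fmt.Println(desiredReplicas(100, 1.08, 0.1)) // 100: the 8% deviation is inside the default tolerance
	fmt.Println(desiredReplicas(100, 1.08, 0.0)) // 108: with zero (or a smaller per-HPA) tolerance it would scale up
}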
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Our use case ties the HPA of a stateful set to its PVs. We want to scale up when the aggregate consumption of the PVs is 70%, and we use a custom metric to track that. However, the tolerance value "adjusts" the scale-up to occur at 77%, so we need to factor in the tolerance and set our criterion to 63% (knowing it won't scale up until the 10% tolerance has been exceeded). As a result, we'd love to disable tolerance on this HPA config so that the value we configure is the one that is actually used.
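(For reference: with the default 0.1 tolerance, a 63% target means the scale-up actually triggers at roughly 63% × 1.1 ≈ 69.3%, close to the intended 70%.)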
It seems that adding these tolerances (preferably separate ones for scale-up and scale-down) would be preferable to removing tolerance entirely, for backward compatibility alone.
This would be useful for us as well. We have clusters which have workloads ranging in size from 10 pods to 100s of pods, so tolerance would need to be different per workload.
This will be super helpful to speed up our scale-ups without affecting our scale-down rates. It should be part of having different policies for scale-up and scale-down; not being able to set different tolerance values for each reduces the effectiveness of these scaling policies.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
We have the same problem and the tolerance seems to be affecting scale up even when containers are getting OOMKilled. Are there any plans to address this?
We are also having the same problem here!
We have a deployment which takes infinitely-long tasks (connecting to a camera and streaming from it until stopped) from a pool and processes them. Each pod, with the resources it has assigned, can process up to 4 tasks at a time. We have set up an HPA on this deployment, which is based on an external metric of our app (the total number of tasks to be processed at a given time).
The HPA considers that if there is, on average, more than 4 tasks per processing pod, more pods are needed. For example, if there are currently 36 tasks and 9 replicas (an average of 4 tasks/replica, each replica is fully-loaded), and a new task is added (which means 37 tasks for 9 replicas, an average of 4.11 tasks/replica), the HPA will create a new replica.
This mechanism breaks at 40 tasks, because of this globally-set tolerance value. When 40 tasks are running on 10 replicas, the average number of tasks per replica is 4 (which means the HPA will not create more replicas). When adding a new task, for a total of 41 tasks for 10 replicas, we now average 4.1 tasks per replica, which is within the tolerance. The new task will never be picked up by a processing pod.
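(In tolerance terms: the usage ratio at that point is 4.1 / 4 = 1.025, a 2.5% deviation, well inside the default 10% tolerance, so the HPA computes no change.)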
Because we are deploying our cluster in GKE, we cannot (I believe) configure the global tolerance value ourselves within the kube-controller-manager. This feature would fix our problem.
We have a similar need for a configurable tolerance per HPA. We have multiple applications with over 1000 pods and a CPU threshold of 70%; only when CPU utilization reaches 77% (0.1 tolerance) does the scale-up action occur, and more than 100 pods are added to the application at once. In an ideal world, we would want the scale-up to be more gradual and start earlier, which is possible with a reduced tolerance.
We don't want to reduce the tolerance for the entire cluster, to avoid flapping behaviour in smaller applications for which a reduced tolerance won't be a good fit.
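For reference, with the documented HPA formula desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), 1000 pods at 77% utilization against a 70% target gives ceil(1000 × 0.77 / 0.70) = 1100, i.e. 100 pods added in a single step.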
In case you are working on this issue, I am not even sure that tolerance is the best approach here. Instead of having a threshold and tolerance like this:
threshold: 60
tolerance: 0.1
# which will translate to
upscaleAt: 66%
downscaleAt: 54%
Maybe it could be better to directly select the thresholds for upscale and downscale:
threshold: 50
upscaleAt: 52%
downscaleAt: 40%
This way, HPA behaviour would be easier to understand and configure.
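For comparison, in tolerance terms that example corresponds to a scale-up tolerance of 0.04 (52/50 = 1.04) and a scale-down tolerance of 0.2 (40/50 = 0.8), so the two formulations express the same dead zone.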
What would you like to be added?
We have a configuration for HPA called horizontal-pod-autoscaler-tolerance in Kube Controller Manager, defaulted to 0.1. An individual HPA should be allowed to set a custom value for this, overriding the default value. Sample yaml:
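A minimal illustrative sketch of what such an override could look like; the tolerance fields under behavior below are hypothetical placeholders, not an existing API field:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 50
  behavior:
    scaleUp:
      tolerance: 0.05   # hypothetical: tighter than the global 0.1 default
    scaleDown:
      tolerance: 0.15   # hypothetical: wider dead zone before scaling down
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70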
Why is this needed?
K8s clusters in organizations are often used by multiple applications with different needs. Allowing individual HPAs to set their own tolerance limits (and possibly other HPA-related config) can help in handling different use cases smoothly.