Closed: theaayushanand closed this issue 4 years ago.
I have tested them together, and it looks like the HPA does not scale it up once it has 0 replicas...
We use the StackSetController at Zalando; support for scaling Stacks with HPA was implemented (#93) and released in v20.3.1.
Support for HPAs which are not managed by a Stack is not implemented yet.
It should work out of the box with a standard HPA. This is the algorithm the HPA uses to manage the replica count:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
When currentReplicas is 0, the HPA always calculates desiredReplicas = 0, so it never scales up. Setting replicas to 0 is the way to disable the HPA.
Since kube-downscaler sets replicas to 0 to downscale, it disables the HPA at the same time.
When kube-downscaler puts back the original replica count (uptime period), the HPA is automatically enabled again.
No fix or feature needs to be implemented; kube-downscaler and the HPA work fine together.
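To make the formula above concrete, here is a minimal Python sketch of the HPA arithmetic (the metric values are made up for illustration):

import math

def desired_replicas(current_replicas, current_metric, desired_metric):
    # HPA rule: desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
    return math.ceil(current_replicas * (current_metric / desired_metric))

print(desired_replicas(2, 150, 100))  # 3: scales up under load
print(desired_replicas(0, 150, 100))  # 0: with zero replicas the HPA can never scale up again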
@mercantiandrea kube-downscaler works fine when the replica count is set to 0. However, in cases where the kube-downscaler replicas is set to a non-zero value, there is a race condition between kube-downscaler and the HPA: kube-downscaler scales down the Deployment and the HPA scales it back up, so there is a constant spawning and killing of pods.
Following are the logs from kube-downscaler when the kube-downscaler replicas was set to 1 and the HPA min was set to 2.
2020-04-07 05:50:47,652 INFO: Scaling down Deployment default/test-downscaler-main-srvr-dply from 2 to 1 replicas (uptime: always, downtime: Mon-Sun 11:20-11:23 Asia/Kolkata)
2020-04-07 05:51:50,144 INFO: Scaling down Deployment default/test-downscaler-main-srvr-dply from 2 to 1 replicas (uptime: always, downtime: Mon-Sun 11:20-11:23 Asia/Kolkata)
2020-04-07 05:52:52,630 INFO: Scaling down Deployment default/test-downscaler-main-srvr-dply from 2 to 1 replicas (uptime: always, downtime: Mon-Sun 11:20-11:23 Asia/Kolkata)
Following are the Deployment events, which show that there was continuous upscaling and downscaling:
Normal ScalingReplicaSet 14m (x5 over 28m) deployment-controller Scaled down replica set test-downscaler-main-srvr-dply-cd7d84bf4 to 1
Normal ScalingReplicaSet 13m (x7 over 40m) deployment-controller Scaled up replica set test-downscaler-main-srvr-dply-cd7d84bf4 to 2
One possible solution I see to this is setting the HPA min count to the kube-downscaler replica count during downtime: add an annotation "downscaler/original-replicas" which captures the original min replica count. When the downtime is over, just like the Deployment replicas are restored to their original values, the HPA min count can be restored to the original min replica count.
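A rough sketch of that idea, assuming the official kubernetes Python client (kube-downscaler itself uses a different client library); the function names are hypothetical, and only the downscaler/original-replicas annotation comes from the proposal above:

from kubernetes import client, config

ORIGINAL_MIN_ANNOTATION = "downscaler/original-replicas"  # proposed annotation

def set_downtime_min(name, namespace, downtime_min):
    # Remember the original minReplicas in an annotation, then lower the floor.
    config.load_kube_config()
    api = client.AutoscalingV1Api()
    hpa = api.read_namespaced_horizontal_pod_autoscaler(name, namespace)
    annotations = hpa.metadata.annotations or {}
    annotations[ORIGINAL_MIN_ANNOTATION] = str(hpa.spec.min_replicas)
    hpa.metadata.annotations = annotations
    hpa.spec.min_replicas = downtime_min
    api.patch_namespaced_horizontal_pod_autoscaler(name, namespace, hpa)

def restore_original_min(name, namespace):
    # At the start of the uptime period, restore minReplicas from the annotation.
    config.load_kube_config()
    api = client.AutoscalingV1Api()
    hpa = api.read_namespaced_horizontal_pod_autoscaler(name, namespace)
    original = (hpa.metadata.annotations or {}).get(ORIGINAL_MIN_ANNOTATION)
    if original is not None:
        hpa.spec.min_replicas = int(original)
        api.patch_namespaced_horizontal_pod_autoscaler(name, namespace, hpa)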
Yep, for sure. Indeed, my analysis starts from the point that the downscaler scales to 0, the only value that disables the HPA because of its algorithm for calculating the desired state:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
If you put 0 in currentReplicas, you will always get 0 in desiredReplicas.
For what you are looking for, kube-downscaler is not necessary; you have to exclude the deployment from downscaling and set the right metric in your HPA so that it scales to 1 when it is not used.
I'm fine with adding a feature to support HPAs with non-zero downscale replicas. Any proposals on how kube-downscaler can work better with the HPA in these cases?
I think that kube-downscaler could also read the HPA and modify the min/max replica values. In this case there is no need to modify the normal behaviour of kube-downscaler.
And write the original HPA min/max values to the Deployment in order to have only one point of trust.
Based on my understanding, I've created a PR (https://github.com/hjacobs/kube-downscaler/pull/98).
@hjacobs For me, with non-zero minimum replicas and version v20.5.0, I am experiencing the behavior described here.
In my case there always seems to be a clash between the downscaler and the HPAs, which causes more utilization during the downtime period. Sharing example logs:
May 20, 2020 @ 08:30:24.688 2020-05-20 05:30:24,687 INFO: Scaling down Deployment default/random1 from 2 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
May 20, 2020 @ 08:30:24.669 2020-05-20 05:30:24,669 INFO: Scaling down Deployment default/random2 from 2 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
May 20, 2020 @ 08:30:24.652 2020-05-20 05:30:24,652 INFO: Scaling down Deployment default/random3 from 3 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
May 20, 2020 @ 08:30:24.635 2020-05-20 05:30:24,635 INFO: Scaling down Deployment default/random4 from 3 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
May 20, 2020 @ 08:30:24.492 2020-05-20 05:30:24,492 INFO: Downscaler v20.5.0 started with debug=False, default_downtime=never, default_uptime=Mon-Fri 05:31-20:00 UTC, deployment_time_annotation=None, downscale_period=never, downtime_replicas=0, dry_run=False, exclude_deployments=kube-downscaler,downscaler, exclude_namespaces=kube-system, grace_period=900, include_resources=deployments, interval=60, namespace=None, once=False, upscale_period=never
May 20, 2020 @ 08:29:51.824 2020-05-20 05:29:51,824 INFO: Scaling down Deployment pph-system/random5 from 2 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
May 20, 2020 @ 08:29:51.767 2020-05-20 05:29:51,766 INFO: Scaling down Deployment default/random6 from 2 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
May 20, 2020 @ 08:29:51.751 2020-05-20 05:29:51,750 INFO: Scaling down Deployment default/random7 from 2 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
May 20, 2020 @ 08:29:51.734 2020-05-20 05:29:51,734 INFO: Scaling down Deployment default/random1 from 2 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
May 20, 2020 @ 08:29:51.723 2020-05-20 05:29:51,723 INFO: Scaling down Deployment default/random2 from 2 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
May 20, 2020 @ 08:29:51.709 2020-05-20 05:29:51,709 INFO: Scaling down Deployment default/random3 from 3 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
An example HPA configuration is the following:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    downscaler/downtime-replicas: "1"
  name: random3
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: random3
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 100
    - type: Pods
      pods:
        metric:
          name: phpfpm_active_processes
        target:
          type: AverageValue
          averageValue: 12
This results in over-utilization during the downscale period.
My understanding is that setting max replicas to the same value as min might work for my use case. For a more generic case, though, a more proper implementation could be to support min_replicas and max_replicas annotations for HPAs.
@hjacobs Are there any objections to proposing a PR that sets the same value for min and max replicas, as this is probably the most common case for anyone who uses kube-downscaler?
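For the min == max variant, a minimal sketch of the patch, again assuming the kubernetes Python client (pin_hpa_for_downtime is a hypothetical helper; downtime_replicas would come from the downscaler/downtime-replicas annotation):

from kubernetes import client, config

def pin_hpa_for_downtime(name, namespace, downtime_replicas):
    # Pin both bounds to the downtime value so the HPA cannot scale above it.
    config.load_kube_config()
    api = client.AutoscalingV1Api()
    patch = {"spec": {"minReplicas": downtime_replicas, "maxReplicas": downtime_replicas}}
    api.patch_namespaced_horizontal_pod_autoscaler(name, namespace, patch)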
@stafot - Could you confirm that you have not added any kube-downscaler annotation on the deployment itself? Because that would cause a race condition as well. If no annotations were added to the Deployment, could you share the corresponding events from Deployment when it's scaling up?
@vishaltak Hmm, oh, I got this wrong: I have annotations on both the HPAs and the Deployments. Should I keep them only on the HPAs?
Yes. Add the annotation only on the HPA if a non-zero downscale is required. If downscale to zero is required, add the annotation on the Deployment.
From the documentation -
Note that in cases where a HorizontalPodAutoscaler (HPA) is used along with Deployments, consider the following:
- If downscale to 0 replicas is desired, the annotation should be applied on the Deployment. This is a special case, since minReplicas of 0 on the HPA is not allowed. Setting Deployment replicas to 0 essentially disables the HPA. In such a case, the HPA will emit events like "failed to get memory utilization: unable to get metrics for resource memory: no metrics returned from resource metrics API" as there is no Pod to retrieve metrics from.
- If downscale greater than 0 is desired, the annotation should be applied on the HPA. This allows for dynamic scaling of the Pods even during downtime based upon the external traffic, as well as maintaining a lower minReplicas during downtime if there is no/low traffic. If the Deployment is annotated instead of the HPA, it leads to a race condition where kube-downscaler scales down the Deployment and the HPA upscales it as its minReplicas is higher.
@vishaltak Changed my manifests according to your suggestions. I'll test today and come back.
@vishaltak
After removing the deployments' annotations and keeping them only on the HPAs, the behavior is not the expected one.
All services that have an HPA go to zero replicas.
So we probably need to either remove the HPAs during the downscale period and keep the deployments' annotations, or refactor kube-downscaler the way I proposed here.
@stafot - That's strange. I'm still trying to understand why your deployments are scaled to 0. The situation you're describing above normally occurs when the Deployment has an annotation set to 0 (and if the HPA has annotations as well, it will cause a race; if there are no annotations on the HPA, it will do nothing). Could you provide the Deployment and HPA YAMLs?
On a different note, +1 for the proposal for minReplicas and maxReplicas on the HPA.
@vishaltak
apiVersion: apps/v1
kind: Deployment
metadata:
  name: random3
  labels:
    app: random3
    purpose: random3
spec:
  replicas: 2
  revisionHistoryLimit: 0
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  # Select which pods this deployment applies to. e.g. which pods to scale down when there's a change.
  selector:
    matchLabels:
      app: random3
      tier: web
  template:
    metadata:
      # These are the labels that pods will be given when created via this deployment
      labels:
        app: random3
        service: random3
        framework: yii
        purpose: random3
        version: random.482
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: version
                    operator: In
                    values:
                      - random.482
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: app
          image: random/randomfpm:7.3
          imagePullPolicy: IfNotPresent
          readinessProbe:
            httpGet:
              path: /randomhealth
              port: 80
            initialDelaySeconds: 1
            periodSeconds: 10
          volumeMounts:
            - name: app-volume
              mountPath: /var/www/pph
            - name: app-socket
              mountPath: /sock
          resources:
            requests:
              cpu: "300m"
              memory: "200Mi"
            limits:
              cpu: "1"
              memory: "1Gi"
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]
        - name: filebeat
          image: docker.elastic.co/beats/filebeat:7.4.0
          args: [
            "-c", "/etc/filebeat.yml",
            "-e",
          ]
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: filebeat-config-volume
              mountPath: /etc/filebeat.yml
              readOnly: true
              subPath: filebeat.yml
          resources:
            requests:
              cpu: "5m"
              memory: "10Mi"
            limits:
              cpu: "60m"
              memory: "64Mi"
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]
        - name: nginx
          image: random/random_nginx
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80
          volumeMounts:
            - name: app-socket
              mountPath: /sock
          resources:
            requests:
              cpu: "100m"
              memory: "32Mi"
            limits:
              cpu: "500m"
              memory: "96Mi"
          readinessProbe:
            httpGet:
              path: /nginx_ready
              port: 80
            initialDelaySeconds: 1
            periodSeconds: 10
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]
        - name: fpm-exporter
          image: random/randomfpm_exporter
          args: ["--phpfpm.scrape-uri", "unix:///sock/php.sock;/fpm_status", "--phpfpm.fix-process-count", "true"]
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: "40m"
              memory: "16Mi"
            limits:
              cpu: "150m"
              memory: "32Mi"
          ports:
            - containerPort: 9253
          volumeMounts:
            - name: app-socket
              mountPath: /sock
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]
      volumes:
        - name: app-volume
          emptyDir: {}
        # For a unix socket so nginx and fpm can communicate
        - name: app-socket
          emptyDir: {}
A sample HPA manifest is in a previous comment and a sample Deployment is above.
An idea I had is that it might be related to the apiVersion you are using, because the Deployment resource type is implemented in both the apps and the extensions APIs. But I haven't checked your code carefully yet. If you have any intuition on what is going wrong, let me know.
So TL;DR: if I add the annotations in both places, the cluster misbehaves completely and upscales instead of downscaling. When I removed them as you proposed, the behavior is again not the expected one; instead, all replicas of all services that are declared through HPAs go to zero.
@stafot So when you use the annotation on the HPA with the original minReplicas as 2, during downtime it downscales to 1, which is the correct and expected behaviour (I inferred this from the HPA definition provided and the screenshot in the later comment). But at the same time, your Deployment is scaled to 0 while it should've been 1, and there are no additional annotations on the Deployment.
While I'm clueless at this point about what the issue could be, could you provide your Kubernetes version? I might take a look at it and try to replicate it over the weekend.
On a different note -
@stafot - Could you confirm that you have not added any kube-downscaler annotation on the deployment itself? Because that would cause a race condition as well. If no annotations were added to the Deployment, could you share the corresponding events from Deployment when it's scaling up?
Point of clarification - It will only be a race condition if the annotations on the Deployment and the HPA mismatch. If both have the same value, although redundant, it should not create a race condition.
Point of clarification - It will only be a race condition if the annotations on the Deployment and the HPA mismatch. If both have the same value, although redundant, it should not create a race condition.
@vishaltak They had the same value.
While I'm clueless at this point about what the issue could be, could you provide your Kubernetes version? I might take a look at it and try to replicate it over the weekend.
My k8s version is 1.16, and I am using a managed AWS EKS cluster.
I want to understand how this will work with the Horizontal Pod Autoscaler, as we already have it deployed in our environment. From what I understand, this changes the Deployment's replica count, and the HPA also works with min and max replica counts, so won't the autoscaler override the changes made by the downscaler at any time? Is there a way to make it work with the autoscaler, or are these supposed to be exclusive of each other?