hjacobs / kube-downscaler

Scale down Kubernetes deployments after work hours
https://hub.docker.com/r/hjacobs/kube-downscaler
GNU General Public License v3.0
528 stars 91 forks

kube-downscaler with hpa #69

Closed theaayushanand closed 4 years ago

theaayushanand commented 5 years ago

I want to understand how this will work with the Horizontal Pod Autoscaler, as we already have it deployed in our environment. From what I understand, kube-downscaler changes the deployment's replica count, but the HPA also works with min and max replica counts, so won't the autoscaler override the changes made by the downscaler at any time? Is there a way to make it work with the autoscaler, or are these supposed to be exclusive of each other?

runningman84 commented 4 years ago

I have tested them together; it looks like the HPA does not scale a deployment up once it has 0 replicas...

hjacobs commented 4 years ago

We use the StackSetController in Zalando and support for scaling Stacks with HPA was implemented (#93) and released in v20.3.1.

Support for HPAs which are not managed by a Stack is not implemented yet.

mercantiandrea commented 4 years ago

Support for HPAs which are not managed by a Stack is not implemented yet.

It should work out-of-the-box with a standard HPA. This is the algorithm the HPA uses to manage the replica count:

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

When currentReplicas is 0, the HPA always calculates desiredReplicas = 0, so it doesn't scale up. Setting replicas to 0 is the way to disable an HPA. Since kube-downscaler sets replicas to 0 to downscale, it disables the HPA at the same time. When kube-downscaler puts back the original replica count (uptime period), the HPA is automatically enabled again.

No fix or new feature needs to be implemented; kube-downscaler and HPA work fine together.
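For illustration, the scale-up formula quoted above can be sketched in Python (the replica counts and metric values below are made-up examples, not from this thread):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         desired_metric: float) -> int:
    """Sketch of the HPA formula quoted above:
    desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
    """
    return math.ceil(current_replicas * (current_metric / desired_metric))

# With a positive replica count, high load scales the deployment up:
print(hpa_desired_replicas(2, 150, 100))  # -> 3
# With 0 replicas the product is always 0, so the HPA never scales up:
print(hpa_desired_replicas(0, 150, 100))  # -> 0
```

This makes the scale-to-zero behavior visible: multiplying by zero replicas yields zero regardless of the metric values.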

vishaltak commented 4 years ago

@mercantiandrea kube-downscaler works fine when the replica count is set to 0. However, when the kube-downscaler replica count is set to a non-zero value, there is a race condition between kube-downscaler and the HPA: kube-downscaler scales the deployment down and the HPA scales it back up, so pods are constantly being spawned and killed.

Following are the logs from kube-downscaler when its downtime replica count was set to 1 and the HPA min was set to 2.

2020-04-07 05:50:47,652 INFO: Scaling down Deployment default/test-downscaler-main-srvr-dply from 2 to 1 replicas (uptime: always, downtime: Mon-Sun 11:20-11:23 Asia/Kolkata)
2020-04-07 05:51:50,144 INFO: Scaling down Deployment default/test-downscaler-main-srvr-dply from 2 to 1 replicas (uptime: always, downtime: Mon-Sun 11:20-11:23 Asia/Kolkata)
2020-04-07 05:52:52,630 INFO: Scaling down Deployment default/test-downscaler-main-srvr-dply from 2 to 1 replicas (uptime: always, downtime: Mon-Sun 11:20-11:23 Asia/Kolkata)

Following are the deployment events, which show continuous upscaling and downscaling:

  Normal  ScalingReplicaSet  14m (x5 over 28m)  deployment-controller  Scaled down replica set test-downscaler-main-srvr-dply-cd7d84bf4 to 1
  Normal  ScalingReplicaSet  13m (x7 over 40m)  deployment-controller  Scaled up replica set test-downscaler-main-srvr-dply-cd7d84bf4 to 2
vishaltak commented 4 years ago

One possible solution I see is setting the HPA min count to the kube-downscaler replica count during downtime: add a "downscaler/original-replicas" annotation that captures the original min replica count. When the downtime is over, just as the Deployment replicas are restored to their original values, the HPA min count can be restored to the original min replica count.
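A minimal sketch of that save/restore idea, operating on a dict-shaped HPA object. The helper functions here are hypothetical illustrations, not kube-downscaler's actual code; only the `downscaler/original-replicas` annotation name comes from the proposal above:

```python
ORIGINAL_REPLICAS = "downscaler/original-replicas"

def downscale_hpa(hpa: dict, downtime_replicas: int) -> None:
    """During downtime: remember the original minReplicas in an annotation,
    then lower minReplicas to the configured downtime value."""
    annotations = hpa["metadata"].setdefault("annotations", {})
    annotations[ORIGINAL_REPLICAS] = str(hpa["spec"]["minReplicas"])
    hpa["spec"]["minReplicas"] = downtime_replicas

def upscale_hpa(hpa: dict) -> None:
    """After downtime: restore minReplicas from the annotation, if present."""
    annotations = hpa["metadata"].get("annotations", {})
    original = annotations.pop(ORIGINAL_REPLICAS, None)
    if original is not None:
        hpa["spec"]["minReplicas"] = int(original)

# Example round trip on a toy HPA object:
hpa = {"metadata": {}, "spec": {"minReplicas": 2, "maxReplicas": 10}}
downscale_hpa(hpa, 1)
print(hpa["spec"]["minReplicas"])  # -> 1
upscale_hpa(hpa)
print(hpa["spec"]["minReplicas"])  # -> 2
```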

mercantiandrea commented 4 years ago

Yep, for sure. Indeed, my analysis starts from the point that the downscaler scales to 0, the only value that disables the HPA, due to its algorithm for calculating the desired state:

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

If you put 0 in currentReplicas, you will always get 0 in desiredReplicas.

For what you are looking for, kube-downscaler isn't actually necessary: you would exclude the deployment from downscaling and set the right metric in your HPA so that it scales to 1 when not in use.

hjacobs commented 4 years ago

I'm fine with adding a feature to support HPA with non-zero downscale replicas. Any proposals on how kube-downscaler can work better with HPAs in these cases?

mercantiandrea commented 4 years ago

I think kube-downscaler could also read the HPA and modify its min/max replica values. In this case there is no need to change kube-downscaler's normal behaviour.

mercantiandrea commented 4 years ago

And write the original HPA min/max values into the deployment, so there is only one point of trust.
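As a rough sketch of what such an HPA update could look like: the function below only builds the patch body (kube-downscaler itself uses pykube, so the client call shown in the comment, from the official Kubernetes Python client, is just one illustrative way to apply it; the resource name and namespace are made-up):

```python
from typing import Optional

def hpa_min_max_patch(min_replicas: int,
                      max_replicas: Optional[int] = None) -> dict:
    """Build a merge-patch body that lowers the HPA's floor (and optionally
    its ceiling) for the downtime window. The original values would have to
    be remembered elsewhere (e.g. in an annotation) to restore them later."""
    spec = {"minReplicas": min_replicas}
    if max_replicas is not None:
        spec["maxReplicas"] = max_replicas
    return {"spec": spec}

# The patch could then be applied with the official Python client, e.g.:
#   from kubernetes import client, config
#   config.load_kube_config()
#   client.AutoscalingV1Api().patch_namespaced_horizontal_pod_autoscaler(
#       name="random3", namespace="default", body=hpa_min_max_patch(1))
print(hpa_min_max_patch(1))  # -> {'spec': {'minReplicas': 1}}
```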

vishaltak commented 4 years ago

Based on my understanding, I've created a PR (https://github.com/hjacobs/kube-downscaler/pull/98).

hjacobs commented 4 years ago

#98 was merged and released in https://github.com/hjacobs/kube-downscaler/releases/tag/20.4.2

stafot commented 4 years ago

@hjacobs With non-zero minimum replicas and version v20.5.0, I'm experiencing the behavior described here.

stafot commented 4 years ago

In my case, there always seems to be a clash between the downscaler and the HPAs, which causes more utilization during the downtime period. Sharing example logs:

2020-05-20 05:30:24,687 INFO: Scaling down Deployment default/random1 from 2 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
2020-05-20 05:30:24,669 INFO: Scaling down Deployment default/random2 from 2 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
2020-05-20 05:30:24,652 INFO: Scaling down Deployment default/random3 from 3 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
2020-05-20 05:30:24,635 INFO: Scaling down Deployment default/random4 from 3 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
2020-05-20 05:30:24,492 INFO: Downscaler v20.5.0 started with debug=False, default_downtime=never, default_uptime=Mon-Fri 05:31-20:00 UTC, deployment_time_annotation=None, downscale_period=never, downtime_replicas=0, dry_run=False, exclude_deployments=kube-downscaler,downscaler, exclude_namespaces=kube-system, grace_period=900, include_resources=deployments, interval=60, namespace=None, once=False, upscale_period=never
2020-05-20 05:29:51,824 INFO: Scaling down Deployment pph-system/random5 from 2 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
2020-05-20 05:29:51,766 INFO: Scaling down Deployment default/random6 from 2 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
2020-05-20 05:29:51,750 INFO: Scaling down Deployment default/random7 from 2 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
2020-05-20 05:29:51,734 INFO: Scaling down Deployment default/random1 from 2 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
2020-05-20 05:29:51,723 INFO: Scaling down Deployment default/random2 from 2 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)
2020-05-20 05:29:51,709 INFO: Scaling down Deployment default/random3 from 3 to 1 replicas (uptime: Mon-Fri 05:31-20:00 UTC, downtime: never)

An example hpa configuration is the following:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    downscaler/downtime-replicas: "1"
  name: random3
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: random3
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 100
  - type: Pods
    pods:
      metric:
        name: phpfpm_active_processes
      target:
        type: AverageValue
        averageValue: 12

This results in over-utilization during the downscale period (graph attached).

My understanding is that setting maxReplicas to the same value as minReplicas might work for my use case. For the more generic case, though, a more proper implementation could support min_replicas and max_replicas annotations for HPAs. @hjacobs Are there any objections to proposing a PR that sets the same value for min and max replicas, since that is probably the most common case for anyone using kube-downscaler?

vishaltak commented 4 years ago

@stafot - Could you confirm that you have not added any kube-downscaler annotation on the deployment itself? Because that would cause a race condition as well. If no annotations were added to the Deployment, could you share the corresponding events from Deployment when it's scaling up?

stafot commented 4 years ago

@vishaltak Hmm, I got this wrong: I have annotations on both the HPAs and the Deployments. Should I keep them only on the HPAs?

vishaltak commented 4 years ago

Yes. Add the annotation only on the HPA if a non-zero downscale is required. If scale-to-zero is required, add the annotation on the Deployment.

From the documentation -

Note that in cases where a HorizontalPodAutoscaler (HPA) is used along with Deployments, consider the following:

- If downscale to 0 replicas is desired, the annotation should be applied on the Deployment. This is a special case, since minReplicas of 0 on the HPA is not allowed. Setting the Deployment replicas to 0 essentially disables the HPA. In such a case, the HPA will emit events like "failed to get memory utilization: unable to get metrics for resource memory: no metrics returned from resource metrics API" as there is no Pod to retrieve metrics from.
- If downscale greater than 0 is desired, the annotation should be applied on the HPA. This allows dynamic scaling of the Pods even during downtime based upon the external traffic, as well as maintaining a lower minReplicas during downtime if there is no/low traffic. If the Deployment is annotated instead of the HPA, it leads to a race condition where kube-downscaler scales down the Deployment and the HPA upscales it, as its minReplicas is higher.

stafot commented 4 years ago

@vishaltak Changed my manifests according to your suggestions. I'll test today and come back.

stafot commented 4 years ago

@vishaltak After removing the deployments' annotations and keeping them only on the HPAs, the behavior is still not as expected (screenshot attached): all services that have an HPA go to zero replicas. So we probably need to either remove the HPAs during the downscale period and keep the deployments' annotations, or refactor kube-downscaler the way I proposed here.

vishaltak commented 4 years ago

@stafot - That's strange; I'm still trying to understand why your deployments are scaled to 0. The situation you're describing normally occurs when the Deployment has an annotation set to 0 (and if the HPA has annotations as well, it will cause a race; if there are no annotations on the HPA, it will do nothing). Could you provide the Deployment and HPA YAMLs?

On a different note, +1 for the proposal for minReplicas and maxReplicas annotations on the HPA.

stafot commented 4 years ago

@vishaltak

apiVersion: apps/v1
kind: Deployment
metadata:
  name: random3
  labels:
    app: random3
    purpose: random3
spec:
  replicas: 2 
  revisionHistoryLimit: 0
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1

  # Select which pods this deployment applies to. e.g. which pods to scale down when there's a change.
  selector:
    matchLabels:
      app: random3
      tier: web

  template:
    metadata:
      # These are the labels that pods will be given when created via this deployment
      labels:
        app: random3
        service: random3
        framework: yii
        purpose: random3
        version: random.482

    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: version
                operator: In
                values:
                - random.482
            topologyKey: "kubernetes.io/hostname"

      containers:

      - name: app
        image: random/randomfpm:7.3
        imagePullPolicy: IfNotPresent
        readinessProbe:
          httpGet:
            path: /randomhealth
            port: 80
          initialDelaySeconds: 1
          periodSeconds: 10
        volumeMounts:
        - name: app-volume
          mountPath: /var/www/pph
        - name: app-socket
          mountPath: /sock
        resources:
          requests:
            cpu: "300m"
            memory: "200Mi"
          limits:
            cpu: "1"
            memory: "1Gi"
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]

      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.4.0
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
        ]
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: filebeat-config-volume
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        resources:
          requests:
            cpu: "5m"
            memory: "10Mi"
          limits:
            cpu: "60m"
            memory: "64Mi"
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]

      - name: nginx
        image: random/random_nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        volumeMounts:
        - name: app-socket
          mountPath: /sock
        resources:
          requests:
            cpu: "100m"
            memory: "32Mi"
          limits:
            cpu: "500m"
            memory: "96Mi"
        readinessProbe:
          httpGet:
            path: /nginx_ready
            port: 80
          initialDelaySeconds: 1
          periodSeconds: 10
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]

      - name: fpm-exporter
        image: random/randomfpm_exporter
        args: ["--phpfpm.scrape-uri", "unix:///sock/php.sock;/fpm_status", "--phpfpm.fix-process-count", "true"]
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: "40m"
            memory: "16Mi"
          limits:
            cpu: "150m"
            memory: "32Mi"
        ports:
          - containerPort: 9253
        volumeMounts:
          - name: app-socket
            mountPath: /sock
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]
      volumes:
      - name: app-volume
        emptyDir: {}
      # For a unix socket so nginx and fpm can communicate
      - name: app-socket
        emptyDir: {}

A sample HPA manifest is in a previous comment and a sample deployment is above.

One idea I had is that it might be related to the apiVersion you are using, because the Deployment resource type is implemented in both the apps and extensions APIs. I haven't checked your code carefully yet, though; if you have any intuition about what is going wrong, let me know.

TL;DR: If I add the annotations in both places, the cluster misbehaves completely and upscales instead of downscaling. When I removed them as you proposed, the behavior is again not as expected: all replicas of all services declared through HPAs go to zero.

vishaltak commented 4 years ago

@stafot So when you use the annotation on the HPA with the original minReplicas at 2, during downtime it downscales to 1, which is the correct and expected behaviour (inferred from the HPA definition provided and the screenshot in the later comment). But at the same time, your Deployment is scaled to 0 while it should have been 1, and there are no additional annotations on the Deployment.

While I'm clueless at this point about what the issue could be, could you provide your Kubernetes version? I'll take a look and try to replicate it over the weekend.

On a different note -

@stafot - Could you confirm that you have not added any kube-downscaler annotation on the deployment itself? Because that would cause a race condition as well. If no annotations were added to the Deployment, could you share the corresponding events from Deployment when it's scaling up?

Point of clarification - It will only be a race condition if the annotations on the Deployment and the HPA mismatch. If both have same value, although redundant, it should not create a race condition.

stafot commented 4 years ago

Point of clarification - It will only be a race condition if the annotations on the Deployment and the HPA mismatch. If both have same value, although redundant, it should not create a race condition.

@vishaltak they had the same value.

While I'm clueless at this point what can be the issue, could you provide your kubernetes version? I might take a look at it and try to replicate it over the weekend.

My k8s version is 1.16, and I am using a managed AWS EKS cluster.