kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0

Performance Degradation while scaling out large number of Deployments, 700<N<1250 #6063

Open jeevantpant opened 2 months ago

jeevantpant commented 2 months ago

Report

We observe performance degradation while scaling out a large number of deployments (N) together via KEDA. We tested scaling behavior with N = 100, 200, 500, 1000, 1500, and 2000 ScaledObjects. We expect KEDA to scale each deployment's replicas from 0 to 2 during the activation window.


Expected Behavior

- Every HPA object should call the KEDA metrics API server every 15s (the default) to fetch metrics, starting from the cron start window time.
- The KEDA metrics API server logs the request made by the HPA and internally calls the KEDA operator to compute the actual external metric, which is visible in the KEDA operator gRPC logs.
- Finally, the KEDA metrics API server also logs when the metrics are successfully calculated and exposed by the KEDA operator.
- Every ScaledObject should be reconciled every 30s by the KEDA operator (this is the ScaledObject polling interval; see the sketch after this list).
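A minimal sketch of where the 30s comes from, assuming the default `pollingInterval` on a ScaledObject (names reused from the reproduction manifests further below):

```yaml
# Sketch: the operator-side poll/reconcile cadence is the ScaledObject's
# pollingInterval, which defaults to 30 seconds; shown explicitly here.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: app-scaledobjecttxy-10
  namespace: test-ns
spec:
  pollingInterval: 30          # seconds; default is 30
  scaleTargetRef:
    name: app-deployedtxy-10
```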

Actual Behavior

- A few of the HPAs only call the KEDA metrics API server to fetch metrics 2h 30m after the cron start window time.
- We see a latency of around 1 minute in generating and exposing the external metric during the handshake between the KEDA operator and the KEDA metrics API server.
- We observe pressure on the KEDA operator, where each reconciliation/polling pass takes >30s.

Steps to Reproduce the Problem

  1. Create the ScaledObject below, targeting a simple deployment with one container.

```yaml
# scaledobject.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: app-scaledobjecttxy-10
  namespace: test-ns
spec:
  scaleTargetRef:
    name: app-deployedtxy-10
  minReplicaCount: 0
  advanced:
    restoreToOriginalReplicaCount: true
  triggers:
    - type: cron
      metadata:
        timezone: Asia/Kolkata
        start: 00 14 * * * # every day at 2pm IST
        end: 00 19 * * * # every day at 7pm IST
        desiredReplicas: "2"
      name: "cron-sample"
```

```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployedtxy-10
  namespace: test-ns
  labels:
    app: app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      name: app
      labels:
        app: app
    spec:
      securityContext:
        runAsUser: 1001
        runAsGroup: 1001
      imagePullSecrets:
        - name: test
      serviceAccount: test
      containers:
        - name: app-cont-tx
          image: test-image
          command: ["/bin/sh"]
          args: ["-c", "while true; do echo $(date -u); sleep 30; done"]
          resources:
            requests:
              memory: "700Mi"
              cpu: "30m"
            limits:
              memory: "700Mi"
              cpu: "30m"
```

  2. Create N ScaledObjects/Deployments like this; in our case N = 1050. (Any value of N between 700 and 1250 showed this behavior and can be used to reproduce the bug.)
  3. Make sure there is no resource crunch while scaling: there must be enough compute for all 1050 deployments to scale to 2 replicas each (sufficient worker nodes and a surplus namespace ResourceQuota).

Logs from KEDA operator

Cron window timing: start 2024-08-05T14:00:00.000+05:30, end 2024-08-05T19:00:00.000+05:30

The first request for one of the affected ScaledObjects, app-scaledobjecttxy-10, is logged only at 2024-08-05T16:33:19.216+05:30.

[keda-operator-reconcile-logs.json](https://github.com/user-attachments/files/16579557/keda-operator-reconcile-logs.json)
[keda-operator-logs.csv](https://github.com/user-attachments/files/16579559/keda-operator-logs.csv)
[keda-metricsapi-server-logs.csv](https://github.com/user-attachments/files/16579560/keda-metricsapi-server-logs.csv)

KEDA Version

2.13.1

Kubernetes Version

1.28

Platform

Amazon Web Services

Scaler Details

CRON

Anything else?

No response

deefreak commented 2 months ago

@jeevantpant check if this helps; we were having a similar issue at scale as well.

https://github.com/kedacore/keda/issues/5624

JorTurFer commented 2 months ago

Hello! At scale, there are two configurations that could be affecting you and creating the bottleneck:

For the parallelism side, I'd suggest increasing KEDA_SCALEDOBJECT_CTRL_MAX_RECONCILES from its current value of 5 to, say, 20 (check whether that improves or fully solves the issue; if it only improves it, increase it further) -> https://keda.sh/docs/2.15/operate/cluster/#configure-maxconcurrentreconciles-for-controllers. This allows more ScaledObjects to be reconciled in parallel (if that is the bottleneck).
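A minimal sketch of how that could look, assuming KEDA runs as a Deployment with a container named keda-operator (the exact manifest layout depends on how KEDA was installed, e.g. Helm, YAML, or OLM):

```yaml
# Sketch: raising ScaledObject controller parallelism on the keda-operator
# container via the env var mentioned above (Deployment excerpt, value illustrative).
spec:
  template:
    spec:
      containers:
        - name: keda-operator
          env:
            - name: KEDA_SCALEDOBJECT_CTRL_MAX_RECONCILES
              value: "20"   # default is 5; raise further if it helps but does not fully solve it
```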

For Kubernetes client throttling, you can increase these other parameters -> https://keda.sh/docs/2.15/operate/cluster/#kubernetes-client-parameters. If you are affected by this, you should see messages announcing the rate limit and the waiting time it causes. In that case, I'd recommend doubling the values and monitoring how it performs; if that's not enough, double them again, and so on.
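As a rough illustration of the doubling suggestion, assuming the client flags are passed directly to the keda-operator container (flag names follow the cluster-operation docs linked above; values are just the first doubling step):

```yaml
# Sketch: relaxing client-side throttling of the KEDA operator's Kubernetes
# client (container excerpt; start by doubling the defaults and monitor).
containers:
  - name: keda-operator
    args:
      - --kube-api-qps=40     # default 20
      - --kube-api-burst=60   # default 30
```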

JorTurFer commented 2 months ago

There have also been some improvements related to status handling, so upgrading to v2.15 could improve performance, as it significantly reduces the calls to the API server in some cases (if that is the root cause in your case).

jeevantpant commented 4 days ago

Thanks so much @JorTurFer for the insightful suggestions and options to try out. They seem to have solved the issue we were facing while scaling out the deployments.

I wanted to post our observations and findings from trying each of the suggestions above.

1) After upgrading KEDA to v2.15, the total time to scale out all the deployment replicas dropped to 50 minutes (previously 2h 30m).

2) Along with the v2.15 upgrade, we also set KEDA_SCALEDOBJECT_CTRL_MAX_RECONCILES to 20 (more parallel reconciliations) as per your recommendation. We observed no change in the scale-out time during the cron schedule, which suggests that parallel reconciliation of ScaledObjects was not the bottleneck.

3) Finally, along with the v2.15 upgrade, the next thing I tried was updating the Kubernetes client parameters of the operator. Below are the parameter values and observations. The defaults were qps = 20 and burst = 30, and I kept the same ratio when increasing them. The behavior below was consistent (multiple fresh installs over a few weeks for each case):
   a) kube-api-qps: 20 -> 40 / kube-api-burst: 30 -> 60. This setting significantly impacted the scale-out time, where we noticed that

One final question on the above configuration, @JorTurFer, if you could please help us with it:

a) Do you think the following kube client parameter values [kube-api-qps: 60 / kube-api-burst: 90] would pose any risk or issue on a busier cluster, where there is significantly more traffic to the kube-apiserver?

b) Have you ever used values this high in a live setup, or seen values this high for these parameters cause any issues/stress in your experience?

JorTurFer commented 4 days ago

Hello! The right values for kube-api-* are a bit above the minimum that removes the local throttling messages (the KEDA operator logs report when there has been throttling). About usage: I know there are companies running over 3K ScaledObjects in near real time, so I think your scenario can still be improved.

About using values that high: I know of clusters configured with 600/900 (and even more in one case). The right values depend on the scaler topology, the amount of failures, etc. I think that in a cluster that already has 1k ScaledObjects, the control plane should be big enough to handle those requests (but monitoring is always a good idea).