kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event-driven scale for any container running in Kubernetes.
https://keda.sh
Apache License 2.0

Support custom formula for multiple metrics #2440

Closed: or-shachar closed this 1 year ago

or-shachar commented 2 years ago

Proposal

I think that a "Composed Scaler" that allows to put in a simple formula that references multiple other metrics - can be super useful.

This would make KEDA a powerful tool not only for collecting external metrics but also for making smart decisions based on multiple conditions.

(originally discussed here)

Use-Case

Here's a real-world problem: I have a backend process that consumes "write" tasks from a queue and persists them to a DB. My DB has a single writer node that is CPU-bound. A task may impose varying load on the CPU, depending on the content of the write.

I have two relevant metrics for determining the replica count:

q := queue-length
cpu := DB-writer CPU utilization

The composed metric formula ceil((1 - cpu) * q) ensures that replicas grow when the queue is long but hold back when the CPU is already loaded. Right now my only option is to create my own service that exposes that composed metric.
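
As a quick illustration with made-up numbers (the helper below is not part of any scaler, just the formula spelled out in Go):

package main

import (
	"fmt"
	"math"
)

// composedMetric mirrors the proposed formula ceil((1 - cpu) * q).
func composedMetric(cpu, q float64) float64 {
	return math.Ceil((1 - cpu) * q)
}

func main() {
	fmt.Println(composedMetric(0.25, 100)) // writer mostly idle: metric 75, scale out on the backlog
	fmt.Println(composedMetric(0.90, 100)) // writer nearly saturated: metric 10, hold back despite the backlog
}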

Anything else?

In the original discussion, we started talking about what the ScaledObject spec would look like for the custom-logic case.

First I suggested this:

triggers:
  - type: composed
    metadata:
      metrics:
        q_length: 
          type: aws-cloudwatch
          metadata:
            namespace: AWS/SQS
            ... # SQS length
        cpu_util:
          type: aws-cloudwatch
          metadata:
            namespace: AWS/Neptune
            ... # neptune writer CPU
      formula: "ceil((1 - $cpu_util) * $q_length)"
      targetMetricValue: "5"
      minMetricValue: "0"

Then @zroubalik suggested this (better IMO):

scaledObject.spec:
  ...
  multipleTriggersCalculation:    ### we should find a better name for this probably
    formula: "ceil((1 - $cpu_util) * $q_length)"
    targetMetricValue: "5"
    minMetricValue: "0"
  triggers:
    - type: aws-cloudwatch
      name: q_length     ### name added here
      metadata:
        namespace: AWS/SQS
        ... # SQS length
    - type: aws-cloudwatch
      name: cpu_util       ### name added here
      metadata:
        namespace: AWS/Neptune
        ... # neptune writer CPU

Finally, it's valuable to read @tomkerkhove's comment about it.

dooferlad commented 2 years ago

Having had a dig through the code, I would be tempted to implement this using a Go template. pkg.provider.KedaProvider.GetExternalMetric would delegate to a new CalculateDerivedMetric function that takes the above scaledObject inputs and derives the right context object from the triggers; the formula field would be a template, rendered against that context. Probably using https://masterminds.github.io/sprig/ so KEDA doesn't need to re-invent the wheel, you would end up with something like:

scaledObject.spec:
  ...
  derivedMetric: # better name?
    formula: "{{ ceil (mulf (subf 1 .cpu_util) .q_length) }}"
    targetMetricValue: "5"
    minMetricValue: "0"
  triggers:
    - type: aws-cloudwatch
      name: q_length
      metadata:
        namespace: AWS/SQS
        ... # SQS length
    - type: aws-cloudwatch
      name: cpu_util
      metadata:
        namespace: AWS/Neptune
        ... # neptune writer CPU

The implementation would render that template and convert the resulting string into a number to send to the HPA.
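
For what it's worth, here is a minimal sketch of that rendering step, assuming the Masterminds sprig function map and a hypothetical calculateDerivedMetric helper (names, signature, and wiring are illustrative only, not KEDA's actual code):

package main

import (
	"fmt"
	"strconv"
	"strings"
	"text/template"

	"github.com/Masterminds/sprig/v3"
)

// calculateDerivedMetric renders the formula template against the collected
// trigger metrics and parses the rendered string back into a number.
// (Hypothetical helper; not part of KEDA's current API.)
func calculateDerivedMetric(formula string, metrics map[string]float64) (float64, error) {
	tmpl, err := template.New("formula").Funcs(sprig.TxtFuncMap()).Parse(formula)
	if err != nil {
		return 0, err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, metrics); err != nil {
		return 0, err
	}
	return strconv.ParseFloat(strings.TrimSpace(out.String()), 64)
}

func main() {
	// Example trigger values keyed by the trigger names from the spec above.
	metrics := map[string]float64{"q_length": 120, "cpu_util": 0.25}
	value, err := calculateDerivedMetric("{{ ceil (mulf (subf 1 .cpu_util) .q_length) }}", metrics)
	if err != nil {
		panic(err)
	}
	fmt.Println(value) // 90: the single value handed to the HPA
}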

joebowbeer commented 2 years ago

Prior discussion: https://github.com/kedacore/keda/issues/373#issuecomment-631487703

neoakris commented 1 year ago

Two use-case ideas:

1. desired count = (target of N requests per second per replica) + (additional replicas if HTTP 500 errors show up in the logs). You can't really do something like that with the current OR-based logic, and it could be helpful if someone got their ideal requests-per-second target wrong.

2. It's my understanding that currently, when you have multiple triggers, they are ORed and the largest desired count among them wins. Some people have mentioned it would be useful to have AND logic; this formula might be a good opportunity to implement AND logic to cover those scenarios as well (see the sketch below).
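
Purely as an illustrative sketch on top of the spec proposed above (the trigger names and the min function here are hypothetical, not an agreed design): if each named trigger contributed its own desired value, AND-flavoured behaviour could be approximated by combining them with a minimum instead of the maximum that today's OR logic effectively takes:

multipleTriggersCalculation:
  formula: "min($requests_per_sec_replicas, $error_rate_replicas)"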