kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0

Proposal to integrate AKS's Carbon-Aware-Scaler in to KEDA #4463

Open qpetraroia opened 1 year ago

qpetraroia commented 1 year ago

Introduction

Today, Microsoft announced an open-source way to scale your workloads based on carbon intensity with KEDA and the Green Software Foundation's SDK. This was built on top of earlier learnings and POCs with the KEDA team and other open-source contributors. Below you can find the open-source repository: https://github.com/Azure/carbon-aware-keda-operator

The above repository provides a Kubernetes operator that aims to reduce carbon emissions by helping KEDA scale Kubernetes workloads based on carbon intensity. Carbon intensity is a measure of how much carbon dioxide is emitted per unit of energy consumed. By scaling workloads according to the carbon intensity of the region or grid where they run, we can optimize the carbon efficiency and environmental impact of our applications.

This operator can use carbon intensity data from third party sources such as WattTime, Electricity Map or any other provider, to dynamically adjust the scaling behavior of KEDA. The operator does not require any application or workload code change, and it works with any KEDA scaler.

With the sustainability conversation now underway in the Kubernetes space, we are looking forward to partnering with KEDA to officially bring our code into the project and to working with the KEDA team to build out the official carbon-aware scaler.

Proposal

Our proposal is to work with the KEDA team to build out an official KEDA carbon-aware scaler and bring it into the open-source KEDA project. This could be built either on top of our existing repository, by donating it to KEDA, or by starting a new scaler.

Use Cases

Use cases for the operator include low-priority and time-flexible workloads that support interruptions in dev/test environments. Some examples of these are non-critical data backups, batch processing jobs, data analytics processing, and ML training jobs.

Scaler Source

Carbon intensity data via the GSF SDK or a cloud provider.

Scaling Mechanics

Scale based on carbon intensity via the GSF SDK or a cloud provider providing carbon intensity data. Microsoft has provided an open-source example of this here.

Authentication Source

Through the GSF or a cloud provider.

And special thanks to @yelghali, @pauldotyu, @tomkerkhove, @helayoty and @Fei-Guo! Appreciate all your hard work :)

tomkerkhove commented 1 year ago

Worth noting that I work for Microsoft, which could come across as biased, but I'm wearing my KEDA hat here.

For reference, here is an overview of the AKS operator and how it works: [architecture diagram]

I'm very excited to see this new operator that productizes the POC we did in collaboration with AKS & TAG Environmental Sustainability along with @husky-parul, @rootfs, @yelghali & @zroubalik!

One of the main takeaways was that fetching the data is the main problem, and we need to find a unified way of gathering it.

A second learning was that this actually cannot be a new scaler, but needs to be something that influences the min/max replicas a workload is allowed to scale to, while the scalers still define when scaling actions are required.

So I think the next step for KEDA as a runtime is to see how we can formalize the second learning where cluster operators or platform teams can use a new CRD that basically defines how far a ScaledObject/ScaledJob can scale out. This is not only valuable for this use-case, but maybe an app dev wants to scale to 1000 replicas while the cluster operator says "no!" and wants to overrule that.

With this new model, however, we need a flexible way of influencing this. While the AKS operator has this as a fixed CRD that fits its needs, we might want to be more generic and open to future new "providers" or scenarios. For example, there was another reported scenario where people want to override it. With that I don't mean that we need to shove everything into one generic CRD; end-user experience is essential here, so we might need to align with a model similar to the Ingress/Gateway API, where providers can register themselves with KEDA and every provider has separate CRDs to define their criteria. (It depends on how we want to approach things.)

As an example, this is tailored to the needs of carbon scaling and SMEs know exactly what to provide:

apiVersion: carbonaware.kubernetes.azure.com/v1alpha1 
kind: CarbonAwareKedaScaler 
metadata: 
  name: carbon-aware-word-processor-scaler
spec: 
  kedaTarget: scaledobjects.keda.sh        # can be used for ScaledObjects & ScaledJobs
  kedaTargetRef: 
    name: word-processor-scaler
    namespace: default 
  carbonIntensityForecastDataSource:       # carbon intensity forecast data source 
    mockCarbonForecast: false              # [OPTIONAL] use mock carbon forecast data 
    localConfigMap:                        # [OPTIONAL] use configmap for carbon forecast data 
      name: carbon-intensity 
      namespace: kube-system
      key: data 
  maxReplicasByCarbonIntensity:            # array of carbon intensity values in ascending order; each threshold value represents the upper limit and previous entry represents lower limit 
    - carbonIntensityThreshold: 437        # when carbon intensity is 437 or below 
      maxReplicas: 110                     # do more 
    - carbonIntensityThreshold: 504        # when carbon intensity is >437 and <=504 
      maxReplicas: 60 
    - carbonIntensityThreshold: 571        # when carbon intensity is >504 and <=571 (and beyond) 
      maxReplicas: 10                      # do less 
  ecoModeOff:                              # [OPTIONAL] settings to override carbon awareness; can override based on high intensity duration or schedules 
    maxReplicas: 100                       # when carbon awareness is disabled, use this value 
    carbonIntensityDuration:               # [OPTIONAL] disable carbon awareness when carbon intensity is high for this length of time 
      carbonIntensityThreshold: 555        # when carbon intensity is equal to or above this value, consider it high 
      overrideEcoAfterDurationInMins: 45   # if carbon intensity is high for this many minutes, disable eco mode 
    customSchedule:                        # [OPTIONAL] disable carbon awareness during specified time periods 
      - startTime: "2023-04-28T16:45:00Z"  # start time in UTC 
        endTime: "2023-04-28T17:00:59Z"    # end time in UTC 
    recurringSchedule:                     # [OPTIONAL] disable carbon awareness during specified recurring time periods 
      - "* 23 * * 1-5"                     # disable every weekday from 11pm to 12am UTC 

If we have to move this to a generic CRD we might lose that focus, or that CRD might become gigantic. Hence a provider approach might be ideal, where a top-level CRD has a pointer to a more specific CRD that defines the details, as above.
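
To make the provider idea concrete, here is a rough, hypothetical sketch of such a split; none of the kinds, API groups, or fields below are an agreed design, they only illustrate a generic top-level CRD delegating the details to a provider-specific one:

apiVersion: keda.sh/v1alpha1               # hypothetical placement in the KEDA API group
kind: ScalingCapPolicy                     # hypothetical generic CRD owned by KEDA
metadata:
  name: carbon-cap
spec:
  scaledObjectRef:                         # what the cap applies to
    name: word-processor-scaler
    namespace: default
  providerRef:                             # pointer to the provider-specific CRD below
    apiGroup: carbonaware.keda.sh
    kind: CarbonAwareProfile
    name: default-carbon-profile
---
apiVersion: carbonaware.keda.sh/v1alpha1   # hypothetical provider API group
kind: CarbonAwareProfile                   # provider-owned CRD, free to stay domain-focused
metadata:
  name: default-carbon-profile
spec:
  maxReplicasByCarbonIntensity:
    - carbonIntensityThreshold: 437
      maxReplicas: 110
    - carbonIntensityThreshold: 504
      maxReplicas: 60

This mirrors how the Gateway API separates Gateway from GatewayClass: the generic object stays small, while each provider keeps a focused CRD for its own criteria.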

Another aspect to keep in mind is scoping. Let's say we introduce a new CRD for this, what would the scope of it be? A single ScaledObject, based on label filtering, based on a namespace, ...?

clemlesne commented 1 year ago

Be wary of the quality of service, as mentioned here: https://github.com/kedacore/keda/issues/3467#issuecomment-1514536859.

It would seem more relevant to me to apply a relative decline to the scaling rule, not as an absolute replica count (Pods) but as a relative replica count (%).

Imagine the following example, in which:

  • The quality of service is maintained, as the scaled resource can scale indefinitely if the triggers require it.
  • The carbon impact is limited, as resource consumption is reduced during periods when the carbon intensity is high.
spec:
  ...
  environmentalImpact:
    carbon:
      - measuredIntensity: 400
        reducedReplicaPercent: 50%
      - measuredIntensity: 200
        reducedReplicaPercent: 25%
      - measuredIntensity: 50
        reducedReplicaPercent: 10%
  triggers:
    ...
yelghali commented 1 year ago

Be wary of the quality of service, as mentioned here: #3467 (comment).

It would seem more relevant to me to apply a relative decline to the scaling rule, not as an absolute replica count (Pods) but as a relative replica count (%). Imagine the following example, in which:

  • The quality of service is maintained, as the scaled resource can scale indefinitely if the triggers require it.
  • The carbon impact is limited, as resource consumption is reduced during periods when the carbon intensity is high.
spec:
  ...
  environmentalImpact:
    carbon:
      - measuredIntensity: 400
        reducedReplicaPercent: 50%
      - measuredIntensity: 200
        reducedReplicaPercent: 25%
      - measuredIntensity: 50
        reducedReplicaPercent: 10%
  triggers:
    ...

Thanks @clemlesne, indeed reducing actual replicas would have more impact. However, we found that having the operator update replicas directly would be a bit dangerous, as it would conflict with KEDA/HPA, so updating maxReplicas is less intrusive and, in practice, still prevents bursting or using more compute during high carbon intensity times.

Also, it's important to note that this operator is meant to be used with low-priority, time-flexible workloads that support interruptions: https://github.com/Azure/carbon-aware-keda-operator/blob/main/README.md

JorTurFer commented 1 year ago

WDYT @kedacore/keda-maintainers ?

clemlesne commented 1 year ago

Thanks @clemlesne, indeed reducing actual replicas would have more impact.

Hard-limiting the number of replicas will add a burden to the SRE teams, and goes against serverless principles.

However, we found that having the operator update replicas directly would be a bit dangerous, as it would conflict with KEDA/HPA, so updating maxReplicas is less intrusive and, in practice, still prevents bursting or using more compute during high carbon intensity times.

I didn't understand everything, as you mentioned multiple topics.

Here are my thoughts:

---
title: KEDA with carbon limiter and relative capping
---
flowchart LR
    trigger1["Trigger #1"]
    trigger2["Trigger #2"]
    carbonLimiter["Carbon limiter"]
    scalingRule["Scaling rule"]
    kubeHpa["k8s HPA"]

    trigger1 --> scalingRule
    trigger2 --> scalingRule
    carbonLimiter --> scalingRule
    scalingRule --> kubeHpa

In that case, the scaling rule does the following:

  1. Sources all the triggers
  2. Computes the desired scaling
  3. Applies the carbon limiter ratio (replicas = [triggers, ...] * [carbon limiter in %])
  4. Applies the scaling to the HPA
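
As a purely illustrative arithmetic example of step 3 above: if the triggers compute a desired count of 40 replicas and the carbon limiter currently allows 50%, the value handed to the HPA would be 40 * 0.5 = 20 replicas; when the carbon intensity drops and the limiter returns to 100%, the full 40 replicas are requested again.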

Also, it's important to note that this operator is meant to be used with low-priority, time-flexible workloads that support interruptions: https://github.com/Azure/carbon-aware-keda-operator/blob/main/README.md

I understand. It could solve the hard-capping problem. This complexity, I think, is not necessary with relative capping.

JorTurFer commented 1 year ago

Kind reminder @kedacore/keda-maintainers

tomkerkhove commented 1 year ago

I'm not sure I like the approach mentioned here though:

---
title: KEDA with carbon limiter and relative capping
---
flowchart LR
    trigger1["Trigger #1"]
    trigger2["Trigger #2"]
    carbonLimiter["Carbon limiter"]
    scalingRule["Scaling rule"]
    kubeHpa["k8s HPA"]

    trigger1 --> scalingRule
    trigger2 --> scalingRule
    carbonLimiter --> scalingRule
    scalingRule --> kubeHpa

In that case, the scaling rule does the following:

  1. Sources all the triggers
  2. Computes the desired scaling
  3. Applies the carbon limiter ratio (replicas = [triggers, ...] * [carbon limiter in %])
  4. Applies the scaling to the HPA

I still believe that adding it to scaledobject/scaledjob is not the right place, assuming that is what "Scaling rule" represents.

Otherwise, I think this should be a first-class feature in KEDA :)

pauldotyu commented 1 year ago

A few things that I like from this discussion:

I still believe that adding it to scaledobject/scaledjob is not the right place, assuming that is what "Scaling rule" represents.

I also agree that this should not be baked into the scaling rule logic itself, but rather raise or lower the ceiling on how far a workload can scale (with min/max replicas)

Hard-limiting the number of replicas will add a burden to the SRE teams, and goes against serverless principles.

Totally understand the position here and the scenarios mentioned. It is important to re-affirm that carbon-aware scaling should not be applied for time and/or demand sensitive workloads. So if the workload needs to accommodate high demand and within a reasonable amount of time, IMO it should not be a "carbon-aware" workload.

It would seem more relevant to me to apply a relative decline to the scaling rule, not in Absolute replica count (Pod) but in Relative replica count (%).

I like the idea of a relative reduction in replica count using percentages, but that may require an additional lookup on the actual ScaledObject/ScaledJob to see what its max value is, in order to know how high a workload would scale out to or down to. Do you think supporting both options (% and actual replica counts) is warranted here?

With that I don't mean that we need to shove everything into one generic CRD; end-user experience is essential here, so we might need to align with a model similar to the Ingress/Gateway API, where providers can register themselves with KEDA and every provider has separate CRDs to define their criteria. (It depends on how we want to approach things.)

Splitting into multiple CRDs sounds good to me, but what is your vision of a provider here? Could this be a provider of carbon-intensity data?

Let's say we introduce a new CRD for this, what would the scope of it be? A single ScaledObject, based on label filtering, based on a namespace

All of the above :-) Just kidding, my thought is to keep it narrowly scoped to a single ScaledObject in a single namespace to reduce the chances that it reduces the scalability of a workload that was not intended to be "carbon aware".

tomkerkhove commented 1 year ago

I like the idea of a relative reduction in replica count using percentages, but that may require an additional lookup on the actual ScaledObject/ScaledJob to see what its max value is, in order to know how high a workload would scale out to or down to. Do you think supporting both options (% and actual replica counts) is warranted here?

Exactly. I think we can support both, %-wise and number-wise, and leave it up to the end-user to choose which model is best for them.
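
A rough sketch of how both options could coexist (maxReplicasPercent is a hypothetical field name, expressed relative to the ScaledObject's own maxReplicaCount):

maxReplicasByCarbonIntensity:
  - carbonIntensityThreshold: 437
    maxReplicas: 110                       # absolute ceiling
  - carbonIntensityThreshold: 504
    maxReplicasPercent: 50                 # hypothetical: cap at 50% of the target's maxReplicaCount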

Could this be a provider of carbon-intensity data?

Yes, this is what I am thinking indeed.

Just kidding, my thought is to keep it narrowly scoped to a single ScaledObject in a single namespace to reduce the chances that it reduces the scalability of a workload that was not intended to be "carbon aware".

I think this is another case of providing options, where you can make the argument that you want to scope to a single workload, or use labels for large-scale scenarios.
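
For illustration only, the scope could be expressed either as an explicit reference or as a label selector; both field names below are hypothetical, and in practice you would set one or the other:

scope:
  scaledObjectRef:                         # option 1: target one specific ScaledObject
    name: word-processor-scaler
    namespace: default
  selector:                                # option 2: target every ScaledObject carrying a label
    matchLabels:
      sustainability.keda.sh/carbon-aware: "true"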

zroubalik commented 6 months ago

I am very sorry for the delay on this. I would like to restart the conversation.

I am 100% for integrating this into the KEDA project; it shouldn't be part of the core though, it's a nice extension. The only thing that we need to solve, from my pov, is maintainership.

tomkerkhove commented 6 months ago

+1

pauldotyu commented 6 months ago

I am very sorry for the delay on this. I would like to restart the conversation.

I am 100% for integrating this into the KEDA project; it shouldn't be part of the core though, it's a nice extension. The only thing that we need to solve, from my pov, is maintainership.

Awesome! I've been meaning to jump back into this myself. As for maintainership... with a little help, I'd be happy to do that!

zroubalik commented 6 months ago

@tomkerkhove do you want to drive this from KEDA perspective?

tomkerkhove commented 6 months ago

Will check with Paul in a couple of weeks

tomkerkhove commented 3 months ago

I have been discussing this a bit with @zroubalik and @JorTurFer and summarizing things here:

I'll circle back with @qpetraroia and discuss next steps