Open tomkerkhove opened 2 years ago
I'm leaning towards using the Green Web Foundation's Go SDK but open to thoughts
I like this idea!
A couple of questions:
- How does the carbon API help? If all the deployments are in the same data center/same geo, does carbon intensity change anything in compute? It looks to me like the best use case is the multi-cluster, multi-data-center case (see some prior study here)
It depends on the workload but some "secondary"/low-prio workloads can just be scaled down in a given geo if the impact on the environment is too high. This is not specifically a multi-cluster scenario.
- Does it make sense to schedule every deployment or only selectively pick those that have a high energy consumption, thus high carbon impact ones? That brings the question of how to measure the energy consumption of deployments.
That's up to the end-user; the triggers are specific to a ScaledObject and thus on a per-workload basis. So it's up to you to choose what makes sense and what does not.
For example, workloads that require GPU could be scaled down while lesser consuming workloads can continue to run.
@tomkerkhove Thanks for adding this and improving the design. So we can implement this scaler sooner and combine with more scalers once the OR support is in place.
I'm leaning towards using the Green Web Foundation's Go SDK but open to thoughts
Of course we'd love it if you can use the SDK!
Just a heads up that we need to make some breaking changes in https://github.com/thegreenwebfoundation/grid-intensity-go/issues/44 to be able to support more providers of carbon intensity data.
I'm working on the changes and they should be done soon. So I hope they won't be disruptive.
Thanks for adding this and improving the design. So we can implement this scaler sooner and combine with more scalers once the OR support is in place.
Correct!
Just a heads up that we need to make some breaking changes in https://github.com/thegreenwebfoundation/grid-intensity-go/issues/44 to be able to support more providers of carbon intensity data.
I'm working on the changes and they should be done soon. So I hope they won't be disruptive.
Good to know, thanks for sharing! If you are contributing to the SDK, are you willing to contribute the scaler as well?
Hi @rootfs, I've been working with @rossf7 on the grid-intensity-go SDK. I've tried to provide some more background in the answers.
How does the carbon API help? If all the deployments are in the same data center/same geo, does carbon intensity change anything in compute? It looks to me like the best use case is the multi-cluster, multi-data-center case (see some prior study here)
The above example works by moving workloads geographically (as in, it moves them through space).
You can also move workloads temporally (as in move them through time).
The carbon intensity changes based on the time of day, so the same workload run at different times will have different emissions figures.
The issue referred to one paper titled Let's Wait Awhile: How Temporal Workload Shifting Can Reduce Carbon Emissions in the Cloud, and it's a fun read, going into this in more detail.
At the recent ACM SIGEnergy workshop last month, there was a talk from some folks at VMware sharing some new findings, called Breaking the Barriers of Stranded Energy through Multi-cloud and Federated Data Centers. It's really worth a watch, but this quote from the abstract gives an idea of why the time element is worth being able to act upon:
many computation workloads (such as some learning or big data) can be flexible in time (scheduled for delayed execution) and space (transferred across any geographical distance with limited cost). This opens the possibility of shifting workloads in time and space to take advantage in real time of any amount of excess renewable energy, which otherwise would be curtailed and wasted. Initial results show that a single datacenter that time shifts load can reduce its emissions by 19% or more annually
There's also some work by Facebook/Meta, where they have shared some results from using this same carbon-aware workload scheduling as part of their sustainability strategy - see their recent carbon explorer repo. I think they might use their own scheduler, rather than Kubernetes, but the principle is the same - move work through space to make the most of cheaper green energy for your compute.
Does it make sense to schedule every deployment or only selectively pick those that have a high energy consumption, thus high carbon impact ones? That brings the question of how to measure the energy consumption of deployments.
For the suitability question, that's down to the person running the cluster, and the job. Some jobs are better fits for moving through time (low latency, pause-able jobs), and some jobs better for moving through space (ones that don't have to be run within a specific jurisdiction). These are somewhat independent of the energy consumption. If you're curious about the energy consumption part, I think Scaphandre provides some numbers you can use and labelling of jobs for k8s, and this piece here from the BBC gives an example of it in use.
Hope that helps!
If you are contributing to the SDK; are you willing to contributing the scaler as well?
@tomkerkhove Yes definitely, I'd like to contribute the scaler. We need to finish up the SDK changes and some other dev but I should be able to start on this later in the month.
After discussing with @vaughanknight & @yelghali I've noticed that my proposal for just having a trigger does not make much sense because it will scale straight from min to max replicas given the emission does not change that often.
Instead, I'm wondering if we should not make this part of the `ScaledObject`/`ScaledJob` definition as a whole, similar to how we handle fallback:
Imagine the following:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: {scaled-object-name}
spec:
  scaleTargetRef:
    name: {name-of-target-resource} # Mandatory. Must be in the same namespace as the ScaledObject
  maxReplicaCount: 100 # Optional. Default: 100
  environmentalImpact:
    carbon:
    - measuredEmission: 5%
      allowedMaxReplicaCount: 50
    - measuredEmission: 10%
      allowedMaxReplicaCount: 10
  fallback: # Optional. Section to specify fallback options
    failureThreshold: 3 # Mandatory if fallback section is included
    replicas: 6 # Mandatory if fallback section is included
  triggers:
  # {list of triggers to activate scaling of the target resource}
```
This allows end-users to define how their application should scale based on its needs by defining triggers. If we have to control how it should adapt based on the carbon emission, then they can define `measuredEmission` and its corresponding `allowedMaxReplicaCount`.
So if the emission is 5%, then the maximum replicas of 100 is overruled to 50 and:

* An event is tracked on the ScaledObject
* The HPA is updated to use 50 as max replica
* A CloudEvent is emitted

If the emission is lower than 5%, then it will go back to 100 max replicas.
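To make the override semantics concrete, here is a minimal sketch in Go of how the rule resolution could work. The type and function names are hypothetical (this is not the actual KEDA implementation): pick the `allowedMaxReplicaCount` of the highest `measuredEmission` threshold that the current emission reaches, and fall back to the regular `maxReplicaCount` when no threshold is reached.

```go
package main

import "fmt"

// CarbonRule mirrors one entry under environmentalImpact.carbon
// (hypothetical type; the real CRD fields may differ).
type CarbonRule struct {
	MeasuredEmission       float64 // threshold, e.g. 5 for "5%"
	AllowedMaxReplicaCount int32
}

// effectiveMaxReplicas returns the max replica count after applying the
// carbon rules: the rule with the highest threshold that the current
// emission reaches wins; below all thresholds the default applies.
func effectiveMaxReplicas(emission float64, rules []CarbonRule, defaultMax int32) int32 {
	result := defaultMax
	best := -1.0
	for _, r := range rules {
		if emission >= r.MeasuredEmission && r.MeasuredEmission > best {
			best = r.MeasuredEmission
			result = r.AllowedMaxReplicaCount
		}
	}
	return result
}

func main() {
	rules := []CarbonRule{
		{MeasuredEmission: 5, AllowedMaxReplicaCount: 50},
		{MeasuredEmission: 10, AllowedMaxReplicaCount: 10},
	}
	fmt.Println(effectiveMaxReplicas(3, rules, 100))  // below all thresholds: 100
	fmt.Println(effectiveMaxReplicas(7, rules, 100))  // reaches the 5% rule: 50
	fmt.Println(effectiveMaxReplicas(12, rules, 100)) // reaches the 10% rule: 10
}
```

With the example spec above, an emission of 7% would cap the HPA at 50 replicas, and 12% at 10 replicas.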
Any thoughts on this @rossf7 / @zroubalik / @jorturfer?
Should we do this instead of a carbon aware scaler? No. But I think that one only makes sense once we do https://github.com/kedacore/keda/issues/3567 and with the above proposal we don't need a trigger for it anymore.
I think it would make sense to have both features. One small point about "Carbon Awareness":
- The info provided by the APIs (WattTime, ElectricityMap, etc.) is the electricity carbon intensity (I) in gCO2eq/kWh, i.e. how much carbon is contained in the electricity / power we are using now; it is an external metric (independent of the workloads themselves)
- The energy or power consumed by a workload (or a pod) is (E) in kWh
- Carbon emissions of the pod = (E × I) in gCO2eq
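Spelling that formula out in Go (a toy calculation with illustrative numbers only, not part of any SDK): the emissions attributable to a pod are simply the energy it consumed multiplied by the grid carbon intensity over the same period.

```go
package main

import "fmt"

// podEmissions returns the carbon emissions of a pod in gCO2eq, given
// the energy it consumed (E, in kWh) and the grid carbon intensity it
// ran under (I, in gCO2eq/kWh).
func podEmissions(energyKWh, intensityGCO2PerKWh float64) float64 {
	return energyKWh * intensityGCO2PerKWh
}

func main() {
	// e.g. a pod that drew 0.5 kWh while the grid was at 34 gCO2eq/kWh
	fmt.Println(podEmissions(0.5, 34)) // 17 gCO2eq
}
```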
A proposal for using both the "Core Carbon Awareness" proposed above and the "Carbon Aware Trigger":

- The core awareness feature is about "electricity carbon intensity" and would control the max replica counts: how far we scale to, depending on the external environment / electricity carbon intensity (I), cf. the proposal you suggested @tomkerkhove
- The "Carbon Aware Scaler / trigger" would be a "proper" scaler and would scale replicas based on pod power (E): e.g. scale by 1 when the average power of a pod is greater than x
- Later, the "Carbon Aware Scaler" could also scale (in or out) based on pod or workload carbon emissions (gCO2eq), as it would be another internal metric
- Cf. https://github.com/intel/platform-aware-scheduling/tree/master/telemetry-aware-scheduling/docs/power#5-create-horizontal-autoscaler-for-power: in this project, power and heat are treated as metrics (triggers) alongside CPU, RAM, etc.

In terms of adoption, I think the "Core Carbon Awareness" is simpler to adopt because it does not require the customers / companies to have power telemetry available (which only a few customers have, as of now). On the other hand, the "Carbon Aware Scaler" is also interesting because it offers actual power / carbon metrics for the workloads, and it would fit with the AND / OR logic with other scalers.
PS: a suggestion for the fields / usage of the "Core awareness feature":
- Rename `measuredEmission` to `electricity_intensity_gco2_per_kwh`, as I think "measured emissions" would be read as carbon emissions (gCO2eq)
- For the value, I think the user should have the option to set an absolute value in addition to the %, because from what I've seen the APIs provide a number for the carbon intensity (e.g. 34); using the % is more interesting but I think it takes more work to implement.
- "the Carbon Aware Scaler / trigger", would be a "proper" scaler and would scale replicas based on pod power (E): e.g scale by 1 when average power of pod, is greater than x
- later the "Carbon aware scaler" could also scale (In or OUT) based on pod or workload carbon emissions (gCO2eq) as it would be another internal metric
- c.f https://github.com/intel/platform-aware-scheduling/tree/master/telemetry-aware-scheduling/docs/power#5-create-horizontal-autoscaler-for-power --> in this project, power, heat are considered as other metrics (triggers) CPU, Ram, etc.
We can add this, but before we start building scalers we'd need to be sure what they look like, as once a scaler is added we can't simply introduce breaking changes.
However, if my above proposal is agreed on then we can open a separate issue for it.
PS: a suggestion for the fields / usage of the "Core awareness feature"
- Rename `measuredEmission` to `electricity_intensity_gco2_per_kwh`, as I think "measured emissions" would be read as carbon emissions (gCO2eq)
I think this is something we can document in the details; no need to be that verbose IMO. We can rename it to `measuredIntensity` though.
Agreed, the scaler can be the next step. The proposal above has value and would be easy to build.
Imagine the following:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: {scaled-object-name}
spec:
  scaleTargetRef:
    name: {name-of-target-resource} # Mandatory. Must be in the same namespace as the ScaledObject
  maxReplicaCount: 100 # Optional. Default: 100
  environmentalImpact:
    carbon:
    - measuredEmission: 5%
      allowedMaxReplicaCount: 50
    - measuredEmission: 10%
      allowedMaxReplicaCount: 10
  fallback: # Optional. Section to specify fallback options
    failureThreshold: 3 # Mandatory if fallback section is included
    replicas: 6 # Mandatory if fallback section is included
  triggers:
  # {list of triggers to activate scaling of the target resource}
```

This allows end-users to define how their application should scale based on its needs by defining triggers. If we have to control how it should adapt based on the carbon emission, then they can define `measuredEmission` and its corresponding `allowedMaxReplicaCount`. So if the emission is 5%, then the maximum replicas of 100 is overruled to 50 and:

* An event is tracked on the ScaledObject
* The HPA is updated to use 50 as max replica
* A CloudEvent is emitted

If the emission is lower than 5%, then it will go back to 100 max replicas.
We need to keep in mind the KEDA users are exposing their services to end-users. The end-user, at the end, wants quality of service (shareholders too). We can justify a lower quality of service for a certain period of time, but the service needs to be usable. So, limiting the number of replicas to a fixed value does not seem appropriate to me at all.
It would seem more relevant to me to apply a relative decline to the scaling rule: not in absolute replica count (pods), but in relative replica count (%).
Imagine the following example:
```yaml
spec:
  ...
  environmentalImpact:
    carbon:
    - measuredIntensity: 400
      reducedReplicaPercent: 50%
    - measuredIntensity: 200
      reducedReplicaPercent: 25%
    - measuredIntensity: 50
      reducedReplicaPercent: 10%
  triggers:
  ...
```
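A minimal Go sketch of how this relative variant could be evaluated (hypothetical type and field names, mirroring the spec fragment above rather than any real KEDA code): apply the reduction of the highest `measuredIntensity` threshold the current grid intensity reaches, as a percentage of the regular max replica count.

```go
package main

import (
	"fmt"
	"math"
)

// IntensityRule mirrors one entry in the relative proposal above
// (hypothetical field names).
type IntensityRule struct {
	MeasuredIntensity     float64 // gCO2eq/kWh threshold
	ReducedReplicaPercent float64 // reduction to apply, e.g. 50 for 50%
}

// scaledMaxReplicas applies the largest reduction whose intensity
// threshold the current grid intensity reaches; below all thresholds
// the full maxReplicas is kept.
func scaledMaxReplicas(intensity float64, rules []IntensityRule, maxReplicas int32) int32 {
	reduction := 0.0
	best := -1.0
	for _, r := range rules {
		if intensity >= r.MeasuredIntensity && r.MeasuredIntensity > best {
			best = r.MeasuredIntensity
			reduction = r.ReducedReplicaPercent
		}
	}
	return int32(math.Ceil(float64(maxReplicas) * (1 - reduction/100)))
}

func main() {
	rules := []IntensityRule{
		{MeasuredIntensity: 400, ReducedReplicaPercent: 50},
		{MeasuredIntensity: 200, ReducedReplicaPercent: 25},
		{MeasuredIntensity: 50, ReducedReplicaPercent: 10},
	}
	fmt.Println(scaledMaxReplicas(30, rules, 100))  // below all thresholds: 100
	fmt.Println(scaledMaxReplicas(250, rules, 100)) // reaches 200: 75
	fmt.Println(scaledMaxReplicas(450, rules, 100)) // reaches 400: 50
}
```

Because the reduction is relative, the same rules keep a small workload (say `maxReplicaCount: 4`) proportionally usable instead of pinning it to a fixed absolute cap.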
A proposal to donate AKS's carbon aware operator is open on https://github.com/kedacore/keda/issues/4463
Proposal
Provide a carbon aware scaler that allows end-users to scale based on their impact on the environment.
As per @rossf7 on https://github.com/kedacore/keda/issues/3381:
Also:
Use-Case
Automatically scale workloads out while the impact on the environment is low, scale in if the impact is too high.
This is useful for batch-like workloads.
Anything else?
Relates to https://github.com/kedacore/keda/issues/3381
Related to our collaboration with the Environmental Sustainability TAG/WG (https://github.com/kedacore/governance/issues/59)