kansoapp / cloud-aws-carbon-model

This repository consolidates all first-hand data we retrieved from the "real world" to feed our carbon estimation model(s).

🌳 Kanso Carbon Model

This repository contains the source code of the Kanso Carbon Model whose goal is to estimate the carbon footprint associated with a given company's cloud infrastructure. In this readme file we tried to spell out as clearly as possible the methodology we built to approach the estimation.

📖 Summary


📌 Important disclaimer

We spent tens of hours gathering the scientific sources needed to build this methodology, yet a more accurate method would probably have required hundreds of hours. One of our core principles at Kanso is that understanding precedes action, so we chose to build a model that is very easy to deploy within companies, in order to spread the word and foster understanding as widely as possible.

We have dedicated a part of this readme, "Procedure to suggest changes and to enrich the methodology", to making the update process clear and easy for people willing to contribute. Records of changes are also kept in that part.

🧮 Modeling attempt on the carbon footprint related to the consumption of data center's services

For the sake of our estimation, we consider that the carbon footprint related to public cloud's services can be subdivided into four different carbon footprints:

  1. Emissions from consuming public cloud's services
  2. Emissions from manufacturing IT equipment and air conditioners
  3. Emissions from transferring data from a data center to other data centers and to the internet
  4. Embodied emissions from building the data center

1. Emissions from consuming public cloud's services

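Our model is limited to emissions due to IT equipment, air conditioning and electrical losses, and is subdivided into emissions from:

  A. Running compute primitives (EC2)
  B. Running storage primitives (EBS, S3, ...)
  C. Running IT room network devices
  D. Air conditioning and electrical losses

with 1 = A + B + C + D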

A. Running compute primitives (EC2)

This source is among the major sources of carbon emissions, so we chose to estimate it with several methods. As of today, we use the two approaches described below (A.1 and A.2).

A.1 Running compute primitives (EC2) - The Teads & D. Guyon's approach

We've named this first approach after the work conducted by Teads' engineering team in their article "Estimating AWS EC2 Instances Power Consumption" (see [1]) and by David Guyon in his research document "Supporting energy-awareness for cloud users. Networking and Internet Architecture" (see [2]). We calculate the emissions due to running compute primitives with the following formula:

image

We derived the share of the CPU in the energy consumption of the physical machine from David Guyon's work. The distribution of energy consumption on page 33 shows that 43% of the energy consumed by a physical machine is linked to the CPU. David Guyon cites D. Kliazovich, P. Bouvry, and S. U. Khan, “GreenCloud: A Packet-level Simulator of Energy-aware Cloud Computing Data Centers,” The Journal of Supercomputing, vol. 62, no. 3, pp. 1263–1283, 2012. This scientific base is quite old, yet:

Then the rest of the calculation is as follows:

image

With:

A.2 Running compute primitives (EC2) - The NRDC's approach

We have named this second approach after the report published by the Natural Resources Defense Council (NRDC) in 2014 called Data Center Efficiency Assessment. A part of their study is focused on the level of utilization of IT equipment in data centers and we could find the following information: for hyper-scale data centers, the server power at average utilization level stands at 101 watts.
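As a rough illustration of how the 101 W figure translates into energy and emissions, here is a minimal Python sketch. The function names, the `carbon_intensity_kg_per_kwh` parameter and the idea of counting "server hours" are assumptions made for illustration; they are not taken from the NRDC report and may differ from the exact formula used by the model.

```python
# NRDC figure quoted above: server power at average utilization (hyper-scale).
AVERAGE_SERVER_POWER_W = 101.0


def server_energy_kwh(server_hours: float,
                      power_w: float = AVERAGE_SERVER_POWER_W) -> float:
    """Energy (kWh) drawn by servers over `server_hours` at average power."""
    if server_hours < 0:
        raise ValueError("server_hours must be non-negative")
    return server_hours * power_w / 1000.0  # W x h = Wh, then Wh -> kWh


def server_emissions_kg(server_hours: float,
                        carbon_intensity_kg_per_kwh: float) -> float:
    """CO2eq (kg), assuming a known grid carbon intensity (hypothetical input)."""
    return server_energy_kwh(server_hours) * carbon_intensity_kg_per_kwh
```

How an EC2 instance maps to a share of a physical server is deliberately left out of this sketch.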

Then, we calculate A.2 as below:

image

With:

B. Running storage primitives (EBS, S3, ...)

To estimate the energy consumed by storage primitives, we used the research work conducted by the Etsy team (see [5]) which estimated how much energy it takes to store a terabyte of data on HDD (hard disk drive) or SSD (solid-state drive) disks in a cloud computing environment:

We are interested in the order of magnitude, so we estimate the emissions due to running the storage primitives as follows:

image

With:

C. Running IT room network devices

We used the work from David Guyon (see [2]) to estimate the energy consumed by the network devices located in the IT room. More specifically, we based our model on the following distribution of the energy consumed at the IT room level:

We have already estimated the energy consumed by the physical machines through A and B, so we can determine the emissions implied by running the IT room network devices with the following formula:

image

N.B: Discussions with experts pointed out that the energy split has evolved towards 85%/15% or even 90%/10%. Yet as we didn't find any written source backed by a scientific approach, we chose to keep the 70%/30% ratio.
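The proportional reasoning behind C can be sketched in Python as follows. This assumes that the 70%/30% split means 70% of the IT room energy goes to physical machines and 30% to network devices; the function and parameter names are hypothetical and the formula in the model may be expressed differently.

```python
def network_devices_energy_kwh(machines_energy_kwh: float,
                               machines_share: float = 0.70,
                               network_share: float = 0.30) -> float:
    """Infer network device energy from physical machine energy (A + B).

    If machines account for `machines_share` of the IT room energy and
    network devices for `network_share`, the two are proportional.
    """
    if machines_share <= 0:
        raise ValueError("machines_share must be positive")
    return machines_energy_kwh * network_share / machines_share


# Comparing the documented split with the 90%/10% split mentioned by experts.
baseline = network_devices_energy_kwh(70.0)
expert_view = network_devices_energy_kwh(90.0, machines_share=0.90,
                                         network_share=0.10)
```

Keeping the shares as parameters makes it easy to test how sensitive the result is to the chosen split.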

D. Air conditioning and electrical losses

Finally, thanks to the Power Usage Effectiveness (PUE) of the considered data centers, we can determine the CO2eq emitted because of the air conditioning and the electrical losses with the following formula:

image
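Since PUE is defined as the total facility energy divided by the IT equipment energy, the overhead attributable to air conditioning and electrical losses is the IT part multiplied by (PUE - 1). The Python sketch below illustrates this under the assumption that the overhead uses electricity with the same carbon intensity as the IT equipment; the function name is hypothetical.

```python
def overhead_emissions_kg(it_emissions_kg: float, pue: float) -> float:
    """CO2eq (kg) from cooling and electrical losses.

    `it_emissions_kg` is the sum of the IT-related emissions (A + B + C).
    PUE = total facility energy / IT energy, so overhead = IT x (PUE - 1).
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be lower than 1")
    return it_emissions_kg * (pue - 1.0)


def total_consumption_emissions_kg(it_emissions_kg: float, pue: float) -> float:
    """A + B + C + D, which is equivalent to multiplying IT emissions by PUE."""
    return it_emissions_kg + overhead_emissions_kg(it_emissions_kg, pue)
```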

2. Emissions from manufacturing IT equipment and air conditioners

To our knowledge, the best approach to estimate embodied emissions from manufacturing equipment is to use the ratio:

image

This ratio, which we call x, is sometimes documented and is comparable across different types of equipment. We studied two research papers (see [7] and [8]) which highlight that "80% of energy and carbon came from the operational phase of a computer, which can be assumed broadly equivalent to a server, with the remaining 20% from pre-use and decommissioning".

We therefore chose to calculate emissions from manufacturing IT equipment and air conditioners with x = 0.25, that is 20% / 80% (in doing so, we assume that the x ratio is also valid for air conditioners).
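The arithmetic behind x can be made explicit with a short Python sketch. It assumes that x is the ratio of non-operational (pre-use and decommissioning) emissions to operational emissions, which is what the 80%/20% split suggests; the function names are hypothetical.

```python
def ratio_from_operational_share(operational_share: float) -> float:
    """Ratio of non-operational to operational emissions.

    With 80% operational and 20% pre-use/decommissioning, the ratio is
    0.20 / 0.80, i.e. about 0.25.
    """
    if not 0.0 < operational_share <= 1.0:
        raise ValueError("operational_share must be in (0, 1]")
    return (1.0 - operational_share) / operational_share


def manufacturing_emissions_kg(operational_emissions_kg: float,
                               x: float = 0.25) -> float:
    """Embodied emissions estimated as x times the operational emissions."""
    return operational_emissions_kg * x
```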

3. Emissions from transferring data from a data center to other data centers and to the internet

We used data from an article published by George Kamiya on the IEA's website (see [9]) pointing out that we can use the [0.025 kWh/GB; 0.23 kWh/GB] range for our Data Transmission Energy Intensity.
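Because the intensity is given as a range, the resulting estimate is a range too. The Python sketch below shows one way to carry both bounds through the calculation; the function name and the `carbon_intensity_kg_per_kwh` input are assumptions for illustration and are not taken from the IEA article.

```python
# Bounds of the Data Transmission Energy Intensity quoted above (kWh/GB).
INTENSITY_LOW_KWH_PER_GB = 0.025
INTENSITY_HIGH_KWH_PER_GB = 0.23


def transfer_emissions_range_kg(data_gb: float,
                                carbon_intensity_kg_per_kwh: float
                                ) -> tuple[float, float]:
    """Return (low, high) CO2eq in kg for `data_gb` gigabytes transferred."""
    if data_gb < 0:
        raise ValueError("data_gb must be non-negative")
    low = data_gb * INTENSITY_LOW_KWH_PER_GB * carbon_intensity_kg_per_kwh
    high = data_gb * INTENSITY_HIGH_KWH_PER_GB * carbon_intensity_kg_per_kwh
    return low, high
```

The upper bound is roughly nine times the lower one, so reporting both bounds avoids giving a false sense of precision.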

Hence we obtain the following formula to estimate the emissions that occur when transferring data between data centers and between a data center and the internet:

image

With:

4. Embodied emissions from building the data center

Estimating the embodied emissions from building the data center is complex, as data center construction processes differ greatly from one technology to another. Hence we chose to approach it using the x ratio already used in part 2. To do so, we used the work conducted by Beth Whitehead and Deborah Andrews in March 2015 (see [8]) stating that:

operational figures for a standard 50-year building life cycle yielded values of between 70 and 80% of the overall impact.

We therefore assumed that the embodied emissions from building the data center account for 20-30% of the total life cycle impact, so we worked with the following formula:

image
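To show what the 20-30% assumption implies, the Python sketch below converts a share of the total life cycle into a multiple of the operational emissions. It assumes that the total is simply operational plus embodied emissions; the function names are hypothetical and the model's own formula may differ.

```python
def embodied_from_share(operational_emissions_kg: float,
                        embodied_share: float) -> float:
    """Embodied emissions when they represent `embodied_share` of the total.

    If embodied = share x total and total = operational + embodied, then
    embodied = operational x share / (1 - share).
    """
    if not 0.0 <= embodied_share < 1.0:
        raise ValueError("embodied_share must be in [0, 1)")
    return operational_emissions_kg * embodied_share / (1.0 - embodied_share)


def building_emissions_range_kg(operational_emissions_kg: float,
                                low_share: float = 0.20,
                                high_share: float = 0.30
                                ) -> tuple[float, float]:
    """(low, high) embodied building emissions for the 20-30% assumption."""
    return (embodied_from_share(operational_emissions_kg, low_share),
            embodied_from_share(operational_emissions_kg, high_share))
```

Under these assumptions, a 20% share corresponds to a ratio of 0.25 and a 30% share to a ratio of about 0.43.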

🙏 Procedure to suggest changes and to enrich the methodology

As stated before, we'll be glad to enrich this model every time a more accurate data point or a more recent approach is brought to our attention. If you want to contribute or just say hello, please drop us a note by email or suggest your changes directly through GitHub.

🧬 Scientific sources this methodology is based on

Below is the list of the different sources that are currently used by the Kanso Carbon Model: