thegreenwebfoundation / co2.js

An npm module for accessing the green web API, and estimating the carbon emissions from using digital services
https://developers.thegreenwebfoundation.org/

Include model used for website carbon, ecoping, and the rest #56

Closed mrchrisadams closed 2 years ago

mrchrisadams commented 2 years ago

This model is used in a number of places:

https://sustainablewebdesign.org/calculating-digital-emissions/

There's a bit more research needed for what I was referring to as the Green Byte Model in #2, but it would be useful to have the different models in use available, so you can choose between them:

Acceptance criteria

In all cases we'd want to be able to call perByte to return a single number for compatibility, and to have a way to return an object with scope for more complex return info (so we can refer to specific parts of the SWD result breakdown, or other info).

mrchrisadams commented 2 years ago

I'm not going to go into the validity of using a 'per gigabyte' model for this issue compared to other options, nor the validity of the top down figures for total internet energy usage. The main aim of this issue is capturing the intent of the Sustainable Web Design model outlined above, in code, so it's easy to use and deploy in software. If you want to propose any changes to how you model emissions that are not described, or at least alluded to, in the page linked in this issue, please see issue #2 in this repo.

At present, if you want to get a CO2 emissions figure for data transfer we have an API like so:

You instantiate an instance of the CO2 class:


const co2Converter = CO2()

And then you call a single function like so, passing in:

From memory, the IEA figure is just for fuel - it's not a levelised figure that includes the embedded emissions in making the infrastructure as well. If it was a levelised figure, it would make sense to have non-zero figures for renewables too, as it takes energy to make solar panels, and the rest.


// define some constants that we expect to use in our examples
// 1024 bytes in a kilobyte 
// 1024 kilobytes in a megabyte 
// 1024 megabytes in a gigabyte
const GIGABYTE = 1024 * 1024 * 1024

// simulate getting a 'green' result when looking up a domain against the green web fdn dataset
const GREEN = true

// get back our CO2 figure
const co2Amount = co2Converter.perByte(GIGABYTE, GREEN)

This should return a figure for CO2 in grams - typically something between 150 and 500 grams per gigabyte transferred, depending on how dirty the grid is.

Adapting this API to the Sustainable Web Design model

The Sustainable Web Design model is conceptually similar to the 1 Byte model from The Shift Project.

We'd ideally be able to use it like so for simple cases:


// we don't have this handy object yet but it would be nice to have it, and not too complicated to implement
import { models } from '@tgwf/co2'

// we pass in the different model here, instead of defaulting to the 1 Byte model.
// Feel free to use a better name than SustainableWebDesign...
const co2Converter = CO2({model: models.SustainableWebDesign})

const co2Amount = co2Converter.perByte(GIGABYTE, GREEN)

This would return a figure in grams using the different assumptions in the Sustainable Web Design model as compared to the 1 Byte model. Assuming you have the original usage data in the form of structured data like HAR files, plus counts for page loads, you should be able to re-run it with the new model for updated numbers.

Key differences for the simplest version of the sustainable web design model

For the purposes of making calculations, the key differences between the 1 Byte model as implemented in CO2.js here and the SWD one are as follows:

Including device energy

The 1 Byte model as implemented in CO2.js does not account for device usage, nor embodied energy. The Sustainable Web Design model does attempt to include them.

For every gigabyte of data transferred, it assumes 0.81 kilowatt hours, which can be divided across the following:
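
The SWD page describes how this 0.81 kWh/GB splits across the different parts of the system. As a rough sketch, and treating the percentage split as my reading of that page rather than something stated in this issue, it could be expressed as constants like this:

// Assumed split of the 0.81 kWh/GB figure, based on my reading of
// sustainablewebdesign.org - please verify against that page.
const KWH_PER_GB = 0.81

const SWD_ENERGY_SHARES = {
  dataCentres: 0.15,  // data centres
  networks: 0.14,     // transmission networks
  userDevices: 0.52,  // end user devices
  production: 0.19,   // embodied energy of producing the hardware
}

// kWh attributed to each segment for one gigabyte transferred
const kwhPerSegmentPerGb = Object.fromEntries(
  Object.entries(SWD_ENERGY_SHARES).map(
    ([segment, share]) => [segment, KWH_PER_GB * share]
  )
)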

Accounting for caching

The Sustainable Web Design model also tries to account for caching. Instead of assuming no caching, the SWD model assumes:

If you prefer, you can express this as a formula:

E = (Data Transfer per Visit in GB x 0.81 kWh/GB x 0.75) + (Data Transfer over the Wire in GB x 0.81 kWh/GB x 0.25 x 0.02)
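
To make the arithmetic concrete, here is a minimal sketch of that formula in JavaScript, using only the constants already stated above; the function name is just for illustration:

// Constants taken directly from the formula above
const KWH_PER_GB = 0.81
const RETURNING_VISITOR_PERCENTAGE = 0.75
const FIRST_TIME_VIEWING_PERCENTAGE = 0.25
const PERCENTAGE_OF_DATA_LOADED_ON_SUBSEQUENT_LOAD = 0.02

// E = (GB x 0.81 kWh/GB x 0.75) + (GB x 0.81 kWh/GB x 0.25 x 0.02)
function energyPerVisitInKwh(gigabytes) {
  return (
    gigabytes * KWH_PER_GB * RETURNING_VISITOR_PERCENTAGE +
    gigabytes * KWH_PER_GB * FIRST_TIME_VIEWING_PERCENTAGE * PERCENTAGE_OF_DATA_LOADED_ON_SUBSEQUENT_LOAD
  )
}

energyPerVisitInKwh(1) // roughly 0.61 kWh for one gigabyte transferred per visit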

Extending the Sustainable Web Design Model

This by itself ought to be compatible with the website carbon figures.

Spatial resolution

Carbon intensity varies by country. If we have country level figures (or better yet, grid level figures within a country), it would be good to use them.

If we pass an ISO Alpha-2 country code in, we get back the annualised carbon intensity figure for that country.

const co2Amount = co2Converter.perByte(
    GIGABYTE,
    GREEN,
    {deviceCountry: "FR" });

If we have more than one grid region in use, we follow the Electricity Map convention of country code plus grid identifier.

So AUS-QLD for the Queensland grid in Australia, AUS-SA for the state of South Australia in Australia, and so on.

It might look like this:

const co2Amount = co2Converter.perByte(
    GIGABYTE,
    GREEN,
    {deviceCountry: "AUS-QLD" });

Theoretically, because we have a country namespace, we could scale this further down to ever smaller grid regions, microgrids, or perhaps even datacentres or cloud provider regions, assuming they provide trustworthy data. This would provide scope to eventually move on from the pretty simplistic green/not green boolean approach to something richer.

But for now, we assume country level resolution, and at a stretch, grid level.

Accounting for production emissions

The Sustainable Web Design model includes production emissions. If you were to use this, given the share of electronics manufactured in China, you might use an emissions factor for the Chinese grid, rather than the grid where the device is being used. If you're using a MacBook charged in France but made in China, using the French carbon intensity figure for production emissions would be misleading.

If you were to go further down this rabbit hole, you might use a weighted average that includes the Taiwanese grid, to capture the volume of chips made there and the fact that many of the energy intensive processes for fabricating chips happen there.
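
To illustrate the weighted average idea, here is a sketch with entirely made-up weights and intensity figures - they are placeholders to show the shape of the calculation, not real manufacturing or grid data:

// Purely illustrative numbers - not real data
const productionGridMix = [
  { grid: "CN", share: 0.7, gramsCo2PerKwh: 550 },
  { grid: "TW", share: 0.3, gramsCo2PerKwh: 500 },
]

// weighted average carbon intensity to apply to production emissions
const productionIntensity = productionGridMix.reduce(
  (total, { share, gramsCo2PerKwh }) => total + share * gramsCo2PerKwh,
  0
) // 535 g CO2/kWh with the placeholder numbers above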

Temporal Resolution

In addition to geographic resolution, we'd want the option of providing support for temporal resolution, although doing this involves a few deliberate judgement calls.

The SWD model includes the impact of energy usage from the consumer device, and this represents around half the calculated emissions, so we have two decisions to take here.

If we decide that the datacentre carbon intensity is worth capturing

If we need to capture the datacentre intensity, we'd likely need an idea of both where the device accessing content is, AND where the origin server serving the content is.

They could both have different carbon intensities for the energy - think of the case where a user is on a device connecting during daytime from their point of view, to a server on the other side of the earth where it's night time.

To capture both accurately, we would need the time in both places. You could do this with:

This might look like so, if you wanted to get a number back:


const timeAndSpace = { 
    deviceCountry: "AUS-QLD", deviceTimestamp: "2021-12-13T16:40:27.614Z",
    serverCountry: "FR"
}

const co2Amount = co2Converter.perByte(
    GIGABYTE, 
    GREEN, 
    timeAndSpace
)

If we decide it's less important

In the SWD model, the datacentre emissions make up 15% of the total figure. We might decide that the difference in carbon intensity between servers isn't large enough to matter, or it's not something we have meaningful control over.

This would simplify things somewhat - depending on your goal, this might be a valid decision. This is something that isn't obvious from the page:


const timeAndSpaceOnlyForDevice = { 
    deviceCountry: "AUS-QLD", deviceTimestamp: "2021-12-13T16:40:27.614Z",
}

const co2Amount = co2Converter.perByte(
    GIGABYTE, 
    GREEN, 
    timeAndSpaceOnlyForDevice
)

Figuring out the appropriate time resolution for the carbon intensity 'bucket'

Once you have this information, because geographic carbon intensity needs to be tied to a time period, you'd likely try to fit it into the corresponding time bucket used for tracking grid intensity. In most places, these are 30 minute settlement periods, but higher resolution does exist in some parts of the world.
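
As an example of the bucketing, here is a small sketch that snaps a timestamp to the start of its 30 minute settlement period so it can be matched against a reported intensity value - the function name is hypothetical:

// Round a timestamp down to the start of its settlement period (30 minutes by default)
function settlementPeriodStart(isoTimestamp, resolutionMinutes = 30) {
  const bucketMs = resolutionMinutes * 60 * 1000
  const time = new Date(isoTimestamp).getTime()
  return new Date(Math.floor(time / bucketMs) * bucketMs).toISOString()
}

settlementPeriodStart("2021-12-13T16:40:27.614Z") // "2021-12-13T16:30:00.000Z"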

You'd likely need some indication from a provider of what the time resolution is, as this could lead to different numbers.

So far, the co2Converter.perByte function only returns a numeric figure.

Changing this return type would likely break a bunch of APIs, so it might make sense to be able to query the default time resolution used instead.

// check what the time resolution is
co2Converter.defaultTimeResolution  // i.e. 30mins

Another option would be to allow overriding this time resolution when calling the function. We'd likely need a set of symbols to capture the main time resolutions (i.e. 30mins, 15mins, 1yr, etc), that we could pass in.

const timeAndSpace = { 
    deviceCountry: "AUS-QLD", deviceTimestamp: "2021-12-13T16:40:27.614Z",
    serverCountry: "FR",
    timeResolution: "1yr"
}

const co2Amount = co2Converter.perByte(
    GIGABYTE, 
    GREEN, 
    timeAndSpace
)

I think this, with a few hard coded constants, would capture most of the intent that I see in the model outlined at the link below.

https://sustainablewebdesign.org/calculating-digital-emissions/

mrchrisadams commented 2 years ago

hey Dryden, can you let me know if you can follow this outline here?

I haven't written tests for this yet, but hopefully it should be possible to write tests and implement a model against these numbers, in a way that ought to be compatible, and extendable.

drydenwilliams commented 2 years ago

Thanks for this @mrchrisadams great detail, I will try and have a look through properly in the next week.

mrchrisadams commented 2 years ago

Dryden's PR for this is at the link below.

I'll flesh out a few more acceptance criteria so it's clearer what it would make sense to include

https://github.com/thegreenwebfoundation/co2.js/pull/58

drydenwilliams commented 2 years ago

A few questions were raised from my PR:

mrchrisadams commented 2 years ago

These are the numbers from the Website Carbon API, which I understand to be older than the model outlined on the digital emissions page. These are the key constants they've been using, which the newer information supersedes.

const KWG_PER_GB = 1.805;
const RETURNING_VISITOR_PERCENTAGE = 0.75;
const FIRST_TIME_VIEWING_PERCENTAGE = 0.25;
const PERCENTAGE_OF_DATA_LOADED_ON_SUBSEQUENT_LOAD = 0.02;
const CARBON_PER_KWG_GRID = 475;
const CARBON_PER_KWG_RENEWABLE = 33.4;
const PERCENTAGE_OF_ENERGY_IN_DATACENTER = 0.1008;
const PERCENTAGE_OF_ENERGY_IN_TRANSMISSION_AND_END_USER = 0.8992;
const CO2_GRAMS_TO_LITRES = 0.5562;

As I understand it, Websitecarbon's averages weren't based on the HTTP Archive, but on the stored values in their own dataset (I might be wrong here - it might be worth asking Tom).

About dynamic visitor figures

Should visitor figures be dynamic? I think they should be. Currently, they are hardcoded to:

  • 0.75 returning visitors
  • 0.25 for new users
  • 0.02 for subsequent data loaded

The general principle I get from reading the SWD description is along these lines:

provide some defensible defaults in the absence of data, but support using data if it exists

Generally speaking, the approach I think we should follow is to have these as constant values for now, with the option to pass overrides in if you have them.
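
As a sketch of that pattern - constants as defaults, with the option to override them when you have real data - it could look something like this; the names are placeholders:

// Defensible defaults, overridable by callers who have real data
const DEFAULT_VISITOR_ASSUMPTIONS = {
  returningVisitorPct: 0.75,
  firstTimeVisitorPct: 0.25,
  subsequentLoadPct: 0.02,
}

function withVisitorAssumptions(overrides = {}) {
  // anything the caller passes wins over the defaults
  return { ...DEFAULT_VISITOR_ASSUMPTIONS, ...overrides }
}

withVisitorAssumptions({ subsequentLoadPct: 0.05 })
// { returningVisitorPct: 0.75, firstTimeVisitorPct: 0.25, subsequentLoadPct: 0.05 }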

drydenwilliams commented 2 years ago

Morning @mrchrisadams,

I asked Tom from Wholegrain and he mentioned that they used their own average, which I assume is a transfer size of around 2.78 MB, to get 1.76 grams of CO2 (with 1.805 kWh/GB and a grid intensity of 475).

I've added those constants in my PR. I removed the renewable figure as I assumed the grid intensity would come dynamically from live grid intensity data, but I realise many people wouldn't have this, so would we need to add it back in? Or just hook this up to an API to get this?

Other values like RETURNING_VISITOR_PERCENTAGE, FIRST_TIME_VIEWING_PERCENTAGE, PERCENTAGE_OF_DATA_LOADED_ON_SUBSEQUENT_LOAD, and CARBON_PER_KWG_GRID should be dynamic, as we would need to use this model on EcoPing. It could lead to different calculations everywhere? But maybe that's ok and still constrained by this model? If so, I could change the PR to something like:

  energyPerVisit(bytes, returningVisitorPct = 0.75, firstTimeVisitorPct = 0.25, subsequentLoadPct = 0.02) {
    const transferredBytesToGb = bytes / fileSize.GIGABYTE;
    return (
      transferredBytesToGb * KWH_PER_GB * returningVisitorPct +
      transferredBytesToGb * KWH_PER_GB * firstTimeVisitorPct * subsequentLoadPct
    );
  }

  emissionsPerVisitInGrams(energyPerVisit, globalIntensity = GLOBAL_INTENSITY) {
    return formatNumber(energyPerVisit * globalIntensity);
  }

and adding a new helper function, similar to the Website Carbon API, like:

  getStatistics(
    bytes,
    globalIntensity = GLOBAL_INTENSITY,
    returningVisitorPct = 0.75,
    firstTimeVisitorPct = 0.25,
    subsequentLoadPct = 0.02
  ) {
    const energy = this.energyPerVisit(
      bytes,
      returningVisitorPct,
      firstTimeVisitorPct,
      subsequentLoadPct
    );
    const co2gramsPerView = this.emissionsPerVisitInGrams(energy, globalIntensity);

    return {
      bytes,
      intensity: globalIntensity,
      co2grams: co2gramsPerView,
    };
  }

Note: these numeric values would come from constants
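
For example, calling the proposed helper with site specific figures might look like this - swd here is just a hypothetical instance of the class holding these methods:

// Assumes the getStatistics signature proposed above; swd is a hypothetical instance
const stats = swd.getStatistics(
  2.78 * 1024 * 1024, // page weight in bytes
  475,                // grid intensity in g CO2/kWh
  0.75,               // returning visitors
  0.25,               // first time visitors
  0.02                // share of data loaded on subsequent loads
)
// stats => { bytes, intensity, co2grams }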

mrchrisadams commented 2 years ago

@drydenwilliams can you let me know how you're handling this when a website makes requests to multiple domains in multiple places?

We originally implemented the "per byte" approach to provide a low level way to think about data transfers, and now that I've seen the examples and PR, I think the SWD model you're using is actually working at a higher level than we were.

For example, the CO2.js approach so far was designed to allow for the case of a single website visit creating connections to multiple different domains - each of which might have different CO2 intensities, as the files are being served from places on different grids, and because you can't directly control the path data takes as it routes through the network.

The per byte approach was also originally designed to allow for more than just website visits (think of streaming videos, making large software downloads, peer to peer transfer and so on) - I'd really like to avoid losing this extra granularity, as it limits where you could use these models.

Previously we've avoided the issue of caching for this reason - for a website visit, we didn't think we had enough information to make a defensible assumption about which resources are cached at the visit level, and it would have to be addressed on a per request level instead.

A way to support both - splitting out the two models

With this in mind, I think it might be possible to separate out these two ideas, so you might have a perByte method for the transfer part (containing assumptions about carbon intensity between various points), which is called by a higher level perWebsiteVisit method (containing the assumptions about device emissions, caching and the rest).

This approach would keep the model compatible (so folks who have collected data would be able to drop in the SWD model for transfer), whilst supporting the higher level "website visit" use case.
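
A rough sketch of what that separation could look like - the class name, the wiring, and the fallback intensity figure are placeholders rather than a final API:

const KWH_PER_GB = 0.81      // SWD energy figure from earlier in this issue
const GLOBAL_INTENSITY = 475 // g CO2/kWh, the grid figure quoted earlier in this thread
const GIGABYTE = 1024 * 1024 * 1024

class SustainableWebDesign {
  // lower level: emissions for moving bytes, holding the assumptions
  // about carbon intensity between the various points
  perByte(bytes, green = false, options = {}) {
    // treating green as zero emissions purely to keep the sketch short
    const intensity = green ? 0 : (options.intensity ?? GLOBAL_INTENSITY)
    return (bytes / GIGABYTE) * KWH_PER_GB * intensity
  }

  // higher level: wraps perByte with the website visit assumptions
  // (caching, first time vs returning visitors, and so on)
  perWebsiteVisit(bytes, green = false, options = {}) {
    // caching split taken from the formula earlier in the issue
    return (
      this.perByte(bytes * 0.75, green, options) +
      this.perByte(bytes * 0.25 * 0.02, green, options)
    )
  }
}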

Carbon intensity

I realise that in the original issue it probably wasn't too clear, but the intention was that the second and third arguments would be optional, with fallbacks to global constants in the absence of either data from the provider, or input from the consumer of the model.

I'm trying to follow this general principle here:

provide some defensible defaults in the absence of data, but support using data if it exists

The second argument - green / not green

The second boolean argument would account for the green energy case as used in website carbon, sitespeed.io, and in CO2.js right now.

You might think of this second argument as covering the case of market based emissions reductions when measuring Scope 2 emissions in the GHG protocol.

At the Green Web Foundation, we use the green / not green distinction largely because we do recognise where people are matching fossil energy use with green energy tariffs and the like, rather than only measuring the direct emissions.

We outline a bit more on the Green Web Foundation website about the evidence we accept and why, as does Microsoft with their own reasoning about systemic change versus local change.

However, I know that EcoPing doesn't recognise these reductions, nor does the Green Software Foundation's carbon intensity spec. Even within the GHG Protocol, this is somewhat acknowledged by the existence of location based emissions in Scope 2 too.

Having this separate from the time and location based carbon intensity info allows us to cover both cases.

Going from a boolean to something richer (like percentage of the year matched with carbon free power) might make sense in future, but for now, a boolean is API compatible and easy to implement.
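
If we did move to something richer later, one way to stay backwards compatible would be to accept either a boolean or a 0-1 fraction - this is a sketch, not something the issue settles on:

// Sketch: normalise the 'green' argument so existing boolean callers keep
// working, while allowing a 0-1 fraction of matched carbon free energy later
function normaliseGreenArgument(green) {
  if (typeof green === "boolean") {
    return green ? 1 : 0
  }
  return Math.min(Math.max(green, 0), 1)
}

normaliseGreenArgument(true) // 1
normaliseGreenArgument(0.65) // 0.65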

TBH, this whole topic is a massive rabbithole - this piece is probably the best summary of the different points of view I've found.

https://www.volts.wtf/p/is-247-carbon-free-energy-the-right

The third argument - temporal and geographic carbon intensity

The third argument here was to support the geographical and temporal resolution information you refer to, which EcoPing uses but which, as I understand it, Ecograder and Website Carbon do not. There is this issue in WebPageTest about starting to incorporate these figures too, but discussions about carbon intensity data haven't come up there yet:

https://github.com/WPO-Foundation/webpagetest/issues/1613

Anyway - access to this data isn't universal and free yet, and depending on the use case, the extra detail might not always be worth the extra effort and cost. There's [a project I'm working on to get some of this data released as open data](https://github.com/Green-Software-Foundation/sci-data), but the timeframe goes beyond the scope of this issue.

In this case, as long as you're providing the same input data, and have a clearly understood way of going from a website visit to carbon emissions in the model, I think it's okay for different providers to implement different levels of detail for carbon intensity.

This was the thinking for this part here in the issue:


const timeAndSpace = { 
    deviceCountry: "AUS-QLD", deviceTimestamp: "2021-12-13T16:40:27.614Z",
    serverCountry: "FR"
}

const co2Amount = co2Converter.perByte(
    GIGABYTE, 
    GREEN, 
    timeAndSpace
)

Even if software implementing this model doesn't use higher resolution data yet, supporting higher resolution input as an optional argument would be worthwhile in the model.

An update path and data portability

Over time, I think more data about the carbon intensity of electricity will become freely available, but having something like EcoPing available now provides an API compatible upgrade path for folks who need higher resolution carbon emissions info, even if they started out with other tooling.

This is an open project, and if we want to support upgrade paths like this, it would make sense to know that entirely OSS options exist too - in fact, some orgs won't use proprietary tools unless there is an OSS alternative they can migrate back out to.

If they needed to, folks who want an OSS option could still deploy the sitespeed.io suite, and run many of the same scrapers that Electricity Map uses - this is what I understand EcoPing does.

Summary: separate perByte and perVisit methods, overridable defaults, and the three arguments for perByte

@drydenwilliams would you please implement the perByte method as the lower level part of the model for transfer? That would provide the API compatibility, while still supporting higher level modelling like the website visit modelling.

I'm open to discussing alternatives as long as they meet the criteria above, and the requirements for API compatibility, to support switching from one model to another given something like a HAR file or other structured representation of a webpage visit.

mrchrisadams commented 2 years ago

I'm closing this, as we now have the SWD model largely implemented, with tests, as outlined in #67 and #58.