thegreenwebfoundation / co2.js

An npm module for accessing the green web API, and estimating the carbon emissions from using digital services

Allow for constants in SWD model to be adjusted #126

Closed: fershad closed this 1 year ago

fershad commented 1 year ago

This PR will introduce the ability for people using the Sustainable Web Design model to adjust some of the constants used in its estimate calculations. This was requested in issues #120 & #109, and has also come up in other conversations.

The constants which we would allow to be modified include:

Allowing developers to adjust these figures would enable more precision in estimates using the model. Since developers would likely have much of this data on hand, they can use it to override the generic constants and return estimates that are more tailored to their use case.

Todo:

fershad commented 1 year ago

Release CO2.js v0.12.0

fershad commented 1 year ago

The more I think about this, the more I feel that, as a start, we should only allow the DEVICE and DATA CENTER grid intensity figures to be changed. For simplicity, the NETWORKS and PRODUCTION constants in the respective models should stay set to the global average grid intensity figures.

fershad commented 1 year ago

Okay, so I've had a first pass at implementing an API through which developers can adjust the grid intensity, caching percentage, and first/return visitor percentages in the SWD model.

```js
const co2 = new CO2({
  model: "swd",
  gridIntensity: {
    device: {
      value: 121,
    },
    dataCenter: {
      country: "TWN",
    },
  },
  cachePercentage: 0.5,
  firstVisitPercentage: 0.8,
  returnVisitPercentage: 0.2,
});
```

Changing grid intensity

As commented above, I think it only makes sense to allow for Device & Data Center grid intensity figures to be adjustable. Networks are global by nature, and so is the modern Production supply chain, so it makes sense for those to use GLOBAL_INTENSITY values.

Developers will have the option to pass in an object for both device and dataCenter, containing either a value (a number representing grid intensity in grams CO2e/kWh) or a country. If a country is passed in, we will use the Average Grid Intensity data for that country (if available).

If nothing is passed in for one or both, we fall back to GLOBAL_INTENSITY and RENEWABLES_INTENSITY as appropriate.
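A minimal sketch of the fallback order described above, written only to illustrate it: `resolveGridIntensity`, `resolveAll`, and the `COUNTRY_INTENSITY` table are hypothetical, not CO2.js code. The global figure (442) and the TWN figure (565.629) are the ones that appear in the trace output later in this thread; the renewables figure of 50 is an assumed placeholder. Network and production are pinned to the global figure, as proposed above.

```javascript
// Hypothetical sketch of the grid intensity fallbacks; not the CO2.js source.
// Global average grid intensity in gCO2e/kWh (the figure shown in this thread's trace output).
const GLOBAL_INTENSITY = 442;
// Assumed placeholder for the renewables grid intensity in gCO2e/kWh.
const RENEWABLES_INTENSITY = 50;
// Hypothetical country lookup, keyed by alpha-3 code as in the "TWN" example.
const COUNTRY_INTENSITY = { TWN: 565.629 };

// Accepts a bare number, { value }, or { country }; anything else uses the fallback.
function resolveGridIntensity(option, fallback) {
  if (typeof option === "number") return option;
  if (option && typeof option.value === "number") return option.value;
  if (
    option &&
    typeof option.country === "string" &&
    Object.prototype.hasOwnProperty.call(COUNTRY_INTENSITY, option.country)
  ) {
    return COUNTRY_INTENSITY[option.country];
  }
  return fallback;
}

// Only device and data center are adjustable; green hosting switches the
// data center fallback from the global figure to the renewables figure.
function resolveAll(gridIntensity = {}, green = false) {
  return {
    device: resolveGridIntensity(gridIntensity.device, GLOBAL_INTENSITY),
    dataCenter: resolveGridIntensity(
      gridIntensity.dataCenter,
      green ? RENEWABLES_INTENSITY : GLOBAL_INTENSITY
    ),
    network: GLOBAL_INTENSITY,
    production: GLOBAL_INTENSITY,
  };
}
```

In this ordering an explicit value always wins, even for a green-hosted data center; whether that is the right precedence is a design decision the sketch does not settle.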

Changing caching & visitors

The cache percentage, first visitor percentage & return visitor percentage can also be adjusted by passing in the respective options. Each must be a decimal number between 0 and 1. There is currently no check to ensure that the first & return visitor percentages sum to 1.
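Since nothing currently checks these values, here is a hypothetical helper (`validateVisitOptions` is not part of CO2.js) sketching the checks described above: each ratio must lie in [0, 1], and when both visitor shares are supplied they should sum to 1 within a small floating-point tolerance.

```javascript
// Hypothetical validation helper; returns a list of problems (empty if valid).
function validateVisitOptions({
  cachePercentage,
  firstVisitPercentage,
  returnVisitPercentage,
} = {}) {
  const errors = [];
  const ratios = { cachePercentage, firstVisitPercentage, returnVisitPercentage };
  for (const [name, value] of Object.entries(ratios)) {
    if (value === undefined) continue; // option not supplied, so the default applies
    if (typeof value !== "number" || Number.isNaN(value) || value < 0 || value > 1) {
      errors.push(`${name} must be a number between 0 and 1`);
    }
  }
  // Only compare the sum when both visitor shares are supplied as numbers.
  if (
    typeof firstVisitPercentage === "number" &&
    typeof returnVisitPercentage === "number" &&
    Math.abs(firstVisitPercentage + returnVisitPercentage - 1) > 1e-9
  ) {
    errors.push("firstVisitPercentage and returnVisitPercentage should sum to 1");
  }
  return errors;
}
```

Returning a list rather than throwing leaves the caller free to decide whether a bad sum is fatal or just a warning.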

You can see how each of the options above looks in isolation in the very basic first run at testing them.

michael-voit commented 1 year ago

Hey Fershad,

Great that you've started working on this! I hope I'll have time in the coming weeks to join you, if needed.

Thanks also for the thought you've put into this!

I like your idea of making only the DEVICE and DATA CENTER grid intensities adjustable, rather than all of them. Some questions about that idea:

Those are my thoughts just from reading your posts, so the theoretical part. ;)

I hope to find time soon to jump in on the practical side too. :)

fershad commented 1 year ago

Thanks @michael-voit. I was thinking about the networks side, and what you've said does make sense. Even when looking at an entire website, including third-party requests, there's a chance that requests served from a CDN/edge location could be in the same country as the end user device. Allowing for customisation of the networks intensity figure does make sense.

I feel that we should treat the data center intensity the same way we treat green hosting. With green hosting, we expect the developer to pass in a Boolean indicating whether or not a data center is green hosted. We've got an API within the library that can be used to check against our Greencheck API, but that functionality lives outside the main CO2 estimation API. That also keeps the estimation functions perByte and perVisit free of any outbound network requests, which could introduce a whole range of potential failures and edge cases.

We could stick with this separation of concerns track, and consider whether it makes sense to have an API within CO2.js to query IP to CO2 Intensity, just like the hosting check we have now. This way developers can choose if they want to import it into their project, and how they go about using it. In the meantime, it's probably worth having some documentation that shows folks how to pull together our APIs like IP to CO2 Intensity, and use that data in CO2.js.
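One way to keep that separation, sketched under assumptions: the lookup (for example, a wrapper around an IP to CO2 Intensity API) is passed in as a function, runs before the estimate, and any failure simply leaves the defaults in place. `buildGridIntensityOptions` and the `lookupIntensity` contract (an async function resolving to gCO2e/kWh) are hypothetical.

```javascript
// Hypothetical sketch: do any network lookup *before* estimating, so the
// estimate functions themselves never make outbound requests.
// `lookupIntensity` is an injected async function (hypothetical contract) that
// resolves to a grid intensity in gCO2e/kWh for a host or IP address.
async function buildGridIntensityOptions(host, lookupIntensity) {
  try {
    const intensity = await lookupIntensity(host);
    if (typeof intensity === "number" && Number.isFinite(intensity)) {
      return { dataCenter: intensity };
    }
  } catch (err) {
    // Treat a failed lookup like a missing one; the estimate still runs.
  }
  return {}; // no override, so the model's default figures apply
}

// Usage, with the (bytes, green, options) call shape proposed later in this thread:
// const gridIntensity = await buildGridIntensityOptions(host, myLookup);
// const grams = co2.perVisit(bytes, green, { gridIntensity });
```

Injecting the lookup keeps the choice of API, caching, and error handling with the developer, mirroring how the green hosting Boolean is handled.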

fershad commented 1 year ago

As I was working on this today, I found the code getting more and more convoluted. As a result, I've reworked the approach taken, which feels like it has resulted in less complex code while still allowing the flexibility that this PR aims to provide.

Now, instead of passing the grid intensity, caching percentage, and first/return visitor percentages when creating a new CO2 object, developers can pass these options directly into the function. This just "feels nicer".

```js
const co2 = new CO2({ model: "swd" });

const emissions = co2.perVisit(1000, false, {
  gridIntensity: {
    device: 121,
    dataCenter: {
      country: "TWN",
    },
  },
  cachePercentage: 0.5,
  firstVisitPercentage: 0.8,
  returnVisitPercentage: 0.2,
});
```

The key thing to note here is that both the perByte and perVisit functions now require a Boolean value for green hosting. Both functions now take the arguments (bytes, green, options).
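Because `green` is now a required positional argument ahead of `options`, a call such as `perVisit(bytes, { ... })` would put the options object in the `green` slot. A hypothetical guard (`normaliseArgs` is illustrative, not CO2.js code) shows one way to catch that and to merge caller options over defaults; the 0.75/0.25 visitor defaults are the values shown in the trace output later in this thread.

```javascript
// Hypothetical argument guard for an estimate taking (bytes, green, options).
// Visitor defaults match the trace output shown later in this thread.
const DEFAULT_OPTIONS = { firstVisitPercentage: 0.75, returnVisitPercentage: 0.25 };

function normaliseArgs(bytes, green, options = {}) {
  if (typeof bytes !== "number" || !Number.isFinite(bytes) || bytes < 0) {
    throw new TypeError("bytes must be a non-negative number");
  }
  if (typeof green !== "boolean") {
    // Catches calls like perVisit(bytes, { ... }) that skip the green flag.
    throw new TypeError("green must be a boolean");
  }
  // Caller-supplied options override the defaults key by key.
  return { bytes, green, options: { ...DEFAULT_OPTIONS, ...options } };
}
```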

fershad commented 1 year ago

Talked about this internally with @mrchrisadams. He raised that we should have a way for people to output a log of the changes that have been made.

Since we are exposing a way for people to diverge from the standard Sustainable Web Design model, it would be practical to expose the adjustments that have been made when a result is returned. This can give more clarity to the results, and provide an audit trail showing why results calculated with adjusted constants differ from what the standard Sustainable Web Design model would produce.

To do this, we will create two new functions, perByteTrace and perVisitTrace. These will return an object with the CO2 estimate and details of the variables used in the calculation. This also lets us expose the change in the next release and let people test it without affecting existing production implementations.
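The trace/plain split can be sketched as follows, assuming the plain function simply delegates to the trace function and returns its `co2` field, so existing callers keep getting a number. `makeTraced` and its two callbacks are hypothetical names; the description string is copied from the output shown later in this thread.

```javascript
// Hypothetical sketch: derive a *Trace function and a plain function from the
// same two steps, so the plain function keeps returning only a number.
const TRACE_DESCRIPTION = "Below are the variables used to calculate this CO2 estimate.";

function makeTraced(resolveVariables, computeCo2) {
  // Trace variant: returns the estimate plus the variables that produced it.
  function trace(bytes, green, options = {}) {
    const variables = resolveVariables(green, options);
    return {
      co2: computeCo2(bytes, variables),
      green,
      variables: { description: TRACE_DESCRIPTION, ...variables },
    };
  }
  // Plain variant: same calculation, existing return type (a number).
  function plain(bytes, green, options = {}) {
    return trace(bytes, green, options).co2;
  }
  return { trace, plain };
}
```

Computing the number in one place means the two variants cannot drift apart.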

fershad commented 1 year ago

@mrchrisadams was this the kind of output you had in mind? Happy for suggestions on key names.

I've created two new functions, perByteTrace and perVisitTrace, which perform the respective CO2 calculations and return an object like the one below. In this example, I have changed the dataCenter and device grid intensity, and the cache percentage. Running the same inputs through our favourite spreadsheet gives a matching result.

```js
const co2 = new CO2();
const MILLION = 1000000; // 1,000,000 bytes

co2.perVisitTrace(MILLION, false, {
  gridIntensity: {
    device: 565.629,
    dataCenter: { country: "TWN" },
  },
  cachePercentage: 0.6,
});
```

Output:

```
{
  co2: 0.36134643955500007,
  variables: {
    description: 'Below are the variables used to calculate this CO2 estimate.',
    gridIntensity: {
      description: 'The grid intensity (grams per kilowatt-hour) used to calculate this CO2 estimate.',
      network: 442,
      dataCenter: 565.629,
      production: 442,
      device: 565.629
    },
    cachePercentage: 0.6,
    firstVisitPercentage: 0.75,
    returnVisitPercentage: 0.25
  }
}
```
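For anyone repeating the spreadsheet check, the `co2` figure above can be reproduced by hand under these assumptions, which are not stated in this thread: the SWD constants of 0.81 kWh/GB split 52% device, 14% network, 15% data center and 19% production; 1 GB = 10^9 bytes; and `cachePercentage` meaning the share of data returning visitors load from cache, so they reload 1 - 0.6 = 0.4 of the page. Device and data center share the TWN intensity, so their shares combine, as do network and production.

```latex
\begin{aligned}
E &= \frac{10^{6}}{10^{9}} \times 0.81 = 0.00081\ \text{kWh} \\
C_{\text{full}} &= E \times \big[(0.52 + 0.15) \times 565.629 + (0.14 + 0.19) \times 442\big] \\
&= 0.00081 \times (378.97143 + 145.86) = 0.4251134583\ \text{g} \\
C_{\text{visit}} &= C_{\text{full}} \times \big[0.75 + 0.25 \times (1 - 0.6)\big] = 0.4251134583 \times 0.85 \approx 0.361346\ \text{g}
\end{aligned}
```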

As an aside, there may be a case for renaming cachePercentage to dataReloadRatio, which is closer to how the constant is represented in the SWD formula.

fershad commented 1 year ago

Renamed cachePercentage to dataReloadRatio. I've also added some JSDoc comments and types for the new functions.

```js
const co2 = new CO2();
const MILLION = 1000000; // 1,000,000 bytes

co2.perVisitTrace(MILLION, false, {
  gridIntensity: {
    device: 565.629,
    dataCenter: { country: "TWN" },
  },
  dataReloadRatio: 0.6,
});
```

Output:

```
{
  co2: 0.38260211247000003,
  green: false,
  variables: {
    description: 'Below are the variables used to calculate this CO2 estimate.',
    gridIntensity: {
      description: 'The grid intensity (grams per kilowatt-hour) used to calculate this CO2 estimate.',
      network: 442,
      dataCenter: 565.629,
      production: 442,
      device: 565.629
    },
    dataReloadRatio: 0.6,
    firstVisitPercentage: 0.75,
    returnVisitPercentage: 0.25
  }
}
```
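The move from the earlier result (about 0.3613) to this one (about 0.3826) for the same 0.6 input is consistent with the rename also flipping the meaning: `dataReloadRatio` is the share of data returning visitors reload, whereas `cachePercentage` was the share served from cache. A sketch of the per-visit arithmetic under assumed SWD constants (0.81 kWh/GB with 1 GB = 10^9 bytes, split 0.52/0.14/0.15/0.19 across device/network/data center/production); `fullLoadGrams` and `perVisitGrams` are hypothetical names, not CO2.js functions.

```javascript
// Worked check of the per-visit SWD arithmetic; function names are hypothetical.
// Assumed constants: 0.81 kWh per GB (1 GB = 1e9 bytes), split across segments below.
const KWH_PER_GB = 0.81;
const SEGMENT_SHARES = { device: 0.52, network: 0.14, dataCenter: 0.15, production: 0.19 };

// Grams CO2e for one full page load, given per-segment grid intensity (gCO2e/kWh).
function fullLoadGrams(bytes, intensity) {
  const kwh = (bytes / 1e9) * KWH_PER_GB;
  let grams = 0;
  for (const [segment, share] of Object.entries(SEGMENT_SHARES)) {
    grams += kwh * share * intensity[segment];
  }
  return grams;
}

// First visits load everything; returning visits reload `dataReloadRatio` of it.
function perVisitGrams(bytes, intensity, dataReloadRatio, firstVisitPercentage = 0.75, returnVisitPercentage = 0.25) {
  const weight = firstVisitPercentage + returnVisitPercentage * dataReloadRatio;
  return fullLoadGrams(bytes, intensity) * weight;
}

// Intensities from the trace output above.
const intensity = { device: 565.629, network: 442, dataCenter: 565.629, production: 442 };
console.log(perVisitGrams(1e6, intensity, 0.6)); // ≈ 0.3826021
```

Passing 0.4 instead (60% cached, so 40% reloaded) gives about 0.3613464, matching the earlier `cachePercentage: 0.6` output.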

Note: A user could pass dataReloadRatio, firstVisitPercentage, and/or returnVisitPercentage into the perByteTrace function, even though these variables are not used in the per-byte carbon calculation. If this happens, they are simply ignored and not shown in the returned result.
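An allow-list is one simple way to get that behaviour. This hypothetical `pickUsedOptions` helper (not CO2.js code) keeps only the options a given estimate uses, so visit-only options never reach the per-byte trace output; the option lists are assumptions based on the variables shown in the trace outputs above.

```javascript
// Options each estimate actually uses (hypothetical lists for illustration).
const USED_OPTIONS = {
  perByte: ["gridIntensity"],
  perVisit: ["gridIntensity", "dataReloadRatio", "firstVisitPercentage", "returnVisitPercentage"],
};

// Copy only the options the given estimate uses; anything else is ignored.
function pickUsedOptions(kind, options = {}) {
  const picked = {};
  for (const key of USED_OPTIONS[kind]) {
    if (options[key] !== undefined) picked[key] = options[key];
  }
  return picked;
}
```

An allow-list also means unknown or misspelled keys are dropped rather than echoed back as if they had been used.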

fershad commented 1 year ago

One thing to note with using the Average Intensity data as part of this new functionality is that the package size does increase. Doing a dry run of npm pack with the new build returns:

```
npm notice === Tarball Details ===
npm notice package size: 95.2 kB
npm notice unpacked size: 532.5 kB
npm notice total files: 98
```

fershad commented 1 year ago

I've updated the build scripts to exclude test files from the CJS and ESM builds. That gets our unpacked size down to under 400 kB, and helps address #121.
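The build scripts themselves aren't shown in this thread. As a hypothetical illustration only, excluding test files usually comes down to filtering them out of the list of files handed to the bundler; this sketch assumes tests are named `*.test.js`, and `buildEntryPoints` is an illustrative name.

```javascript
// Hypothetical sketch: drop test files from a list of source paths before
// passing the remainder to a bundler as entry points. Assumes `*.test.js` naming.
const TEST_FILE = /\.test\.js$/;

function buildEntryPoints(paths) {
  return paths.filter((path) => path.endsWith(".js") && !TEST_FILE.test(path));
}
```

A narrow suffix match like this reduces the risk of excluding non-test modules by accident, which is the kind of regression described in the next comment.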

```
npm notice === Tarball Details ===
npm notice package size: 77.5 kB
npm notice unpacked size: 384.9 kB
npm notice total files: 62
```

fershad commented 1 year ago

😅 The change I made resulted in the 1byte model & data files not being compiled & packaged. I fixed that in https://github.com/thegreenwebfoundation/co2.js/pull/126/commits/641ae847723d5e977fa6e11f529a5a62a6988f8b, and we still have a slightly smaller unpacked size:

```
npm notice === Tarball Details ===
npm notice package size: 82.3 kB
npm notice unpacked size: 415.0 kB
npm notice total files: 77
```