Open mrchrisadams opened 3 years ago
Update on this one.
It turns out that the WattTime marginal intensity index is a really good fit for this use case and is free to use.
It gives you a score out of a hundred, relative to the last two weeks of carbon intensity for the location you pass in as lat-lng figures, and it is updated every 30 minutes.
That means there’s no need to figure out the stats yourself, and because we are looking at scheduling work forwards in time, this is OK for us.
It doesn’t give grams of CO2 per kWh in absolute terms; for that, you’d need to use a different API and different data anyway.
(I’ll update why when I’m not on the u-bahn)
But this feels like progress!
I've put together a sample notebook demonstrating how to get numbers back from the WattTime Marginal Operating Emissions API.
https://nextjournal.com/greenweb/experiments-with-the-free-marginal-carbon-intensity-from-wattime
It's fairly simple to take a score out of 100 and divide it into three or five buckets.
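As a rough sketch of that bucketing, here is one way it could look in Python. The function name `bucket_score` and the label sets are made up for illustration, and the sketch assumes a higher score means a more carbon-intensive grid relative to the recent past; it is not taken from the notebook or from any WattTime client.

```python
from typing import Sequence

# Hypothetical label sets; the names and the number of bands are a judgement call.
THREE_BANDS = ("low", "medium", "high")
FIVE_BANDS = ("very low", "low", "medium", "high", "very high")


def bucket_score(score: float, labels: Sequence[str] = THREE_BANDS) -> str:
    """Map a relative 0-100 score onto equal-width bands.

    Assumes a higher score means a dirtier grid compared with recent
    history for the same location.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Scale the score to a band index; clamp so a score of 100 lands in the top band.
    index = min(int(score * len(labels) / 100), len(labels) - 1)
    return labels[index]


if __name__ == "__main__":
    for example in (10, 50, 90):
        print(example, bucket_score(example), bucket_score(example, FIVE_BANDS))
```

Equal-width bands are only the simplest choice here; the cut-off points could just as easily be set by hand.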
We'd need to be clear that this is a relative range for the same location, so only useful for moving code through time, not through space.
To know whether it would make sense to move code through space, you'd need some way to make comparisons between locations. A low figure in Poland might still be higher than a high figure in France or Norway, for example.
Also, these are consequential figures, not attributional figures. If you were trying to total up carbon emissions from compute so you can record scoped emissions as part of an organisation-wide carbon footprint exercise, then even if you had the absolute figures, they would not be directly compatible.
This makes me think you wouldn't be able to use these in Cloud Carbon Footprint either, as I understand that to be based on an attributional model, not a consequential model.
One of the key ideas behind this and the JavaScript grid intensity libraries was to take all the complexity around grid emissions and try to reduce it down to an easy to grok, low resolution, actionable metric that grants people a sense of agency when thinking about the environmental impact of using energy.
You can trace a line back to this piece of work more than 11 years ago when I was first learning to code, and I discovered there was an underlying materiality to the power we use.
I blogged about it back then, and it might be some useful context:
https://blog.chrisadams.me.uk/posts-output/2009-04-24-tea-arduino-and-dynamic-demand/
And in general, if there's one thing users ought to be able to do as a result of using this library, it's to positively influence the environmental impact of any compute work, by running it in different regions or at different times of day.
The thing is, because this information is quite difficult or complicated to get hold of, we have two options if we want to make this freely available.
1. Scrape the open data ourselves
This feels like a poor option, as it replicates the work of existing nonprofits and groups doing good work, like Electricity Map, or experiments like the UK national grid carbon intensity.
2. Provide a way to work with the above groups, but try to expose data at low resolution
Serve some default bands that share meaningful signal to developers, so someone without much prior knowledge can access this and make some impactful decisions based on it.
There are two reasons for relying on the low res data:
Figuring out what thresholds make sense:
I think it's likely that we'll need to make some human judgements when choosing thresholds, and that having global thresholds would be an ideal first step.
If the thresholds are too coarse across the globe, then you might see no benefit from shifting workloads through time or space, whether you use 3 buckets (high, medium, low) or 5 buckets (i.e. v. high, high, med, low, v. low).
If you have per-country thresholds it might be more useful, but it makes it very difficult to make comparisons across countries. For example, if you don't know that medium CO2 intensity in Poland is equivalent to very high emissions in France, then you miss out on information you might otherwise act upon: you'd just see medium for France and medium for Poland, and not know there were meaningful differences in terms of moving work around.
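To make the trade-off concrete, here is a small sketch comparing the two approaches. Every number in it is a made-up placeholder chosen only to illustrate the point, not a real measurement for Poland or France, and the `band` helper is hypothetical.

```python
from typing import Dict, Sequence, Tuple

LABELS = ("low", "medium", "high")

# Made-up intensities in gCO2/kWh, for illustration only. These are NOT real figures.
readings: Dict[str, float] = {"PL": 700.0, "FR": 60.0}

# Made-up per-country (min, max) ranges, again purely illustrative.
local_ranges: Dict[str, Tuple[float, float]] = {
    "PL": (600.0, 800.0),
    "FR": (40.0, 80.0),
}

# Made-up single range shared by every country.
GLOBAL_RANGE: Tuple[float, float] = (0.0, 900.0)


def band(value: float, low: float, high: float, labels: Sequence[str] = LABELS) -> str:
    """Place a value into equal-width bands between low and high."""
    fraction = (value - low) / (high - low)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp values outside the range
    index = min(int(fraction * len(labels)), len(labels) - 1)
    return labels[index]


if __name__ == "__main__":
    for country, value in readings.items():
        local = band(value, *local_ranges[country])
        shared = band(value, *GLOBAL_RANGE)
        print(f"{country}: per-country={local}, global={shared}")
```

With these placeholder numbers both countries come out as "medium" against their own ranges, while the shared range separates them, which is the information you would lose with per-country thresholds alone.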
These feel like they'd need to be defaults, rather than set in stone, because I think the most common use case will be time shifting work, rather than geographically shifting work. If you aren't ever going to use more than one region, it's less useful to know that another one is less carbon intensive than yours.
As an aside, the more research I read, the more I understand that sending data from datacentre to datacentre, as long as they have decent connectivity, is carbon efficient, and there isn't much penalty for doing so. I'm basing this on a study I read from the University of Zurich Department of Informatics, Informatics and Sustainability Research