httpwg / admin

When you want to speak to the manager.

Communicating carbon emissions #52

Open mnot opened 1 year ago

mnot commented 1 year ago

See, e.g., https://datatracker.ietf.org/doc/draft-martin-http-carbon-emissions-scope-2/

gregw commented 1 year ago

This is the response I got from a sustainability consultant friend (Ken Lunty), with some links to further reading:

Hi Greg, an interesting idea, and increasingly relevant as organizations commit to disclosing their Scope 3 emissions (which are the Scope 1 & 2 emissions in their supply chain); it is also likely to become more and more significant with the uptake of AI.

In terms of energy mix, I'm not sure of the best way, but typically country emissions factors are publicly available. In Australia, this would be the National Greenhouse Accounts methods and factors workbook, which gets updated every year to account for increased renewables in our grid.

Electricity emissions are broken down to the state level.

I don't know much about the IT sector, but the boundary of the service would be the more challenging piece, and this would tie into a consistent methodology for everyone.

Some resources would be the GHG Protocol website, but more specifically I would have a look at the Environdec website: https://www.environdec.com/home

This provides environmental declarations for many products and services... we have applied this to building materials for the construction sector, but it can be applied to any product... you will even see tomatoes and pasta on there. It would be interesting to see if it can be applied to a service such as web hosting. The way it works is that you get the product category rules (PCR) approved, which define the methodology for all products within the category, for consistency. This is underpinned by the standard for life cycle assessment. Most importantly, it defines the scope and boundary of the assessment and also the functional unit. For example, for concrete this would be per m3 or per ton...

Things can get more complicated in terms of functional units: for a website, would it be per minute browsed? Or per MB transferred? That's probably more your area...

As an example, we are using this framework for roads at the moment, and the functional unit is m2/year of design life, to account for maintenance. It gets really interesting when people can compare products and visualize the carbon reduction through choice. Hence the visualization is as important as the calculation precision. Attached is an example of what we have done for a shared user path as part of a major road project.

The pavement on the left is the client reference design, the one on the right is our alternative with 70% lower carbon as it has been designed for purpose rather than tradition.

So the visualization engages the engineers to innovate...which is the best part!

Sorry for all the messages, just had a bit of time on the train..I actually just did a podcast with roads Australia about the above and believe it can be applied to anything..just requires sustainability professionals to get out of their ivory tower and work with the people who can make the change...

https://www.linkedin.com/posts/ken-lunty-putting-decarbonisation-at-the-centre-of-activity-7051438588005154816-hZOe?utm_source=share&utm_medium=member_ios

Looking forward to some great discussion on this!

[Attached image: IMG-20230411-WA0001]

ioggstream commented 1 year ago

While I support decarbonisation, I think the WG should probably engage with subject matter experts (in carbon emissions and, more generally, in environmental policies) to assess the effectiveness of this kind of solution.

All this, provided that the published data are trustworthy.

gregw commented 1 year ago

Some background reading:

bertysentry commented 1 year ago

Some more background reading from the Green Software Foundation (Linux Foundation):

bertysentry commented 1 year ago

Some context around this idea

Note: By the definition of Scope 3 carbon emissions, when calculating upstream Scope 3 emissions you must include the Scope 2 emissions of your suppliers and providers.

In IT, we will need to measure the Scope 3 emissions of:

That means we must include the Scope 2 emissions from any 3rd-party online service we rely on.

Some examples:

We note that most online providers deliver their services over HTTP (which represents approximately 90% of Internet traffic).

Why would we want to assess our Scope 3 emissions (and thus know the Scope 2 emissions of our 3rd-party suppliers)? Because legal pressure on organizations to do so is building up, and because this allows users and organizations to understand which services are responsible for how much CO2, and to change their behavior (human or program) if necessary to reduce their carbon emissions.

Here is a summary of the regulatory obligations for companies to disclose their Scope 3 carbon emissions, worldwide:

Europe: Since 2014, large companies listed on European Union stock exchanges have been required to disclose information about their environmental strategy, including the greenhouse gas (GHG) emissions related to their activities (covering Scope 1, 2, and 3 emissions), in their annual report. Additionally, the EU Non-Financial Reporting Directive, adopted in 2014 and revised in 2018, requires large companies to disclose information about their policies, risks, and environmental and social impacts, including GHG emissions, in a separate report.

United States: There is no federal law in the United States that requires companies to disclose their Scope 3 carbon emissions. However, some state regulations may require the disclosure of information about GHG emissions, including Scope 3 emissions (California, home to Silicon Valley, is a notable example).

Asia: Regulations vary widely across Asian countries. For example, China requires companies to publish sustainability reports that include information about GHG emissions, including Scope 3 emissions. In Japan, companies listed on stock exchanges must disclose information about their GHG emissions, including Scope 1, 2, and 3 emissions, in their annual report.

Warning: This list is not exhaustive, and regulations may have changed recently. Companies may also be incentivized to voluntarily disclose information about their Scope 3 carbon emissions in response to growing pressure from consumers, investors, and other stakeholders.

bertysentry commented 1 year ago

g vs J

Question: Should we report the CO2 emissions (in grams) associated with processing an HTTP request and building its response, or should we report its energy usage (in Joules)?

Pro g (grams) arguments:

Pro J (Joules) arguments:

Proposed conclusion

We probably want to expose both pieces of information: emissions in grams of CO2-eq, and energy usage in Joules.

As regulations require reporting carbon emissions in grams, this metric should be in a dedicated HTTP response header.

Since energy usage in Joules is useful mainly to developers, this metric could be exposed through a Server-Timing header.
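To make the proposed conclusion concrete, here is a minimal sketch of how a server might emit both metrics, assuming a hypothetical `Carbon-Emissions-Scope-2` header name and a custom `j` parameter on Server-Timing (neither is standardized; the Server-Timing spec itself only defines `dur` and `desc`):

```python
def emission_headers(co2_grams: float, energy_joules: float) -> dict:
    """Build hypothetical response headers exposing both metrics.

    Both the dedicated header name and the Server-Timing parameter
    are assumptions for illustration, not standardized fields.
    """
    return {
        # Regulatory-relevant CO2 in grams of CO2-eq, in a dedicated header
        "Carbon-Emissions-Scope-2": f"{co2_grams:.7f}",
        # Developer-facing energy in Joules; 'j' is an invented parameter
        "Server-Timing": f"energy;j={energy_joules:.4f}",
    }
```

Server-Timing is already surfaced in browser developer tools, which is what would make it a convenient channel for the developer-facing Joules value.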

gregw commented 1 year ago


Bertrand,

Can you address why you think these measurements must be done per response?

It's already grown from just grams to Joules and grams. Eventually it will be Scope 2 and Scope 3, doubling the bandwidth used again. I'm concerned about the per-response data.

Why not a well-known resource (or resources) that can report per connection, per session, per user, per time, idle, max, min, average, median, etc.?


bertysentry commented 1 year ago

Is HTTP Response the right layer to expose carbon emissions?

@gregw asks this legitimate question, along with concerns about the extra bandwidth this new header requires.

Introduction on carbon emissions in IT

Typical IT departments consider 3 sources of carbon emissions: data centers, terminals, and 3rd-party services (cloud and SaaS).

Each area has its own specificities and its own tools (or lack thereof):

Too many carbon footprint tools are simple calculators: you enter how many servers you have, how many VMs, how many containers you run, etc., and you get a rough approximation of the carbon footprint of your infrastructure (disclosure: my company writes software that measures the electricity usage and carbon emissions of physical systems in data centers).

Getting actual carbon emission values is, however, key to proper reporting and optimization. You cannot see whether your attempts at optimizing a piece of software (or an architecture) really pay off if the carbon emissions are calculated or estimated instead of measured.

IPv6

Some people think it should be integrated into a lower level (like IPv6).

I agree that exposing the carbon emissions associated with the transport of data by the network infrastructure should be done at the network layer, i.e. IPv6.

However, the network layer is not the right place to expose application-related metrics like carbon emissions, because the energy required to perform a task (compute, memory, and storage) is going to be evaluated at the process level (in the operating system), which knows nothing about the network transport layer.

Service/resource level

Others think it should be exposed by the service provider as "general" information about its services, like a co2.json file placed at the root of the HTTP server, as in the fictitious example below:

{
  "/images/*": "0.00005748",
  "/rest/fictitious": {
    "GET": "0.0000046",
    "POST": "0.00245"
  }
}

Clients would query /co2.json from time to time and aggregate these values according to the requests they have sent to the HTTP server.
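As an illustration only, reusing the fictitious /co2.json above, a client-side aggregator might look like the sketch below; the glob matching and the use of per-method sub-objects are assumptions about how such a file would be interpreted, not part of any specification:

```python
import fnmatch

# Fictitious /co2.json payload from the example above (grams of CO2-eq)
CO2_ESTIMATES = {
    "/images/*": "0.00005748",
    "/rest/fictitious": {"GET": "0.0000046", "POST": "0.00245"},
}

def estimate_grams(method: str, path: str) -> float:
    """Look up the estimated grams of CO2-eq for one request."""
    for pattern, value in CO2_ESTIMATES.items():
        if fnmatch.fnmatch(path, pattern):
            if isinstance(value, dict):
                # Per-method estimates; unknown methods count as zero
                return float(value.get(method, 0.0))
            return float(value)
    return 0.0

def aggregate(requests) -> float:
    """Sum the estimates over a list of (method, path) tuples."""
    return sum(estimate_grams(m, p) for m, p in requests)
```

The key property (and the limitation discussed next) is that every figure here is a static estimate: the client's actual behavior never changes the numbers it aggregates.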

The limitation is that it only allows clients to calculate their Scope 3 carbon emissions based on estimates, which doesn't encourage actual optimization on the client side. If Google announced in their /co2.json file that a query to /search averages 0.2 grams, it wouldn't tell you whether using double quotes in your query makes it more efficient, etc. So as a developer who relies on Google searches in their app (again, this is fictitious), I have no incentive to optimize my queries. I will just limit the number of queries, but maybe make them more complex, which would be counter-productive.

HTTP Response

Getting the actual carbon emissions associated with the production of an HTTP response is arguably the easiest way for developers, client applications, and IT services to measure the carbon emissions associated with their consumption of 3rd-party services, because it reflects their actual usage of the services rather than estimations.

The value exposed in the HTTP response will not include the carbon emissions associated with the network transport, which is a separate concern (and may be addressed in a separate RFC for IPv6).

First implementations of the HTTP header may actually use static estimations (as in the above example with 0.2 grams per Google search). But improvements in energy usage observability and carbon intensity measures will allow future implementations to provide actual values.

Extraneous bandwidth usage

Isn't it counter-productive to transport more data in order to report carbon emissions? Yes, it could be, in very specific cases or with really bad implementations.

However, please note that this draft does not mandate that all HTTP responses carry the header. An HTTP server could choose to report the aggregated sum of the carbon emissions of the last 10 requests, and skip the header for the 9 previous requests (for the same client, obviously). Or it could skip the header when the value is not significant. This would help optimize bandwidth usage, but at the expense of precision and optimization.
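A sketch of that aggregate-and-skip policy, assuming the hypothetical `Carbon-Emissions-Scope-2` header; the batch size of 10 is just the example figure from the text:

```python
class EmissionReporter:
    """Accumulate per-response emissions per client and only emit the
    (hypothetical) header on every Nth response, carrying the aggregated
    sum, as described above. Illustrative, not a specified behavior."""

    def __init__(self, every: int = 10):
        self.every = every
        self.pending = {}  # client id -> (responses counted, grams accumulated)

    def header_for(self, client: str, grams: float):
        count, total = self.pending.get(client, (0, 0.0))
        count, total = count + 1, total + grams
        if count < self.every:
            # Skip the header on this response; keep accumulating
            self.pending[client] = (count, total)
            return None
        # Emit the aggregated value and reset the counter for this client
        self.pending.pop(client, None)
        return ("Carbon-Emissions-Scope-2", f"{total:.7f}")
```

The trade-off is exactly the one stated above: roughly a tenth of the header bytes, at the cost of per-response resolution.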

So, my personal opinion is that this extra HTTP response header is very thin in terms of bandwidth usage, compared to the average HTTP response header length (700 bytes on average, according to the HTTP Archive... according to ChatGPT 😅).

Obviously, the syntax of the header could be changed to improve that, like:

CO2: scope2=0.0004573
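For illustration, a tiny parser for that compact syntax; a real specification would more likely build on RFC 8941 Structured Fields rather than a hand-rolled format like this:

```python
def parse_co2_header(value: str) -> dict:
    """Parse the compact syntax sketched above, e.g. 'scope2=0.0004573'.

    Supports a comma-separated list so a future 'scope3=...' member could
    ride in the same header. Purely illustrative.
    """
    out = {}
    for item in value.split(","):
        key, _, num = item.strip().partition("=")
        out[key] = float(num)
    return out
```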
gregw commented 1 year ago

@bertysentry Thanks for the extra information.

However, I think you are selling short the capabilities of a service/resource-level approach. I agree that if it were just a /co2.json resource containing the general figures for the server, then it would have the limitations you describe.

However, it is entirely possible to design a service that has resolution equivalent to a per-response header when necessary, so that requests can be tuned, but avoids that expense when not necessary.

I'm thinking of a /CO2/* resource space, similar to the /proc/* space on a Linux system, that allows both general and specific requests. So perhaps:

Something like this style would give equal resolution to a per-response header, so an individual response can be identified, but it also gives various useful aggregates and summaries.

There is also the possibility of a combined solution, with a /CO2/* space service being used to control which responses have a CO2 header applied and what details are included in it.

The upsides of this include:

The downsides of this (that I can think of) are:

bertysentry commented 1 year ago

@gregw Thank you for the feedback!

The /co2/* namespace would be a great service for users and application developers. However, we cannot expect all HTTP servers to implement and maintain such a complex service. This would typically be implemented in a reverse or forward proxy, as below:

  1. An enterprise forward proxy would collect and aggregate the Carbon-Emissions-Scope-2 header from all responses it processes.
  2. The proxy would expose the collected and aggregated data in /co2/* as you described. Aggregation could even be done by user groups, etc. if this proxy requires authentication.
  3. This would allow large organizations to assess their Scope 3 carbon emissions associated with the use of external Web services.
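The steps above could be sketched like this; the header name comes from the draft, while the group-based aggregation and the /co2/ resource layout are assumptions for illustration:

```python
from collections import defaultdict

class CO2AggregatingProxy:
    """Sketch of an enterprise forward proxy that tallies the hypothetical
    Carbon-Emissions-Scope-2 response header per user group and serves the
    aggregate as a /co2/ resource. Layout and API are invented."""

    def __init__(self):
        self.totals = defaultdict(float)  # group -> grams of CO2-eq

    def on_response(self, group: str, headers: dict):
        # Step 1: collect the header from every response the proxy relays
        value = headers.get("Carbon-Emissions-Scope-2")
        if value is not None:
            self.totals[group] += float(value)

    def co2_resource(self) -> dict:
        # Step 2: what the proxy might serve at a /co2/summary resource
        return {group: round(total, 7) for group, total in self.totals.items()}
```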
gregw commented 1 year ago

However, we cannot expect all HTTP servers to implement and maintain such a complex service.

I agree that the full /co2/ as I've described it is a little complex. However, I think it would be wise, no matter what proposal goes forward, to have several levels of compliance. So for example, the minimal compliance for a `/co2/` space might just be to have a non-dynamic `/CO2/summary.json` resource.

But ultimately, the space is nowhere near as complex as collecting the data in the first place. I would expect good open source implementations to be quickly made available... ultimately the space would really just be implemented by something that parses the request log. So to progress this area, standardising how emissions might be logged in standard-format request logs would be a good idea.
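As a sketch of that log-parsing idea, assume an extended access-log line that ends with the grams of CO2-eq for the request; the log format is invented here, which is exactly the piece that would need standardizing:

```python
from collections import defaultdict

def summarize_log(lines):
    """Build a summary (as might be served at a static summary resource)
    from an assumed extended access log, one request per line:

        GET /search 200 0.0002

    where the last field is grams of CO2-eq. Format is hypothetical.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        method, path, status, grams = line.split()
        totals[path] += float(grams)
        counts[path] += 1
    return {
        path: {"total_g": round(totals[path], 7), "requests": counts[path]}
        for path in totals
    }
```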

I also like the proxy idea. So it would be great to specify both per-response header(s) and a /CO2/* space in a compatible way. Thus there would be flexibility in solutions that could include:

However, thinking about the implementation of the last mode made me realise a difficulty with the CES2 response headers: at the time the response headers are committed for a large response, it is unlikely that the emissions will be known in full. Thus any proposal should specify that the header can be carried in a trailer... or just use the /CO2/* space :)
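A raw HTTP/1.1 sketch of the trailer idea, assuming the hypothetical `Carbon-Emissions-Scope-2` field; HTTP does allow fields in chunked-encoding trailers, declared up front via `Trailer:`, which is what lets the value be computed after the body is produced:

```python
def chunked_response_with_trailer(body: bytes, grams: float) -> bytes:
    """Build an HTTP/1.1 chunked response carrying the (hypothetical)
    emissions field as a trailer, since the value is only known once the
    body has been generated. Wire-format sketch only."""
    head = (
        b"HTTP/1.1 200 OK\r\n"
        b"Transfer-Encoding: chunked\r\n"
        # Declare the trailer field before the body
        b"Trailer: Carbon-Emissions-Scope-2\r\n"
        b"\r\n"
    )
    chunk = b"%X\r\n%s\r\n" % (len(body), body)
    # Zero-length chunk terminates the body; trailer fields follow it
    trailer = b"0\r\nCarbon-Emissions-Scope-2: %.7f\r\n\r\n" % grams
    return head + chunk + trailer
```

Note that HTTP/2 and HTTP/3 carry trailers natively as a trailing HEADERS frame, so the same field could travel there without chunked encoding.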

ahengst commented 1 year ago

Greetings / intro: Hi, my name is Andreas Hengst; I've worked for decades in IT operations (most recently data storage & backups). I have rudimentary network knowledge, i.e. enough to configure and troubleshoot, not enough to design protocols. I am very interested in energy and carbon issues, and was fortunate to encounter @bertysentry's presentation at GrafanaCon recently, which is how I then learned of this thread and the draft RFC. At present I'm monitoring the behaviour of an experimental heating system involving my own 2500 L thermal battery, to allow a heat pump to run at the "best" time, not when the heat demand is high. So I've had experience with network devices of ALL sizes in terms of their energy consumption.

I'm not an official participant in any working groups ("httpwg"), so I may have to keep quiet after this post (?). I intend to do more background study of the network technical details mentioned above. Until then I may not appreciate how rigorous, experimental, or fluid the present draft discussion is.

I do have ideas and thought experiments, I'll share two and I hope they can be of some use.

1) Devices don't know where "our company" starts and ends, and rely on both internal and externally hosted services, so they cannot (should not?) be expected to accurately distinguish the different scopes without requiring and maintaining some sort of IP address range database, or deliberate router configs, complicating management and introducing errors. (Might the RFC define router config settings that can increment "scope hops" as traffic crosses company boundaries?) 1b) If scope is applied to business transactions between companies, should each company just tally their own CO2, then share the relevant data as part of the billing-payment process (rather than over the network)?

2) Energy use all by itself is a good thing to measure. Any green energy not consumed means (usually) an equivalent reduction of fossil electricity generated somewhere else. In other words, if the only thing the end user sees is "why so much accumulated electricity for this result?", there's enough knowledge to dig deeper (make something more efficient) or to choose a more efficient alternative service. That keeps complexity to a minimum while still achieving improved energy use. Multiple easy-to-operate protocols working together might be better able to reliably translate kWh or J to CO2e.

Question - disk arrays with FC connections (not S3 / HTTP) might not have a way to embed energy usage data without other complementary protocols, which is why I previously had pictured a lower-level network protocol for this sort of tallying of energy, something at the physical layer maybe. Was this ruled out for some reason?

Enough armchair brainstorming for me. I look forward to learning about how this data would be spawned, tallied up, reported or visualized, secured, protected from inappropriate manipulation, etc etc. Fascinating stuff, so much potential!

If you have suggestions for topic areas or homework, I'm open to that (thanks!). I see the six links above (April 2023) and have some reading to do. Thanks, AH, Edmonton, AB, Canada

bertysentry commented 1 year ago

The draft is about to expire. So this won't have a chance of ever being adopted? I think it's a shame, given the challenges we're facing in observability and sustainability.

mnot commented 1 year ago

Based on discussion so far, I'd say that there's interest in the topic, but there's skepticism about the proposed mechanism - it may not be the best way to meet your goals.

bertysentry commented 1 year ago

Should I update the draft to take into account some of the comments, or should we organize a sort of discussion on this? I really don't know how you guys usually proceed.

mnot commented 1 year ago

If you'd like to update it, that would help move discussion forward. Generally, we adopt things that have strong support in the WG -- especially by implementers.

ahengst commented 1 year ago

Hi again, just a quick hello. Nothing like "expiring soon" to reinvigorate interest! Happy to meet on zoom. To rephrase my comments July 3...

Question about RFCs... if observability-of-energy depends on additional components (e.g. front-end visualization and device-level measurements), could the protocol specification stand on its own, or do we need/want to develop the "full stack"? I presume it would help to do so...

Andy

mnot commented 1 year ago

We generally stick to protocol details in RFCs.

gregw commented 12 months ago


I question whether the HTTP protocol is the right place for such a mechanism. As the implementer of an HTTP application server, I have very little information about the resources consumed in preparing a response to a request. I can measure the thread used to dispatch the initial request handling, but I have no visibility into any further resources used by that thread (asynchronous processes, remote web services, databases, etc.).

The hard part about any such proposal is not adding an extra header, but finding the data to send. Making this a protocol feature means you also need to invent a standard API between the server and the application; probably many of them, to go with all the different languages and frameworks.

Thus, I again advocate considering a solution that uses a well-known URI for such matters. This could be implemented as an application component, so it would already be in the space where more information about resource usage might be available.

Finally, I'm very dubious that any of our large-scale users would be interested in another header per response. It is hard to imagine that any savings resulting from better CO2 accounting could offset the increased CO2 footprint of trillions of extra headers sent every day forever more.

So I'm supportive of finding a solution, I just do not think your proposal is the right one.

regards


bertysentry commented 12 months ago

About the difficulty of measuring the energy and carbon footprint of an HTTP response

Many responders to the original draft, on the mailing list and on this GitHub issue, mentioned that it's difficult to get the metric in the first place (either in Joules or in gCO2eq).

It is true the technology is still in its infancy, but we're making a lot of progress. Just to give a few examples:

Others in this discussion were concerned that the values wouldn't be precise enough to be of any use when assessing the carbon footprint of the usage of a given service, served over HTTP.

Most (if not all) organizations assess their carbon emissions with rough estimations, calculated once a year at best. Given the current state of carbon footprint/emissions reporting, adding a little more information, even if not exact or not actually measured, will be helpful to the community.

Legal pressure is building everywhere. In some countries, all suppliers are legally mandated to report the carbon emissions of their services to each of their customers/users. Therefore, solutions will come up: some startups will implement ways to report the carbon emissions of various Web services, live; then some larger vendors will catch up. And everybody is going to use their own way of exposing this information.

It will be much easier and much faster if we all agree on a format to communicate this information, rather than trying to reconcile everybody in 2035.

gregw commented 12 months ago

Bertrand,

I think you misunderstood my point about the difficulty of obtaining the metric.

The issue is not how to measure the metric for method calls, threads, or processes. The problem is how a protocol implementation can assign such metrics to an individual request/response cycle.

I'm a developer of an HTTP server and application container written in Java. We receive requests, parse them, invoke the application, and then assist that application in generating an HTTP response. The problem is the "invoke the application" part. Sometimes that is simply a thread calling an application that does the equivalent of response.getWriter().println("HelloWorld"), but frequently that invocation involves many other threads, potentially on many other machines, all in the creation of a response. We sometimes have literally hundreds of other servers involved in generating a single response.

Even on a single server: let's say I'm handling 10,000 requests per second, each invoking the application on a known thread. But on the server I'm seeing 50,000 different tasks executed per second by a thread pool of 1000, and perhaps there are also 100,000 virtual threads active. Without detailed knowledge of the application, the HTTP server itself simply cannot map which of those tasks/threads relate to the generation of which response. Also, in any non-trivial application, some of the effort triggered by one request is going to influence the responses sent to one or maybe thousands of future requests.

It is not an impossible job, just not one that can be done by the HTTP server in isolation from the application.

I do not think this is a problem that can be solved in the protocol layer. Applications need to be tallying the resources they use and recording them. Perhaps they can tally them against request/response, but there are many other candidate keys for resources: connection, session, user principal, etc.
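The multi-key tallying suggested above might be sketched like this; the key names and the API are invented for illustration, and the hard part (actually measuring the grams to charge) is out of scope here:

```python
from collections import defaultdict

class ResourceTally:
    """Application-side tallying sketch: charge a measured quantity
    against several candidate keys at once (request, connection, session,
    user), since the server alone cannot attribute background work to a
    single response. Names and structure are hypothetical."""

    KEYS = ("request", "connection", "session", "user")

    def __init__(self):
        self.totals = {k: defaultdict(float) for k in self.KEYS}

    def charge(self, grams: float, **ids):
        # ids e.g. request="r1", connection="c9", user="alice"; the
        # application charges whichever keys it knows for this work item
        for key, ident in ids.items():
            if key in self.totals:
                self.totals[key][ident] += grams
```

A background task that serves many future responses could then be charged to a session or user without forcing an artificial per-response attribution.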


ahengst commented 12 months ago

I reviewed the WEF page linked above that describes the scopes. I'm going to assume that vendors tallying and reporting their (Scope 1) emissions, and then billing their customers, results in customers becoming aware of their Scope 2 (for energy purchases) and Scope 3 (for upstream "embodied" energy). The proposal is called "Scope 2", but for upstream web services it looks more like Scope 3 to me. More significantly, we have also identified the difficulty of knowing where "our" Scope 1 ends and "their" Scope 3 begins.

The network messaging we are discussing won't be part of that reporting unless it can fully and reliably take the place of "conventional" reporting methods, avoid double-reporting, and avoid the side effect of making conventional methods more complicated. I can't yet imagine network-based messaging being an accounting/reporting/compliance tool. BUT I do see it having practical value, because it reveals something largely invisible to internet users. This makes it really intriguing. Could this new mechanism be useful even if it's NOT accurate at measuring CO2? Could this new mechanism be useful even if it's not accurate at measuring kWh (because that is sounding tricky too)? Is there an even simpler metric, if all we want end users to see is "this blockchain transaction is like 85 hours of Netflix"? I'm just asking, because if this RFC won't be adopted, maybe something simpler could be.

In the spirit of kicking tires: once an end user has identified a "high energy" internet provided service, is there a way to learn more about it?

Time is an interesting part of making energy consumption visible. We don't really want to end up with something that looks like my utility bill, but what would we expect to see? Have mockups been done that I haven't seen (since I've only been here on GitHub until now)? Thanks, Andy