mlco2 / codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.
https://mlco2.github.io/codecarbon
MIT License
1.01k stars, 158 forks

Does CodeCarbon track energy uses of popular APIs? #475

Closed: Dorado1987A closed this issue 3 months ago

Dorado1987A commented 7 months ago

Description

Does CodeCarbon monitor energy use and emissions of APIs called?

For instance, if you call OpenAI's GPT model, CodeCarbon only tracks the energy consumed locally by making the call and processing the API response, not the energy consumed by the processing on the API side. Am I correct in that assumption?

If I wanted to add consumption and emission metrics for API calls, would I have to do that manually rather than with CodeCarbon?

SaboniAmine commented 7 months ago

Hello, thanks for this feature request. We are currently studying this topic to see whether there is a viable methodology for estimating those values. The main issue is that it is difficult to guess what hardware runs behind the APIs, and therefore its electricity consumption and the source of its energy. In any case, this will first be done manually. If you have any ideas on how to measure this, please share them with us.

Dorado1987A commented 7 months ago

It's a difficult one because there are a lot of unknowns. I see a couple of viable options, and one would be to store a list of the most popular APIs and their energy consumption metrics and then use the parameters sent via the API call to work out an indication of the energy usage.

For instance, OpenAI has published the energy consumption of each query sent to their various models via API; if you can extract the parameters (number of tokens, exact model) from the API call and responses, you might be able to calculate the energy used by that call based on the predefined energy metrics. Alternatively, you could calculate an average for each time the API is called and add the value as a fixed value that offers more of an indication than an accurate measure.
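A minimal sketch of the lookup idea described above, assuming per-model energy factors were available somewhere. The table name, the model names, and every number in it are hypothetical placeholders, not published measurements, and none of this is part of CodeCarbon.

```python
# Hypothetical per-model energy factors, in Wh per 1,000 tokens.
# PLACEHOLDER values for illustration only; they are not real measurements.
ENERGY_WH_PER_1K_TOKENS = {
    "example-small-model": 0.5,
    "example-large-model": 4.0,
}


def estimate_call_energy_wh(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the energy of one API call (in Wh) from its token counts."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    try:
        factor = ENERGY_WH_PER_1K_TOKENS[model]
    except KeyError:
        # No factor registered: refuse to guess rather than return a made-up value.
        raise ValueError(f"no energy factor registered for model {model!r}") from None
    total_tokens = prompt_tokens + completion_tokens
    return factor * total_tokens / 1000.0


if __name__ == "__main__":
    # Token counts would normally come from the usage fields of the API response.
    print(estimate_call_energy_wh("example-large-model", prompt_tokens=300, completion_tokens=200))
```

The estimate is only as good as the factors in the table, so it is an indication rather than a measurement.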

With APIs that are less popular or don't have public energy consumption metrics, I don't see an easy solution to that unless they install CodeCarbon onto the API or publish consumption data.

That being said, I suggest giving the user the option to define the energy consumption of an API call in the CodeCarbon parameters. This will then allow for the less popular APIs to be measured without requiring your team to spend time programming in the usage metrics yourselves.

This could look like the following:

Option 1: If the user knows what hardware the API runs on, they could declare it, and CodeCarbon would measure how long the API call takes. Knowing the hardware and the run time gives an indication of the energy used.

Option 2: If the first option isn't possible, the user could request energy usage metrics from the API owner and supply them to CodeCarbon as an input parameter: a fixed value that's added every time the API is called.
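The two options above could be sketched roughly as follows. This is an illustration under stated assumptions, not an existing CodeCarbon feature: the class `ApiEnergyLog`, its methods, and the helper `energy_from_runtime_wh` are hypothetical names, and the power and per-call values are whatever the user supplies.

```python
import time
from contextlib import contextmanager


def energy_from_runtime_wh(power_watts: float, seconds: float) -> float:
    """Energy in Wh for hardware drawing `power_watts` for `seconds`."""
    return power_watts * seconds / 3600.0


class ApiEnergyLog:
    """Hypothetical accumulator for user-supplied API energy estimates (Wh)."""

    def __init__(self) -> None:
        self.total_wh = 0.0
        self.calls = 0

    def _add(self, wh: float) -> None:
        if wh < 0:
            raise ValueError("energy must be non-negative")
        self.total_wh += wh
        self.calls += 1

    @contextmanager
    def timed_call(self, assumed_power_watts: float):
        """Option 1: user-declared hardware power times measured wall-clock time."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self._add(energy_from_runtime_wh(assumed_power_watts, elapsed))

    def add_fixed(self, wh_per_call: float) -> None:
        """Option 2: a fixed per-call value obtained from the API owner."""
        self._add(wh_per_call)


if __name__ == "__main__":
    log = ApiEnergyLog()
    with log.timed_call(assumed_power_watts=300.0):  # assumed, user-declared power
        time.sleep(0.1)  # stands in for the real API request
    log.add_fixed(wh_per_call=0.3)  # assumed value supplied by the API owner
    print(log.calls, log.total_wh)
```

Note that for Option 1 the wall-clock time includes network latency and queueing, and the remote hardware may be shared between many requests, so the result is an indication at best.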

Those are all the solutions I can think of. Let me know if this is helpful. I'm also more than happy to collaborate with you on developing any of these solutions in my free time, so feel free to ask!

SaboniAmine commented 7 months ago

Thanks for your input, it is valuable. We'll keep you updated about this topic, which is being addressed outside of this repo for the moment.

benoit-cty commented 7 months ago

Thanks @Dorado1987A, do you have a source for "OpenAI has published the energy consumption of each query sent to their various models via API"? That would be very interesting.

Dorado1987A commented 7 months ago

Hi @benoit-cty, apparently I was hallucinating that data. I could have sworn I read a report with published data, but all I can find now is estimates... The links below are the closest I've found to this type of insight. Not much help, I know, sorry.

https://www.sciencedirect.com/science/article/pii/S2542435123003653

https://arxiv.org/pdf/2304.03271.pdf

benoit-cty commented 7 months ago

Thanks. There's also Kasper Groes Albin Ludvigsen, who has published some estimates: https://towardsdatascience.com/chatgpts-energy-use-per-query-9383b8654487

Soon after ChatGPT’s debut in December, Altman estimated its cost to be “probably single-digits cents per chat.” multiplied it by the analysts’ anticipated upward of 10 million users each day. Based on the processing required to run GPT-3.5, the default model at the time, SemiAnalysis estimated in February that ChatGPT was costing OpenAI over $700,000 per day in computational expenditures alone. Source

Another way is to use Nvidia's GPU production as a basis. From The Verge: "The paper includes a little more realistic scenario calculating the potential energy consumption of the 100,000 AI servers Nvidia is expected to deliver this year. Running at full capacity, those servers might burn through 5.7 to 8.9 TWh of electricity a year. That’s “almost negligible” in comparison to data centers’ historical estimated annual electricity use of 205 TWh, de Vries writes."
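To put the quoted figures in perspective, here is a small back-of-the-envelope calculation using only the numbers in the passage above. It assumes a 8,760-hour year and continuous full-capacity operation, as the quote does; the function names are just for illustration.

```python
# Figures taken from the quoted passage.
SERVERS = 100_000
LOW_TWH, HIGH_TWH = 5.7, 8.9
DATACENTER_TWH = 205.0

HOURS_PER_YEAR = 8760  # assumption: non-leap year, running all year


def share_percent(twh: float) -> float:
    """Share of the 205 TWh data-center estimate, in percent."""
    return 100.0 * twh / DATACENTER_TWH


def avg_kw_per_server(twh: float) -> float:
    """Implied average power draw per server, in kW (1 TWh = 1e9 kWh)."""
    return twh * 1e9 / HOURS_PER_YEAR / SERVERS


if __name__ == "__main__":
    for twh in (LOW_TWH, HIGH_TWH):
        print(f"{twh} TWh: {share_percent(twh):.2f}% of 205 TWh, "
              f"{avg_kw_per_server(twh):.2f} kW per server")
```

Under those assumptions the range works out to roughly 2.8% to 4.3% of the 205 TWh estimate, or about 6.5 to 10.2 kW of average draw per server.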

benoit-cty commented 3 months ago

There is a DataForGood France project in progress on this subject: https://dataforgood.fr/projects/genai-impacts

I think it's outside the scope of CodeCarbon to do API monitoring.