Does CodeCarbon track energy uses of popular APIs?

Dorado1987A commented 1 year ago

CodeCarbon version: 2.3.1
Python version: 3.10.11
Operating System: Windows-10-10.0.22621-SP0

Description

Does CodeCarbon monitor energy use and emissions of APIs called?

For instance, if you call OpenAI's GPT model, CodeCarbon only tracks the energy consumed by making the call and processing the API response, not the consumption of the API-side processing. Am I correct in that assumption?

If I wanted to add consumption metrics for API calls and the related emission metrics, would I have to do this manually rather than with CodeCarbon?

SaboniAmine commented 1 year ago

Hello, thanks for this feature request. We are currently studying this topic to see whether there is a viable methodology for estimating those values. The main issue is that it is difficult to guess which hardware runs behind the APIs, and therefore its electrical consumption and the source of its energy. In any case, this will first be done manually. If you have any ideas on how to measure this, please share them with us.

Dorado1987A commented 1 year ago

It's a difficult one because there are a lot of unknowns. I see a couple of viable options, and one would be to store a list of the most popular APIs and their energy consumption metrics and then use the parameters sent via the API call to work out an indication of the energy usage.

For instance, OpenAI has published the energy consumption of each query sent to their various models via API; if you can extract the parameters (number of tokens, exact model) from the API call and responses, you might be able to calculate the energy used by that call based on the predefined energy metrics. Alternatively, you could calculate an average for each time the API is called and add the value as a fixed value that offers more of an indication than an accurate measure.
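A minimal sketch of the per-token lookup described above. The model names and the watt-hour coefficients are hypothetical placeholders, not published or measured figures, and the function name `estimate_energy_wh` is invented for illustration; it is not part of CodeCarbon.

```python
from dataclasses import dataclass

# Hypothetical placeholder coefficients, in watt-hours per 1,000 tokens.
# These are NOT published or measured values; swap in a sourced estimate.
WH_PER_1K_TOKENS = {
    "example-small-model": 0.1,
    "example-large-model": 1.0,
}


@dataclass
class ApiCall:
    """Parameters extracted from one API request/response pair."""
    model: str
    prompt_tokens: int
    completion_tokens: int


def estimate_energy_wh(call: ApiCall) -> float:
    """Estimate energy as (total tokens / 1000) * per-model coefficient."""
    if call.model not in WH_PER_1K_TOKENS:
        raise ValueError(f"No energy coefficient for model {call.model!r}")
    total_tokens = call.prompt_tokens + call.completion_tokens
    return total_tokens / 1000 * WH_PER_1K_TOKENS[call.model]


call = ApiCall(model="example-large-model", prompt_tokens=500, completion_tokens=1500)
print(f"{estimate_energy_wh(call):.2f} Wh")
```

The result is only as good as the coefficients, so this gives an indication rather than a measurement.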

With APIs that are less popular or don't have public energy consumption metrics, I don't see an easy solution to that unless they install CodeCarbon onto the API or publish consumption data.

That being said, I suggest giving the user the option to define the energy consumption of an API call in the CodeCarbon parameters. This will then allow for the less popular APIs to be measured without requiring your team to spend time programming in the usage metrics yourselves.

This could look like the following:

Option 1: If known to the user, they could define the hardware that's being used by the API, and then you measure the time it takes for the API call to be made, to get an indication of the power usage, knowing the hardware and the run time.

Option 2: If the first option isn't possible for the user, they could request energy usage metrics from the API owner, and offer the energy data as an input parameter to CodeCarbon as a fixed value that's added every time the API is called.
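The two options could be sketched roughly as follows. This is a standalone illustration, not an existing CodeCarbon feature: the wrapper `call_with_energy_estimate` and its parameters `assumed_power_watts` and `fixed_energy_wh` are hypothetical names, and the numbers in the usage lines are made up.

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def call_with_energy_estimate(
    api_call: Callable[[], T],
    *,
    assumed_power_watts: Optional[float] = None,  # Option 1: user-declared hardware power
    fixed_energy_wh: Optional[float] = None,      # Option 2: per-call value from the API owner
) -> Tuple[T, float]:
    """Run api_call and return (result, estimated energy in Wh)."""
    if (assumed_power_watts is None) == (fixed_energy_wh is None):
        raise ValueError("Provide exactly one of assumed_power_watts or fixed_energy_wh")

    start = time.perf_counter()
    result = api_call()
    elapsed_s = time.perf_counter() - start

    if fixed_energy_wh is not None:
        # Option 2: add the same fixed amount for every call.
        return result, fixed_energy_wh
    # Option 1: energy = power * time, converted from watt-seconds to watt-hours.
    return result, assumed_power_watts * elapsed_s / 3600


def fake_api_call() -> str:
    """Stand-in for a real remote call."""
    return "response"


_, energy_opt1 = call_with_energy_estimate(fake_api_call, assumed_power_watts=300.0)
_, energy_opt2 = call_with_energy_estimate(fake_api_call, fixed_energy_wh=0.3)
print(energy_opt1, energy_opt2)
```

Note that in Option 1 the wall-clock time measured on the client includes network latency and queueing, and the remote hardware is usually shared between users, so the figure is a rough indication at best.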

That's all I can think of for solutions. Let me know if it's helpful at all. I'm also more than happy to collaborate with you on developing any of these solutions in my free time, so feel free to ask!

SaboniAmine commented 1 year ago

Thanks for your input, it is valuable. We'll keep you updated about this topic, which is being addressed outside of this repo for the moment.

benoit-cty commented 1 year ago

Thanks @Dorado1987A, do you have a source for "OpenAI has published the energy consumption of each query sent to their various models via API"? That would be very interesting.

Dorado1987A commented 1 year ago

Hi @benoit-cty, apparently I was hallucinating that data. I could have sworn I read a report with published data, but all I can find now is estimates... The links below are the closest I've found to this type of insight. Not much help, I know, sorry.

https://www.sciencedirect.com/science/article/pii/S2542435123003653

https://arxiv.org/pdf/2304.03271.pdf

benoit-cty commented 1 year ago

Thanks. There's also Kasper Groes Albin Ludvigsen, who has published some estimates: https://towardsdatascience.com/chatgpts-energy-use-per-query-9383b8654487

Soon after ChatGPT’s debut in December, Altman estimated its cost to be “probably single-digits cents per chat.” That figure can be multiplied by the upward of 10 million users each day that analysts anticipated. Based on the processing required to run GPT-3.5, the default model at the time, SemiAnalysis estimated in February that ChatGPT was costing OpenAI over $700,000 per day in computational expenditures alone. Source

Another way is to use Nvidia's GPU production as a basis: The paper includes a slightly more realistic scenario, calculating the potential energy consumption of the 100,000 AI servers Nvidia is expected to deliver this year. Running at full capacity, those servers might burn through 5.7 to 8.9 TWh of electricity a year. That’s “almost negligible” in comparison to data centers’ historical estimated annual electricity use of 205 TWh, de Vries writes. (The Verge)
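As a quick sanity check on the quoted figures, the arithmetic below derives the average power per server they imply and their share of the 205 TWh baseline. It uses only the numbers quoted above and assumes continuous operation for a full 8,760-hour year; the function names are for illustration only.

```python
HOURS_PER_YEAR = 24 * 365          # 8,760 hours, assuming continuous operation
SERVERS = 100_000                  # AI servers cited in the quote
ANNUAL_TWH_RANGE = (5.7, 8.9)      # quoted annual consumption range
DATA_CENTER_TWH = 205.0            # quoted historical data center consumption


def implied_kw_per_server(annual_twh: float, servers: int = SERVERS) -> float:
    """Average power per server in kW (1 TWh = 1e9 kWh)."""
    return annual_twh * 1e9 / servers / HOURS_PER_YEAR


def share_of_data_centers(annual_twh: float) -> float:
    """Fraction of the quoted 205 TWh data center baseline."""
    return annual_twh / DATA_CENTER_TWH


for twh in ANNUAL_TWH_RANGE:
    print(
        f"{twh} TWh/year -> {implied_kw_per_server(twh):.1f} kW per server, "
        f"{share_of_data_centers(twh):.1%} of 205 TWh"
    )
# 5.7 TWh/year -> 6.5 kW per server, 2.8% of 205 TWh
# 8.9 TWh/year -> 10.2 kW per server, 4.3% of 205 TWh
```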

benoit-cty commented 8 months ago

There is a DataForGood France development in progress on this subject: https://dataforgood.fr/projects/genai-impacts

I think it's outside the scope of CodeCarbon to do API monitoring.

adrien341 commented 2 weeks ago

Hi! Sorry for reopening this old topic, but has there been any change of thought concerning the monitoring of major APIs?

For my current project, it would be rather useful to be able to evaluate the cost of running GPT-4o calls within my code, in addition to regular data science operations.

I read the DataForGood France work you pointed out (thanks, by the way!), and there seem to be a few academic works with orders of magnitude that CodeCarbon could integrate if needed.

Anyway, many thanks for your nice tool!

Ad.

benoit-cty commented 2 weeks ago

Hi Adrien,

The package https://ecologits.ai/latest/ has been released and does what you need.

The main contributor of this project is also a member of CodeCarbon 😉

mlco2 / codecarbon

Does CodeCarbon track energy uses of popular APIs? #475
