sb-ai-lab / Eco2AI

eco2AI is a Python library that accumulates statistics about power consumption and CO2 emissions while code is running.
Apache License 2.0

How does eco2AI handle using a single GPU on a multi-GPU rack? #11

maltefranke closed this issue 1 year ago

maltefranke commented 1 year ago

Hey there!

First of all, thank you for the great effort to make AI emissions more transparent, and for drawing more attention to this important topic! I have been using your tool to track the emissions of (small) language models and have a couple of questions about using it.

Setup: I have a cluster system with 4 GPUs (Tesla V100) available. I run my models with the CUDA_VISIBLE_DEVICES=gpu_id prefix on the command line so that training uses only one specific GPU.
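For jobs launched from Python rather than a shell, the command-line prefix has a direct equivalent: pass the child process a copy of the environment with CUDA_VISIBLE_DEVICES set. This is a minimal sketch; the helper names env_for_gpu and run_on_gpu are hypothetical, and the child command only stands in for a real training script. The variable changes which GPUs CUDA inside the child process sees, not which GPUs exist on the node.

```python
import os
import subprocess
import sys
from typing import Dict, Sequence


def env_for_gpu(gpu_id: int) -> Dict[str, str]:
    """Copy the current environment and expose only one GPU to CUDA in the child."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def run_on_gpu(gpu_id: int, argv: Sequence[str]) -> subprocess.CompletedProcess:
    """Python equivalent of `CUDA_VISIBLE_DEVICES=<gpu_id> <argv...>` in a shell."""
    return subprocess.run(
        list(argv),
        env=env_for_gpu(gpu_id),
        check=True,           # raise if the child exits with a non-zero status
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
    )


if __name__ == "__main__":
    # Hypothetical stand-in for a training script: the child reports what it sees.
    child = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
    result = run_on_gpu(2, child)
    print(result.stdout.strip())
```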

Problem: The output CSV says that 4 GPUs were tracked, although I restrict training to only 1 GPU.

Questions:

Thank you very much in advance for your help.

vladimir-laz commented 1 year ago

Hello, @maltefranke!

Thank you for your interest in eco2AI and for your question!

Here are the answers to your questions:

eco2AI tracks the GPU consumption of the entire system, but the CPU usage of only the processes involved in the current code execution. Unfortunately, we currently have no way to restrict GPU tracking to the resources used by your code. For CPUs, however, you can configure the Tracker class with the 'cpu_processes' parameter. Set it to "current" to have the Tracker calculate CPU utilization only for the currently running process, or to "all" to have it calculate the full CPU utilization.
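Because GPU power is read for every device in the system, whatever runs on the other GPUs ends up in the reported total. The sketch below illustrates the arithmetic with made-up power readings: it integrates samples taken at a fixed interval into kilowatt-hours (1 kWh = 3,600,000 J) and compares the system-wide sum with the energy of a single GPU. It is a simplified rectangle-rule model under that fixed-interval assumption, not eco2AI's internal code; the function names and readings are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

JOULES_PER_KWH = 3.6e6


def energy_kwh(power_samples_w: List[float], interval_s: float) -> float:
    """Rectangle rule: each power sample (W) is assumed to hold for interval_s seconds."""
    joules = sum(power_samples_w) * interval_s
    return joules / JOULES_PER_KWH


def gpu_energy_kwh(
    samples_by_gpu: Dict[int, List[float]],
    interval_s: float,
    gpu_ids: Optional[Iterable[int]] = None,
) -> float:
    """Total energy over the given GPU ids, or over every GPU when gpu_ids is None."""
    ids = samples_by_gpu.keys() if gpu_ids is None else gpu_ids
    return sum(energy_kwh(samples_by_gpu[i], interval_s) for i in ids)


if __name__ == "__main__":
    # Hypothetical one-hour run sampled every 10 s (360 samples per GPU):
    # GPU 0 runs our job, GPUs 1-3 run other users' jobs or sit idle.
    samples = {
        0: [250.0] * 360,
        1: [200.0] * 360,
        2: [200.0] * 360,
        3: [40.0] * 360,
    }
    print("whole system:", gpu_energy_kwh(samples, 10.0))
    print("our GPU only:", gpu_energy_kwh(samples, 10.0, [0]))
```

With these readings the system-wide figure is 0.69 kWh, while the job on GPU 0 accounts for only 0.25 kWh, so most of the reported energy would come from other jobs.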

We use the Python library pynvml to track NVIDIA GPUs. By specifying CUDA_VISIBLE_DEVICES=gpu_id, you instruct your Python script to recognize only the GPU with ID gpu_id rather than all the GPUs in your PC or server; pynvml, however, still sees all of them. I cannot offer a specific way to make eco2AI recognize only a particular GPU at the moment. However, if the other GPUs are idle, this mainly affects the number of GPUs recognized rather than the energy consumption, since idle GPUs draw comparatively little power.
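The mismatch can be explained by how the two layers work: CUDA_VISIBLE_DEVICES is honored by CUDA applications, while NVML (which pynvml wraps) enumerates every GPU on the machine. The sketch below shows the kind of filtering this issue asks for; it is not part of eco2AI, and the helper names are hypothetical. The pynvml calls used (nvmlInit, nvmlDeviceGetCount, nvmlDeviceGetHandleByIndex, nvmlDeviceGetPowerUsage, which reports milliwatts, and nvmlShutdown) are real. It assumes CUDA_DEVICE_ORDER=PCI_BUS_ID, because by default CUDA may number GPUs differently from NVML, and it handles only integer indices, not GPU UUIDs.

```python
import os
from typing import List, Optional


def parse_visible_devices(value: Optional[str]) -> Optional[List[int]]:
    """Turn a CUDA_VISIBLE_DEVICES string such as "1" or "0,2" into GPU indices.

    Returns None when the variable is unset (all GPUs visible) and [] when it is
    empty (no GPUs visible). UUID entries like "GPU-..." raise ValueError here.
    """
    if value is None:
        return None
    return [int(token) for token in value.split(",") if token.strip()]


def tracked_indices(device_count: int, visible: Optional[List[int]]) -> List[int]:
    """NVML indices to track: every device, or only visible ones that exist."""
    if visible is None:
        return list(range(device_count))
    return [i for i in visible if 0 <= i < device_count]


def visible_gpu_power_watts() -> float:
    """Sum the current power draw of the GPUs named in CUDA_VISIBLE_DEVICES.

    Assumes CUDA_DEVICE_ORDER=PCI_BUS_ID so CUDA indices match NVML indices.
    """
    import pynvml  # imported lazily: requires the NVIDIA driver on this machine

    pynvml.nvmlInit()
    try:
        count = pynvml.nvmlDeviceGetCount()
        visible = parse_visible_devices(os.environ.get("CUDA_VISIBLE_DEVICES"))
        total_mw = 0
        for i in tracked_indices(count, visible):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            total_mw += pynvml.nvmlDeviceGetPowerUsage(handle)  # milliwatts
        return total_mw / 1000.0
    finally:
        pynvml.nvmlShutdown()
```

On a 4-GPU node started with CUDA_VISIBLE_DEVICES=2, tracked_indices(4, [2]) keeps only index 2, so the other three GPUs would no longer contribute to the reading.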

Thank you for raising this: your issue highlights a potential improvement for our library, and I believe many other users have faced the same problem. In future versions, we will add an option to choose which GPU to track, if it is technically possible.

If you have additional questions, I will be happy to answer them.

maltefranke commented 1 year ago

Thank you very much for your thorough reply! Unfortunately, other models were running on the other 3 GPUs at the same time, so, if I understand your explanation correctly, the emissions are entangled. I can also work with a Slurm system and request specific GPUs, which in my first attempts seemed to work as intended with eco2AI (only the requested GPUs were tracked). Nonetheless, not everyone has that luxury, and I believe recognizing GPUs from CUDA_VISIBLE_DEVICES would be a valuable addition to your project. Thanks again for your support!