Closed ainhoaVivel closed 1 week ago
Hello @ainhoaVivel!
Thanks for using codecarbon and for reporting this.
In this case, if codecarbon worked with versions <2.3.0 but not with versions >=2.3.0, I suspect that the pynvml.nvmlDeviceGetTotalEnergyConsumption call never worked. From 2.3.0 onwards we measure the energy of the GPU to calculate the emissions, whereas before we were using only the power.
Could you check that the drivers are set up correctly by running nvidia-smi?
Another possibility is that the drivers are fine but pynvml for some reason does not initialize correctly. You could try something like:
import pynvml

try:
    pynvml.nvmlInit()
    device_count = pynvml.nvmlDeviceGetCount()
    print(f"Number of GPUs available: {device_count}")
    for i in range(device_count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        # Total energy consumed by this GPU, in millijoules
        info = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
        print(f"id:{i}, info:{info}")
    pynvml.nvmlShutdown()
except pynvml.NVMLError as e:
    # Catches a failure in any of the NVML calls above, not only nvmlInit
    print(f"Failed to initialize NVML: {str(e)}")
Hi @inimaz! Thank you very much for your answer.
I can confirm that drivers are working fine.
You are right about pynvml. I created a new file with the code you provided and got the same error as when I try to run my LM training script that has CodeCarbon.
Number of GPUs available: 2
Failed to initialize NVML: System is not in ready state
Do you know how this problem could be solved?
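Since the output above prints the GPU count before failing, nvmlInit and nvmlDeviceGetCount succeeded, and the error must come from a per-device call. A minimal sketch for isolating which call raises is shown below. The helper names `probe` and `probe_gpu` are hypothetical, and the broad `except Exception` is an assumption made so that the helper can be exercised without a GPU; with pynvml you would normally catch `pynvml.NVMLError`.

```python
def probe(checks):
    """Run each named zero-argument check on its own and record the outcome."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = ("ok", check())
        except Exception as err:  # with pynvml this would be pynvml.NVMLError
            results[name] = ("error", str(err))
    return results


def probe_gpu(index=0):
    """Probe a few NVML queries for one GPU (hypothetical helper)."""
    import pynvml  # imported lazily so probe() has no hard dependency

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        return probe({
            "name": lambda: pynvml.nvmlDeviceGetName(handle),
            "power_mw": lambda: pynvml.nvmlDeviceGetPowerUsage(handle),
            "energy_mj": lambda: pynvml.nvmlDeviceGetTotalEnergyConsumption(handle),
        })
    finally:
        pynvml.nvmlShutdown()
```

Calling `probe_gpu(0)` on the affected machine would show whether only the energy query fails while the power query still works, which is useful detail to include in an upstream bug report.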
The good thing is that you can reproduce it with that example. The bad thing is that I don't know how to help any further...
On the codecarbon side, what we might do is, if the call to pynvml.nvmlDeviceGetTotalEnergyConsumption is not successful, fall back to some constant mode.
On the pynvml side... maybe you could open an issue in their repo? https://github.com/gpuopenanalytics/pynvml
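As an illustration of the fallback idea, here is a rough sketch, not codecarbon's actual implementation. It assumes one possible fallback: when the energy counter cannot be read, estimate energy from the instantaneous power draw, treating power as constant over the measurement interval. The names `gpu_energy_joules` and `pynvml_readers` are hypothetical, and the broad `except Exception` stands in for `pynvml.NVMLError`.

```python
def gpu_energy_joules(read_energy_mj, read_power_mw, previous_mj, elapsed_s):
    """Return (joules used in the interval, energy counter value to keep).

    read_energy_mj: callable returning the cumulative energy counter in mJ
    read_power_mw:  callable returning the instantaneous power draw in mW
    """
    try:
        current_mj = read_energy_mj()
        return (current_mj - previous_mj) / 1000.0, current_mj
    except Exception:  # with pynvml this would be pynvml.NVMLError
        # Fallback: assume power stayed constant over the interval.
        power_w = read_power_mw() / 1000.0
        return power_w * elapsed_s, previous_mj


def pynvml_readers(index=0):
    """Build the two reader callables for one GPU (hypothetical helper).

    The caller is responsible for calling pynvml.nvmlShutdown() when done.
    """
    import pynvml  # imported lazily so the function above has no hard dependency

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(index)
    return (
        lambda: pynvml.nvmlDeviceGetTotalEnergyConsumption(handle),
        lambda: pynvml.nvmlDeviceGetPowerUsage(handle),
    )
```

The power-based estimate is coarser than the hardware counter because it samples power at a single instant, so short spikes between samples are missed.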
Okay! It is clear that this is not a codecarbon issue, so I'll ask in pynvml for more information about this error. Thank you very much for your help!
Description
I wanted to measure the consumption of some mt5-base training runs using CodeCarbon. Until now I had been using v2.2.0 for training without any issues. A few days ago I upgraded to v2.4.2 and it no longer works. I didn't make any changes to my code. I have tried going back to the previous version of CodeCarbon and I have also tried other versions. I have been able to verify that the error occurs from v2.3.0 onwards, when pynvml was introduced.
What I Did
I executed my script
However, I got this error
My Python script looks like this:
Inside baseline I have a few tracker.flush(), but nothing else related to CodeCarbon.
I have tried several versions of CodeCarbon and pynvml, but nothing worked. I can't find any additional information about the "System is not in ready state" error either. Any idea how to fix this or what causes it?