gpuopenanalytics / pynvml

Provide Python access to the NVML library for GPU diagnostics
BSD 3-Clause "New" or "Revised" License

pynvml.nvml.NVMLError: System is not in ready state #54

Open ainhoaVivel opened 2 weeks ago

ainhoaVivel commented 2 weeks ago

Description

I am using CodeCarbon to measure energy consumption. This library uses pynvml in the background to access GPU information. I asked in the project repository, and it seems that my problem is that pynvml is not working properly.

What I Did

I created this script

import pynvml

try:
    pynvml.nvmlInit()
    print("NVML initialized successfully")

    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    print(f"Device 0: {pynvml.nvmlDeviceGetName(handle)}")

    total_energy = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    print(f"Total energy consumption: {total_energy} mJ")

except pynvml.NVMLError as error:
    print(f"Failed to initialize NVML: {error}")

finally:
    pynvml.nvmlShutdown()

However, I got this output

NVML initialized successfully
Device 0: NVIDIA H100 PCIe
Failed to initialize NVML: System is not in ready state

I have tried several versions of pynvml, but none of them made a difference. I can't find any additional information about the "System is not in ready state" error either. How can I fix this error?
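Note that in the output above both nvmlInit and nvmlDeviceGetName succeed, so the exception comes from nvmlDeviceGetTotalEnergyConsumption; the "Failed to initialize NVML" prefix is just the text of the single except block. A minimal diagnostic sketch that runs each call separately and reports which one fails could look like this. The helper name probe_nvml and the extra nvmlDeviceGetPowerUsage check are illustrative additions, not part of the original script.

```python
def probe_nvml(nvml, index=0):
    """Run each NVML call on its own and record which step fails.

    `nvml` is the pynvml module (passed in so the helper is easy to test).
    Returns a list of (step_name, result_text) tuples.
    """
    try:
        nvml.nvmlInit()
    except nvml.NVMLError as error:
        # Initialization itself failed, so nothing else can be queried.
        return [("nvmlInit", f"failed: {error}")]

    report = [("nvmlInit", "ok")]
    try:
        handle = nvml.nvmlDeviceGetHandleByIndex(index)
        report.append(("nvmlDeviceGetHandleByIndex", "ok"))
        # Query each metric independently so one failure does not hide the others.
        for name in (
            "nvmlDeviceGetName",
            "nvmlDeviceGetTotalEnergyConsumption",  # millijoules
            "nvmlDeviceGetPowerUsage",              # milliwatts
        ):
            try:
                report.append((name, f"ok: {getattr(nvml, name)(handle)}"))
            except nvml.NVMLError as error:
                report.append((name, f"failed: {error}"))
    except nvml.NVMLError as error:
        report.append(("nvmlDeviceGetHandleByIndex", f"failed: {error}"))
    finally:
        # Only reached after a successful nvmlInit, so shutdown is safe here.
        nvml.nvmlShutdown()
    return report


if __name__ == "__main__":
    try:
        import pynvml
    except ImportError:
        print("pynvml is not installed")
    else:
        for step, result in probe_nvml(pynvml):
            print(f"{step}: {result}")
```

With this layout, a failure in the energy query is reported under its own name and is no longer mislabeled as an initialization failure.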

Lucas-Otavio commented 3 days ago

I am also using CodeCarbon and I'm facing some similar issues, but the execution environment and error message are different.

I am trying to dockerize a project that uses CodeCarbon, and it does not work inside the Docker container, even though nvidia-smi produces its usual output.

Environment:

Output: When running the same script, the error was different.

NVML initialized successfully
Device 0: NVIDIA GeForce GTX 980M
Failed to initialize NVML: Not Supported
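As with the first report, the output shows that initialization and the name query succeed, so "Not Supported" is raised by nvmlDeviceGetTotalEnergyConsumption. The NVML documentation lists that call as available on Volta or newer devices, which would exclude a GTX 980M. One possible workaround, sketched below under assumptions, is to fall back to sampling nvmlDeviceGetPowerUsage and summing power times interval. The helper estimate_energy_mj is hypothetical, the power query may itself be unsupported on some GPUs, and the estimate covers only the sampling window, not the total since the driver was loaded.

```python
import time


def estimate_energy_mj(read_power_mw, samples=10, interval_s=0.1, sleep=time.sleep):
    """Approximate energy in millijoules from periodic power readings.

    `read_power_mw` is any zero-argument callable returning power in milliwatts.
    Each sample is assumed to hold for one interval (simple rectangle rule).
    """
    energy_mj = 0.0
    for _ in range(samples):
        energy_mj += read_power_mw() * interval_s  # mW * s = mJ
        sleep(interval_s)
    return energy_mj


if __name__ == "__main__":
    try:
        import pynvml
        pynvml.nvmlInit()
    except Exception as error:  # ImportError or an NVML initialization error
        print(f"NVML unavailable: {error}")
    else:
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(0)
            try:
                energy = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
                print(f"Total energy since driver load: {energy} mJ")
            except pynvml.NVMLError_NotSupported:
                # Fallback: integrate sampled power over a short window.
                energy = estimate_energy_mj(
                    lambda: pynvml.nvmlDeviceGetPowerUsage(handle)
                )
                print(f"Estimated energy over sampling window: {energy:.1f} mJ")
        except pynvml.NVMLError as error:
            print(f"NVML query failed: {error}")
        finally:
            pynvml.nvmlShutdown()
```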