mlco2 / codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.
https://mlco2.github.io/codecarbon
MIT License

Unknown GPU and CPU #667

Open RomainWarlop opened 1 month ago

RomainWarlop commented 1 month ago

Description

I'm trying to estimate the carbon impact of LLMs using models available on Hugging Face. So far I'm testing this BLOOM model in its GPU version, on an NVIDIA Tesla P100 GPU.

Here is my code:

from codecarbon import EmissionsTracker
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-7b1"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")

# track emissions around a single generation
tracker = EmissionsTracker()
tracker.start()
inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
tracker.stop()

I obtained the following error:

[codecarbon INFO @ 13:46:18] [setup] RAM Tracking...
[codecarbon INFO @ 13:46:18] [setup] GPU Tracking...
[codecarbon INFO @ 13:46:18] Tracking Nvidia GPU via pynvml
[codecarbon WARNING @ 13:46:18] Failed to retrieve gpu total energy consumption
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/codecarbon/core/gpu.py", line 116, in _get_total_energy_consumption
    return pynvml.nvmlDeviceGetTotalEnergyConsumption(self.handle)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 2411, in nvmlDeviceGetTotalEnergyConsumption
    _nvmlCheckReturn(ret)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_NotSupported: Not Supported
[codecarbon INFO @ 13:46:18] [setup] CPU Tracking...
[codecarbon WARNING @ 13:46:18] No CPU tracking mode found. Falling back on CPU constant mode. 
 Linux OS detected: Please ensure RAPL files exist at \sys\class\powercap\intel-rapl to measure CPU

[codecarbon WARNING @ 13:46:19] We saw that you have a Intel(R) Xeon(R) CPU @ 2.30GHz but we don't know it. Please contact us.
[codecarbon INFO @ 13:46:19] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.30GHz
[codecarbon WARNING @ 13:46:19] Failed to retrieve gpu total energy consumption
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/codecarbon/core/gpu.py", line 116, in _get_total_energy_consumption
    return pynvml.nvmlDeviceGetTotalEnergyConsumption(self.handle)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 2411, in nvmlDeviceGetTotalEnergyConsumption
    _nvmlCheckReturn(ret)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_NotSupported: Not Supported
[codecarbon INFO @ 13:46:19] >>> Tracker's metadata:
[codecarbon INFO @ 13:46:19]   Platform system: Linux-5.10.0-31-cloud-amd64-x86_64-with-glibc2.31
[codecarbon INFO @ 13:46:19]   Python version: 3.10.14
[codecarbon INFO @ 13:46:19]   CodeCarbon version: 2.7.1
[codecarbon INFO @ 13:46:19]   Available RAM : 50.999 GB
[codecarbon INFO @ 13:46:19]   CPU count: 8
[codecarbon INFO @ 13:46:19]   CPU model: Intel(R) Xeon(R) CPU @ 2.30GHz
[codecarbon INFO @ 13:46:19]   GPU count: 1
[codecarbon INFO @ 13:46:20]   GPU model: 1 x Tesla P100-PCIE-16GB
[codecarbon INFO @ 13:46:20] Saving emissions data to file /home/jupyter/carbon genAI/emissions.csv
[codecarbon WARNING @ 13:46:20] Failed to retrieve gpu total energy consumption
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/codecarbon/core/gpu.py", line 116, in _get_total_energy_consumption
    return pynvml.nvmlDeviceGetTotalEnergyConsumption(self.handle)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 2411, in nvmlDeviceGetTotalEnergyConsumption
    _nvmlCheckReturn(ret)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_NotSupported: Not Supported
/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py:1258: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
[codecarbon INFO @ 13:46:20] Energy consumed for RAM : 0.000004 kWh. RAM Power : 19.12470817565918 W
[codecarbon WARNING @ 13:46:20] Failed to retrieve gpu total energy consumption
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/codecarbon/core/gpu.py", line 116, in _get_total_energy_consumption
    return pynvml.nvmlDeviceGetTotalEnergyConsumption(self.handle)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 2411, in nvmlDeviceGetTotalEnergyConsumption
    _nvmlCheckReturn(ret)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_NotSupported: Not Supported
[codecarbon INFO @ 13:46:20] Energy consumed for all GPUs : 0.000000 kWh. Total GPU Power : 0.0 W
[codecarbon INFO @ 13:46:20] Energy consumed for all CPUs : 0.000008 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 13:46:20] 0.000012 kWh of electricity used since the beginning.

Could you help me please?

antgioia commented 1 month ago

I have the same problem as you, but I haven't found a solution.

RomainWarlop commented 1 month ago

I switched from the P100 GPU to a T4 GPU and now it's working. So I think the P100's energy consumption is not tracked by the pynvml library.
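
For anyone who wants to check this directly, here is a minimal sketch (assuming pynvml is installed, as codecarbon uses it) that probes whether the current GPU/driver exposes the total-energy counter codecarbon calls:

import pynvml

# Minimal probe: does this GPU/driver expose the total-energy counter?
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
print("GPU:", pynvml.nvmlDeviceGetName(handle))
try:
    # Energy in millijoules since the driver was last loaded
    energy_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    print("Total energy counter supported:", energy_mj, "mJ")
except pynvml.NVMLError_NotSupported:
    print("nvmlDeviceGetTotalEnergyConsumption is not supported on this GPU/driver")
pynvml.nvmlShutdown()

On the P100 this presumably raises NVMLError_NotSupported, matching the warnings above, while on the T4 it should return a value in millijoules.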

benoit-cty commented 1 month ago

Yes, it seems that the P100 drivers support pynvml.nvmlDeviceGetName but not nvmlDeviceGetTotalEnergyConsumption.

Maybe you could call pynvml.nvmlSystemGetDriverVersion().decode() to check whether you have the latest drivers available.
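
For example, something like this (a sketch; newer pynvml releases return a str instead of bytes, so the decode() is only needed on older versions):

import pynvml

pynvml.nvmlInit()
version = pynvml.nvmlSystemGetDriverVersion()
# Older pynvml versions return bytes, newer ones return str
print(version.decode() if isinstance(version, bytes) else version)
pynvml.nvmlShutdown()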