mlco2 / codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.
https://mlco2.github.io/codecarbon
MIT License
1.01k stars 158 forks

AMD GPUs support #490

Open IlyasMoutawwakil opened 5 months ago

IlyasMoutawwakil commented 5 months ago

This PR adds support for AMD GPUs through amdsmi.

445

IlyasMoutawwakil commented 5 months ago

Running this on an MI250 with ROCm 5.6.1 and amd-smi installed:

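import time

import torch
from codecarbon import EmissionsTracker


def workload():
    # ROCm builds of PyTorch expose AMD GPUs through the "cuda" device name.
    matrix1 = torch.randn(1000, 1000, device="cuda")
    matrix2 = torch.randn(1000, 1000, device="cuda")

    return matrix1 @ matrix2


# Track only the current process while keeping the GPU busy for about 10 seconds.
with EmissionsTracker(tracking_mode="process") as tracker:
    start = time.time()
    while time.time() - start < 10:
        workload()

# _total_energy is a private attribute, read here only for inspection.
print("total_energy:", tracker._total_energy.kWh)
print("total_co2:", tracker.final_emissions)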

I get coherent output:

[codecarbon INFO @ 09:47:24] [setup] RAM Tracking...
[codecarbon INFO @ 09:47:24] [setup] GPU Tracking...
[codecarbon INFO @ 09:47:25] Tracking AMD GPU via amdsmi
[codecarbon INFO @ 09:47:25] [setup] CPU Tracking...
[codecarbon WARNING @ 09:47:25] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon INFO @ 09:47:26] CPU Model on constant consumption mode: AMD EPYC 7763 64-Core Processor
[codecarbon INFO @ 09:47:26] >>> Tracker's metadata:
[codecarbon INFO @ 09:47:26]   Platform system: Linux-5.15.0-84-generic-x86_64-with-glibc2.35
[codecarbon INFO @ 09:47:26]   Python version: 3.10.12
[codecarbon INFO @ 09:47:26]   CodeCarbon version: 2.3.2
[codecarbon INFO @ 09:47:26]   Available RAM : 1007.705 GB
[codecarbon INFO @ 09:47:26]   CPU count: 128
[codecarbon INFO @ 09:47:26]   CPU model: AMD EPYC 7763 64-Core Processor
[codecarbon INFO @ 09:47:26]   GPU count: 1
[codecarbon INFO @ 09:47:26]   GPU model: 1 x AMD INSTINCT MI250 (MCM) OAM AC MBA

[codecarbon INFO @ 09:47:40] Energy consumed for RAM : 0.000001 kWh. RAM Power : 0.3306770324707031 W
[codecarbon INFO @ 09:47:40] Energy consumed for all GPUs : 0.000848 kWh. Total GPU Power : 303.82411939507443 W
[codecarbon INFO @ 09:47:40] Energy consumed for all CPUs : 0.000391 kWh. Total CPU Power : 140.0 W
[codecarbon INFO @ 09:47:40] 0.001240 kWh of electricity used since the beginning.
total_energy: 0.001239593259251299
total_co2: 3.539906470443935e-05
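As a rough sanity check on the figures above (a sketch, not part of the PR): energy in kWh equals power in watts times duration in seconds, divided by 3.6e6. If the readings are consistent, dividing each logged energy by its logged power should give about the same duration for GPU and CPU, close to the 10-second loop. The helper name `implied_duration_s` is hypothetical, and the calculation assumes power was roughly constant over the run.

```python
# Values copied from the log output above.
GPU_POWER_W = 303.82411939507443
GPU_ENERGY_KWH = 0.000848
CPU_POWER_W = 140.0
CPU_ENERGY_KWH = 0.000391

JOULES_PER_KWH = 3.6e6  # 1 kWh = 1000 W * 3600 s


def implied_duration_s(energy_kwh: float, power_w: float) -> float:
    """Duration implied by E = P * t, assuming constant power (hypothetical helper)."""
    return energy_kwh * JOULES_PER_KWH / power_w


gpu_seconds = implied_duration_s(GPU_ENERGY_KWH, GPU_POWER_W)
cpu_seconds = implied_duration_s(CPU_ENERGY_KWH, CPU_POWER_W)
print(f"GPU: {gpu_seconds:.2f} s, CPU: {cpu_seconds:.2f} s")
```

Both values land near 10 seconds, which matches the length of the workload loop.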

IlyasMoutawwakil commented 5 months ago

@benoit-cty @SabAmine

benoit-cty commented 5 months ago

Thanks, it's really great!

Do you think it's possible to have a machine with both AMD and NVIDIA GPUs?

Before merging, this PR needs unit tests and documentation.

IlyasMoutawwakil commented 5 months ago

Apparently it's possible to have both (though very rare). I will update the code to account for it. I'm not sure about unit tests: do you run NVIDIA GPU tests in a workflow?
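One way to account for mixed machines could look like the sketch below. It is only an illustration, not the code in this PR: it checks which vendor Python modules are importable and returns all of them instead of stopping at the first match. The function name `available_gpu_backends` is hypothetical, and an importable module does not prove that a GPU of that vendor is actually present, so a real implementation would still need to initialize each library and count devices.

```python
import importlib.util


def available_gpu_backends(
    candidates=("pynvml", "amdsmi"),
    find_spec=importlib.util.find_spec,
):
    """Return every candidate module that is importable (hypothetical helper).

    `find_spec` is injectable so the logic can be tested without the
    vendor libraries installed.
    """
    return [name for name in candidates if find_spec(name) is not None]


# On a machine with neither library installed this prints an empty list.
print(available_gpu_backends())
```

Returning a list rather than a single backend name lets the caller set up tracking for NVIDIA and AMD devices side by side.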

benoit-cty commented 5 months ago

For the tests, we use mocks to check the function calls. This way you can test that the code works with both NVIDIA and AMD GPUs, even if you don't have them.
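A minimal sketch of that approach with `unittest.mock`, assuming the code under test receives the GPU library as an argument. The helper `read_gpu_power_w` and the methods `get_handles` and `get_power` are hypothetical stand-ins, not real amdsmi or pynvml calls.

```python
from unittest import mock


def read_gpu_power_w(smi):
    """Query power for every device through an injected SMI-like object.

    Hypothetical helper: `get_handles` and `get_power` are illustrative
    names, not the API of any real GPU library.
    """
    return [smi.get_power(handle) for handle in smi.get_handles()]


def test_reads_power_without_hardware():
    # The mock stands in for the vendor library, so no GPU is needed.
    fake_smi = mock.Mock()
    fake_smi.get_handles.return_value = ["gpu0", "gpu1"]
    fake_smi.get_power.side_effect = [150.0, 300.0]

    assert read_gpu_power_w(fake_smi) == [150.0, 300.0]
    # Check that the library was called once per device, in order.
    fake_smi.get_power.assert_has_calls([mock.call("gpu0"), mock.call("gpu1")])


test_reads_power_without_hardware()
```

The same test body can be repeated with a second mock that mimics the other vendor, which covers both code paths on a CI runner without any GPU.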

fxmarty commented 2 months ago

@IlyasMoutawwakil any update?

benoit-cty commented 2 months ago

ROCm is not packaged for Python, but I don't think that blocks this PR: we already have the same issue on macOS and Windows.

To merge this PR, we need tests and documentation.