mlco2 / codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.
https://mlco2.github.io/codecarbon
MIT License

Why does it print while running? #598

Open headscott opened 2 days ago

Description

I got the following output:

...
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:32:33] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:32:33] Energy consumed for RAM : 0.001575 kWh. RAM Power : 377.87676858901983 W
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:32:33] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:32:33] Energy consumed for all GPUs : 0.000324 kWh. Total GPU Power : 77.71202001205326 W
[codecarbon INFO @ 18:32:33] Energy consumed for all CPUs : 0.000469 kWh. Total CPU Power : 112.5 W
[codecarbon INFO @ 18:32:33] 0.002369 kWh of electricity used since the beginning.
[2024-06-28 18:32:35,231][__main__][INFO] - Train Epoch: [0][0/2503]  Time 8.592 (8.592)  Loss 6.9893 (6.9893)  Prec@1 0.195 (0.195)  Prec@5 0.781 (0.781)  LR 0.000
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
[2024-06-28 18:32:42,599][__main__][INFO] - Train Epoch: [0][10/2503]  Time 0.737 (1.451)  Loss 6.9683 (6.9871)  Prec@1 0.000 (0.107)  Prec@5 0.391 (0.408)  LR 0.002
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:32:48] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:32:48] Energy consumed for RAM : 0.003147 kWh. RAM Power : 377.87676858901983 W
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:32:48] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:32:48] Energy consumed for all GPUs : 0.001271 kWh. Total GPU Power : 227.225096755562 W
[codecarbon INFO @ 18:32:48] Energy consumed for all CPUs : 0.000938 kWh. Total CPU Power : 112.5 W
[codecarbon INFO @ 18:32:48] 0.005356 kWh of electricity used since the beginning.
[2024-06-28 18:32:49,984][__main__][INFO] - Train Epoch: [0][20/2503]  Time 0.740 (1.112)  Loss 6.9804 (6.9801)  Prec@1 0.195 (0.121)  Prec@5 0.391 (0.409)  LR 0.003
[2024-06-28 18:32:57,397][__main__][INFO] - Train Epoch: [0][30/2503]  Time 0.745 (0.992)  Loss 6.9760 (6.9813)  Prec@1 0.000 (0.113)  Prec@5 0.391 (0.403)  LR 0.005
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:33:03] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:33:03] Energy consumed for RAM : 0.004720 kWh. RAM Power : 377.87676858901983 W
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:33:03] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:33:03] Energy consumed for all GPUs : 0.002218 kWh. Total GPU Power : 227.31828920761552 W
[codecarbon INFO @ 18:33:03] Energy consumed for all CPUs : 0.001407 kWh. Total CPU Power : 112.5 W
[codecarbon INFO @ 18:33:03] 0.008345 kWh of electricity used since the beginning.
[2024-06-28 18:33:04,848][__main__][INFO] - Train Epoch: [0][40/2503]  Time 0.745 (0.932)  Loss 7.0290 (6.9850)  Prec@1 0.195 (0.119)  Prec@5 0.195 (0.405)  LR 0.007
[2024-06-28 18:33:12,488][__main__][INFO] - Train Epoch: [0][50/2503]  Time 0.738 (0.899)  Loss 6.9633 (6.9860)  Prec@1 0.000 (0.119)  Prec@5 0.391 (0.406)  LR 0.008
...

Why does it print the scontrol error multiple times while training? It also prints the "Energy consumed" messages all the time.
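For what it's worth, the timestamps in the log (18:32:33, 18:32:48, 18:33:03) are 15 seconds apart, so the messages look like periodic measurements rather than repeated errors. A rough check, assuming each report covers about 15 s at the logged power (the interval is inferred from the timestamps, not confirmed anywhere), gives values close to the logged energy figures:

```python
# Values are copied from the log above. INTERVAL_S is an assumption inferred
# from the 15-second spacing of the timestamps.
INTERVAL_S = 15
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def energy_kwh(power_w, seconds):
    """Energy in kWh for a constant power draw (watts) over a duration (seconds)."""
    return power_w * seconds / JOULES_PER_KWH


# First report: logged RAM energy 0.001575 kWh, logged CPU energy 0.000469 kWh.
ram_first = energy_kwh(377.87676858901983, INTERVAL_S)
cpu_first = energy_kwh(112.5, INTERVAL_S)

# Second report: logged GPU energy rose from 0.000324 to 0.001271 kWh.
gpu_logged_delta = 0.001271 - 0.000324
gpu_second = energy_kwh(227.225096755562, INTERVAL_S)

print(f"RAM, first interval:  {ram_first:.6f} kWh")
print(f"CPU, first interval:  {cpu_first:.6f} kWh")
print(f"GPU, second interval: {gpu_second:.6f} kWh (logged delta {gpu_logged_delta:.6f})")
```

The small differences from the logged numbers are what you would expect if the real interval is slightly longer than 15 s.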

What I Did

I just wanted to run this code:

import subprocess
from codecarbon import EmissionsTracker

def energie_von_command(command):
    # Track energy use while the shell command runs; results go to ./emissions
    tracker = EmissionsTracker(output_dir="emissions")
    tracker.start()
    try:
        subprocess.run(command, shell=True, check=True)
    finally:
        tracker.stop()
        # Assumes final_emissions_data is an EmissionsData object whose
        # energy_consumed field is in kWh (the emissions field is kg CO2eq, not energy)
        print(f"Energy consumption (kWh): {tracker.final_emissions_data.energy_consumed}")

if __name__ == "__main__":
    energie_von_command("make experiments/RaResNet50/.done_train")
The make command is from this GitHub repository: poloclub/robust-principles
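A possible way to quiet the output, assuming the `[codecarbon INFO @ ...]` and `[codecarbon WARNING @ ...]` lines are emitted through Python's standard `logging` module: raising the logger threshold to ERROR would hide both kinds of message. The sketch below only demonstrates that mechanism with a stand-in logger named `demo_tracker` (hypothetical, not part of codecarbon). For codecarbon itself, the `log_level` argument of `EmissionsTracker` (for example `log_level="error"`) seems to be the setting to try. The `/bin/bash: line 1: scontrol: command not found` lines are written by the shell, so they would probably not be affected.

```python
import logging


class ListHandler(logging.Handler):
    """Collects records in a list so the effect of the log level is visible."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}: {record.getMessage()}")


def emit_sample_messages(level):
    """Log one INFO, one WARNING and one ERROR message at the given threshold."""
    # "demo_tracker" is a hypothetical stand-in for a library logger.
    logger = logging.getLogger("demo_tracker")
    logger.propagate = False  # keep the demo output out of the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        logger.info("Energy consumed for RAM : ...")
        logger.warning("Error running `scontrol show job $SLURM_JOB_ID` ...")
        logger.error("something actually failed")
    finally:
        logger.removeHandler(handler)
    return handler.messages


# With an INFO threshold all three messages pass; with ERROR only the last one does.
print(emit_sample_messages(logging.INFO))
print(emit_sample_messages(logging.ERROR))
```

Note that an ERROR threshold also hides the SLURM warning, which may be worth keeping visible while the RAM fallback is still unexplained.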