mlco2 / codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.
https://mlco2.github.io/codecarbon
MIT License
1.18k stars 178 forks

Why does it print while running? #598

Closed headscott closed 2 months ago

headscott commented 5 months ago

Description

I got these prints:

...
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:32:33] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:32:33] Energy consumed for RAM : 0.001575 kWh. RAM Power : 377.87676858901983 W
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:32:33] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:32:33] Energy consumed for all GPUs : 0.000324 kWh. Total GPU Power : 77.71202001205326 W
[codecarbon INFO @ 18:32:33] Energy consumed for all CPUs : 0.000469 kWh. Total CPU Power : 112.5 W
[codecarbon INFO @ 18:32:33] 0.002369 kWh of electricity used since the beginning.
[2024-06-28 18:32:35,231][__main__][INFO] - Train Epoch: [0][0/2503]  Time 8.592 (8.592)  Loss 6.9893 (6.9893)  Prec@1 0.195 (0.195)  Prec@5 0.781 (0.781)  LR 0.000
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
[2024-06-28 18:32:42,599][__main__][INFO] - Train Epoch: [0][10/2503]  Time 0.737 (1.451)  Loss 6.9683 (6.9871)  Prec@1 0.000 (0.107)  Prec@5 0.391 (0.408)  LR 0.002
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:32:48] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:32:48] Energy consumed for RAM : 0.003147 kWh. RAM Power : 377.87676858901983 W
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:32:48] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:32:48] Energy consumed for all GPUs : 0.001271 kWh. Total GPU Power : 227.225096755562 W
[codecarbon INFO @ 18:32:48] Energy consumed for all CPUs : 0.000938 kWh. Total CPU Power : 112.5 W
[codecarbon INFO @ 18:32:48] 0.005356 kWh of electricity used since the beginning.
[2024-06-28 18:32:49,984][__main__][INFO] - Train Epoch: [0][20/2503]  Time 0.740 (1.112)  Loss 6.9804 (6.9801)  Prec@1 0.195 (0.121)  Prec@5 0.391 (0.409)  LR 0.003
[2024-06-28 18:32:57,397][__main__][INFO] - Train Epoch: [0][30/2503]  Time 0.745 (0.992)  Loss 6.9760 (6.9813)  Prec@1 0.000 (0.113)  Prec@5 0.391 (0.403)  LR 0.005
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:33:03] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:33:03] Energy consumed for RAM : 0.004720 kWh. RAM Power : 377.87676858901983 W
/bin/bash: line 1: scontrol: command not found
[codecarbon WARNING @ 18:33:03] Error running `scontrol show job $SLURM_JOB_ID` to retrieve SLURM-available RAM.Using the machine's total RAM.
[codecarbon INFO @ 18:33:03] Energy consumed for all GPUs : 0.002218 kWh. Total GPU Power : 227.31828920761552 W
[codecarbon INFO @ 18:33:03] Energy consumed for all CPUs : 0.001407 kWh. Total CPU Power : 112.5 W
[codecarbon INFO @ 18:33:03] 0.008345 kWh of electricity used since the beginning.
[2024-06-28 18:33:04,848][__main__][INFO] - Train Epoch: [0][40/2503]  Time 0.745 (0.932)  Loss 7.0290 (6.9850)  Prec@1 0.195 (0.119)  Prec@5 0.195 (0.405)  LR 0.007
[2024-06-28 18:33:12,488][__main__][INFO] - Train Epoch: [0][50/2503]  Time 0.738 (0.899)  Loss 6.9633 (6.9860)  Prec@1 0.000 (0.119)  Prec@5 0.391 (0.406)  LR 0.008
...

Why does it print the scontrol error multiple times while training? It also prints the "Energy consumed" messages all the time.

What I Did

I just wanted to run this code:

import os
import subprocess
from codecarbon import EmissionsTracker

def energie_von_command(command):
    tracker = EmissionsTracker(output_dir="emissions")
    tracker.start()
    try:
        subprocess.run(command, shell=True, check=True)
    finally:
        tracker.stop()
        print(f"Energy consumption (kWh): {tracker.final_emissions_data['emissions'] / 1000.0}")

if __name__ == "__main__":
    energie_von_command(f"make experiments/RaResNet50/.done_train")

The make command comes from this GitHub repository: poloclub/robust-principles

benoit-cty commented 5 months ago

Hello, you can reduce the log output with the log_level parameter: EmissionsTracker(output_dir="emissions", log_level="error")

But could you tell us more about your environment? You seem to have a SLURM_JOB_ID environment variable set but no scontrol tool. Are you on a SLURM supercomputer?
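
As a rough illustration of what the log_level suggestion does, here is a minimal sketch. It assumes that codecarbon's log_level follows the usual Python logging semantics, where a threshold of "error" drops INFO and WARNING records. The names make_quiet_tracker, ListHandler, demo_level_filtering and the "level_demo" logger are hypothetical and used only for this example.

```python
import logging


def make_quiet_tracker(output_dir="emissions"):
    """Build a tracker that should only log errors (needs codecarbon installed)."""
    # Imported lazily so the rest of this sketch runs without codecarbon.
    from codecarbon import EmissionsTracker

    return EmissionsTracker(output_dir=output_dir, log_level="error")


class ListHandler(logging.Handler):
    """Hypothetical handler that collects messages in a list for inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def demo_level_filtering():
    """Show which records survive when the logger threshold is ERROR."""
    logger = logging.getLogger("level_demo")  # hypothetical logger name
    logger.propagate = False
    logger.handlers.clear()
    handler = ListHandler()
    logger.addHandler(handler)
    logger.setLevel(logging.ERROR)

    logger.info("Energy consumed for RAM ...")    # dropped
    logger.warning("Error running scontrol ...")  # dropped
    logger.error("something failed")              # kept
    return handler.messages
```

Only messages at ERROR level or above pass the threshold, which is why the periodic "Energy consumed" INFO lines and the SLURM WARNING lines should disappear with this setting.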

headscott commented 5 months ago

Okay thank you.

I am running this code on a remote cluster. There, I started a tmux session, and inside that session I started my Slurm job via srun. I can use squeue and scontrol on the cluster, but not inside the tmux session where my Slurm job is already running. I am not sure whether the setup is wrong, but this is probably why it keeps saying "Error running scontrol show job $SLURM_JOB_ID".

benoit-cty commented 5 months ago

Thanks for the information. We display a warning, which seems legitimate. But it would be better to display it only once in this case.

headscott commented 5 months ago

If I set log_level="error", it still prints

/bin/bash: line 1: scontrol: command not found
/bin/bash: line 1: scontrol: command not found

multiple times
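
The remaining "/bin/bash: ... command not found" lines look like they are written to stderr by the shell itself rather than by the codecarbon logger, which would explain why log_level does not affect them. One possible workaround, sketched below, is to hide SLURM_JOB_ID from the process when scontrol is not on the PATH. This is based only on the maintainer's remark that SLURM_JOB_ID is set while scontrol is missing; it has not been checked against codecarbon's SLURM detection, and the helper names are hypothetical.

```python
import os
import shutil


def slurm_env_without_scontrol(environ=None, which=shutil.which):
    """Return True when SLURM_JOB_ID is set but `scontrol` is not on PATH."""
    environ = os.environ if environ is None else environ
    return "SLURM_JOB_ID" in environ and which("scontrol") is None


def hide_slurm_job_id(environ=None, which=shutil.which):
    """Remove SLURM_JOB_ID if scontrol is unavailable.

    Returns the removed value so the caller can restore it later,
    or None when nothing was changed.
    """
    environ = os.environ if environ is None else environ
    if slurm_env_without_scontrol(environ, which):
        return environ.pop("SLURM_JOB_ID")
    return None
```

Calling hide_slurm_job_id() before creating the tracker would be the intended use. Note that child processes started afterwards would no longer see SLURM_JOB_ID either, unless the saved value is passed back to them explicitly.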

headscott commented 4 months ago

I don't get it, but now when I run this code:

from codecarbon import EmissionsTracker

def energy_of_command(command):
    tracker = EmissionsTracker(output_dir="emissions")
    tracker.start()
    try:
        print("TEST")
    finally:
        tracker.stop()
        print(f"Energy consumed (kWh): {tracker.final_emissions_data['emissions'] / 1000.0}")

if __name__ == "__main__":
    energy_of_command(f"make experiments/RaResNet50/.done_train")

it says:

TEST
[codecarbon INFO @ 16:07:00] Saving emissions data to file C:\Users\fabia\PycharmProjects\RobustBenchCarbon\emissions\emissions.csv
[codecarbon INFO @ 16:07:00] Energy consumed for RAM : 0.000000 kWh. RAM Power : 5.98262357711792 W
[codecarbon INFO @ 16:07:00] Energy consumed for all GPUs : 0.000000 kWh. Total GPU Power : 0.0 W
[codecarbon INFO @ 16:07:00] Energy consumed for all CPUs : 0.000000 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 16:07:00] 0.000000 kWh of electricity used since the beginning.
C:\Python312\Lib\site-packages\codecarbon\output_methods\file.py:50: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  df = pd.concat([df, pd.DataFrame.from_records([dict(total.values)])])
[codecarbon WARNING @ 16:07:00] graceful shutdown. Exceptions:
[codecarbon WARNING @ 16:07:00] <class 'Exception'>
Traceback (most recent call last):
  File "C:\Python312\Lib\site-packages\codecarbon\core\util.py", line 23, in suppress
    yield
  File "C:\Python312\Lib\contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\codecarbon\emissions_tracker.py", line 569, in stop
    self._persist_data(
  File "C:\Python312\Lib\site-packages\codecarbon\emissions_tracker.py", line 590, in _persist_data
    handler.out(total_emissions, delta_emissions)
  File "C:\Python312\Lib\site-packages\codecarbon\output_methods\file.py", line 71, in out
    df.to_csv(self.save_file_path, index=False)
  File "C:\Users\fabia\AppData\Roaming\Python\Python312\site-packages\pandas\util\_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\fabia\AppData\Roaming\Python\Python312\site-packages\pandas\core\generic.py", line 3967, in to_csv
    return DataFrameRenderer(formatter).to_csv(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\fabia\AppData\Roaming\Python\Python312\site-packages\pandas\io\formats\format.py", line 1014, in to_csv
    csv_formatter.save()
  File "C:\Users\fabia\AppData\Roaming\Python\Python312\site-packages\pandas\io\formats\csvs.py", line 251, in save
    with get_handle(
         ^^^^^^^^^^^
  File "C:\Users\fabia\AppData\Roaming\Python\Python312\site-packages\pandas\io\common.py", line 749, in get_handle
    check_parent_directory(str(handle))
  File "C:\Users\fabia\AppData\Roaming\Python\Python312\site-packages\pandas\io\common.py", line 616, in check_parent_directory
    raise OSError(rf"Cannot save file into a non-existent directory: '{parent}'")
OSError: Cannot save file into a non-existent directory: 'emissions'
[codecarbon WARNING @ 16:07:00] stopping.
Traceback (most recent call last):
  File "C:\Users\fabia\PycharmProjects\RobustBenchCarbon\testeTracker.py", line 13, in <module>
    energy_of_command(f"make experiments/RaResNet50/.done_train")
  File "C:\Users\fabia\PycharmProjects\RobustBenchCarbon\testeTracker.py", line 10, in energy_of_command
    print(f"Energy consumed (kWh): {tracker.final_emissions_data['emissions'] / 1000.0}")
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'EmissionsTracker' object has no attribute 'final_emissions_data'. Did you mean: '_prepare_emissions_data'?
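
The OSError in the traceback above suggests that the "emissions" output directory did not exist, and the AttributeError plausibly follows because stop() did not finish. A minimal sketch of one way to avoid both is to create the directory first and use the value returned by stop(). It assumes that stop() returns the estimated emissions in kg CO2eq (or None on failure); ensure_output_dir is a hypothetical helper.

```python
import os


def ensure_output_dir(path="emissions"):
    """Create the output directory if it is missing and return its path."""
    os.makedirs(path, exist_ok=True)
    return path


def energy_of_command(command, output_dir="emissions"):
    """Run a shell command while tracking it (needs codecarbon installed)."""
    import subprocess

    from codecarbon import EmissionsTracker

    tracker = EmissionsTracker(output_dir=ensure_output_dir(output_dir))
    tracker.start()
    emissions = None
    try:
        subprocess.run(command, shell=True, check=True)
    finally:
        # Assumption: stop() returns estimated emissions in kg CO2eq, or None.
        emissions = tracker.stop()
        print(f"Estimated emissions (kg CO2eq): {emissions}")
    return emissions
```

Under that assumption, the printed value is a mass of CO2-equivalent rather than an energy in kWh, so the label is changed accordingly.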

benoit-cty commented 4 months ago

Sorry for the delay.

There are two things to change:

benoit-cty commented 4 months ago

I've created a PR to provide a better error message: https://github.com/mlco2/codecarbon/pull/611

headscott commented 4 months ago

Okay, thank you. This helped.