mlco2 / codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.
https://mlco2.github.io/codecarbon
MIT License

Store data in smaller steps with resetting emissions data #597

Closed: headscott closed this issue 4 months ago

headscott commented 5 months ago

Description

I found some issues from last year where people asked how to save the emissions of individual loop steps separately, without using checkpoints like flush or creating new instances of the EmissionsTracker. I still can't find a way to get the emissions of each step, and I am not sure whether I just don't understand how to do it.

What I Did

Right now I do this by creating a new tracker instance in each loop step, like this:

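from codecarbon import EmissionsTracker

def run_method():
    for i in range(1, 5):
        # A new tracker per step repeats the setup phase (and its log output) every time
        tracker = EmissionsTracker()
        tracker.start()
        # do stuff ...
        emissions: float = tracker.stop()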

This kind of works, but every iteration prints the tracker setup logs again:

[codecarbon INFO @ 15:02:39] [setup] RAM Tracking...
[codecarbon INFO @ 15:02:39] [setup] GPU Tracking...
....
....

And if I do this:

from codecarbon import EmissionsTracker

def run_method():
    tracker = EmissionsTracker()
    for i in range(1, 5):
        tracker.start()
        # do stuff ...
        emissions: float = tracker.stop()

It always prints this:

[codecarbon WARNING @ 15:04:02] Tracker already stopped !

In addition, the "kWh of electricity used since the beginning." and "Energy consumed for ..." values keep adding up after each step instead of being reset.
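When a tool only reports running totals such as "kWh of electricity used since the beginning", per-step values can in principle be recovered by subtracting consecutive readings. The sketch below shows only that arithmetic; `per_step_deltas` is a hypothetical helper, not part of codecarbon, and it assumes one cumulative reading has been recorded after each step.

```python
def per_step_deltas(cumulative):
    """Turn running totals (one reading per step) into per-step increments.

    `cumulative` is a hypothetical list of "energy since the beginning"
    readings (e.g. in kWh), recorded after each loop step.
    """
    deltas = []
    previous = 0.0
    for total in cumulative:
        deltas.append(total - previous)  # energy used during this step only
        previous = total
    return deltas


# Example with made-up readings taken after three steps.
print(per_step_deltas([0.5, 1.25, 2.0]))  # [0.5, 0.75, 0.75]
```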

LuisBlanche commented 4 months ago

According to the quickstart documentation, you can use task mode, which allows this behavior:

from codecarbon import EmissionsTracker

def run_method():
    tracker = EmissionsTracker()
    for i in range(1, 5):
        tracker.start_task(f"task_{i}")
        # do stuff ...
        emissions: float = tracker.stop_task()
    tracker.stop()

headscott commented 4 months ago

I already tried that, but it gives me this error:

Traceback (most recent call last):
  File "C:\Users\headscott\PycharmProjects\RobustBenchCarbon\run_models.py", line 37, in <module>
    run_attack()
  File "C:\Users\headscott\PycharmProjects\RobustBenchCarbon\run_models.py", line 19, in run_attack
    tracker.start_task(f"task_{i}")
  File "C:\Python312\Lib\site-packages\codecarbon\emissions_tracker.py", line 484, in start_task
    _ = self._prepare_emissions_data()
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\codecarbon\emissions_tracker.py", line 627, in _prepare_emissions_data
    emissions_rate=emissions / duration.seconds,  # kg/s
                   ~~~~~~~~~~^~~~~~~~~~~~~~~~~~
ZeroDivisionError: float division by zero
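
The failing line divides the emissions by the measured duration, so any task whose duration comes out as exactly zero seconds raises this error. One general way to avoid that is to guard the division, as in the sketch below. `safe_rate` is a hypothetical helper written for illustration, not codecarbon code, and returning `0.0` for an empty interval is an assumption.

```python
def safe_rate(emissions_kg: float, duration_s: float) -> float:
    """Return an emissions rate in kg/s without dividing by zero.

    Hypothetical helper: for a zero (or negative) duration it returns 0.0
    instead of raising ZeroDivisionError.
    """
    if duration_s <= 0:
        return 0.0
    return emissions_kg / duration_s


print(safe_rate(0.5, 2.0))  # 0.25
print(safe_rate(0.5, 0.0))  # 0.0 instead of ZeroDivisionError
```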

benoit-cty commented 4 months ago

Can you confirm whether this has been fixed by https://github.com/mlco2/codecarbon/pull/589 ?

headscott commented 4 months ago

That fixed this exact error, but now there is another one and I am not sure why. This is my code:

import numpy as np
from codecarbon import EmissionsTracker

def method():
    tracker = EmissionsTracker()
    emissions_list = []
    try:
        for i in range(1, 3):
            tracker.start_task(f"task_{i}")
            # do stuff
            emissions: float = tracker.stop_task()
            emissions_list.append(emissions)
    finally:
        tracker.stop()

    mean_emissions = np.mean(emissions_list)
    std_emissions = np.std(emissions_list)

    print(f"Mean emissions: {mean_emissions} kgCO2 equivalent")
    print(f"Standard deviation of emissions: {std_emissions} kgCO2 equivalent")

And this is the error:

C:\Python312\Lib\site-packages\codecarbon\output_methods\file.py:79: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  df = pd.concat(
Traceback (most recent call last):
  File "C:\Users\headscott\PycharmProjects\RobustBenchCarbon\run_models.py", line 39, in <module>
    method()
  File "C:\Users\headscott\PycharmProjects\RobustBenchCarbon\run_models.py", line 29, in run_attack
    mean_emissions = np.mean(emissions_list)
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\numpy\core\fromnumeric.py", line 3504, in mean
    return _methods._mean(a, axis=axis, dtype=dtype,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\numpy\core\_methods.py", line 118, in _mean
    ret = umr_sum(arr, axis, dtype, out, keepdims, where=where)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: unsupported operand type(s) for +: 'EmissionsData' and 'EmissionsData'

It seems the list is filled with EmissionsData elements, even though stop_task is supposed to return float values. But I saw that task_emission_data, which is returned there, is actually of type EmissionsData.

benoit-cty commented 4 months ago

Yes, tracker.stop_task() does not return a float but an EmissionsData object. You can get the float with emissions: float = tracker.stop_task().emissions.

The documentation is wrong, sorry for that.
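Putting the fix together, the sketch below collects `.emissions` from each `EmissionsData` returned by `stop_task()` and then computes the mean and standard deviation. It is an illustration, not official documentation: `measure_tasks`, `summarize`, and `work` are hypothetical names, and it uses the standard-library `statistics` module instead of NumPy so it runs without extra dependencies (`statistics.pstdev` is the population standard deviation, matching `np.std` with its default `ddof=0`). In practice, pass a real `EmissionsTracker()` as `tracker`.

```python
from statistics import mean, pstdev


def measure_tasks(tracker, n_tasks, work):
    """Run work(i) inside its own task and return per-task emissions (kgCO2eq).

    `tracker` is assumed to be a codecarbon EmissionsTracker, or any object
    with the same start_task / stop_task / stop methods. stop_task() returns
    an EmissionsData object, so the float is read from its `.emissions`.
    """
    emissions_list = []
    try:
        for i in range(1, n_tasks + 1):
            tracker.start_task(f"task_{i}")
            work(i)  # hypothetical workload for step i
            emissions_list.append(tracker.stop_task().emissions)
    finally:
        tracker.stop()  # stop the tracker once, after all tasks
    return emissions_list


def summarize(values):
    """Return (mean, population standard deviation) of the per-task values."""
    return mean(values), pstdev(values)


# Usage (requires codecarbon):
#   from codecarbon import EmissionsTracker
#   values = measure_tasks(EmissionsTracker(), 2, lambda i: sum(range(10**6)))
#   avg, std = summarize(values)
#   print(f"Mean emissions: {avg} kgCO2 equivalent")
#   print(f"Standard deviation of emissions: {std} kgCO2 equivalent")
```

Reading `.emissions` at the point of collection keeps the list purely numeric, which avoids the `TypeError` seen when NumPy tries to add `EmissionsData` objects together.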

headscott commented 4 months ago

Okay, thank you, this solved my problem. I will close the issue.