mlco2 / codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.
https://mlco2.github.io/codecarbon
MIT License

Reinitialising EmissionsTracker for smaller loops #438

Closed pugantsov closed 11 months ago

pugantsov commented 11 months ago

Description

I am trying to track the emissions of a lot of fairly small loops, and I'm noticing that initialising the EmissionsTracker at the beginning of every loop causes significant overhead. I might have missed it in the documentation, but is there a way to keep the same EmissionsTracker and effectively "reset" it, so that it bypasses all of the initialisation (connecting to pynvml, etc.)? I guess a workaround could be to initialise a batch of them before my loop, but I don't think that would cut down on any time.

Currently I'm just using it like this:

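from codecarbon import EmissionsTracker

for item in work_items:  # work_items: placeholder for the loop being measured
    emissions = EmissionsTracker()  # re-initialised on every iteration (the slow part)
    emissions.start()
    # do something
    emissions.stop()
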
benoit-cty commented 11 months ago

Hello,

You could try:

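from codecarbon import EmissionsTracker

emissions = EmissionsTracker()  # initialise once, outside the loop
emissions.start()
for item in work_items:  # work_items: placeholder for your loop
    emissions.flush()  # checkpoint: writes out the measurements taken so far
    # do something
emissions.stop()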

We will release a new version this week that implements task monitoring; you can read the docs here: https://github.com/mlco2/codecarbon/pull/355/files#diff-cd895c905cf221ed0a1139844e15165861387c12e2f02753776bbe833f1aae14R45

Do you think it will work for you?

pugantsov commented 11 months ago

As I understand it, flush just computes the current emissions at a particular point in time; is that correct?

If so, could I just subtract the previous value at each iteration when using flush?

benoit-cty commented 11 months ago

EDIT: Yes, flush is more of a checkpoint: it does not reset the data; it only stores it, to avoid losing it if there is a crash and to give a view of progress when training for a very long time.

(Original answer, before the edit:) flush will compute the emissions since the last call to flush, or since start if no flush has been made, so to get the total emissions you have to sum up all the flush results.
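Since flush acts as a checkpoint rather than a reset, the per-iteration figure asked about above can be recovered by subtracting the previous cumulative reading. This is a minimal sketch, assuming each reading is a running total since start() (which is what the edit describes); the DeltaMeter class is a hypothetical helper, not part of codecarbon, and the usage lines are left as comments because they need real hardware to run.

```python
class DeltaMeter:
    """Turn cumulative readings into per-iteration increments (hypothetical helper)."""

    def __init__(self) -> None:
        self._previous = 0.0  # cumulative value seen at the last update

    def update(self, cumulative: float) -> float:
        """Return the increase since the last call, then remember the new total."""
        if cumulative < self._previous:
            raise ValueError("cumulative readings should never decrease")
        delta = cumulative - self._previous
        self._previous = cumulative
        return delta


# Usage sketch with codecarbon (not executed here; work_items and do_work are placeholders):
#
#     tracker = EmissionsTracker()
#     tracker.start()
#     meter = DeltaMeter()
#     for item in work_items:
#         do_work(item)
#         # assumes flush() returns the running emissions total (it may return None
#         # if the tracker was never started)
#         per_item = meter.update(tracker.flush())
#     tracker.stop()
```

Keeping the previous total inside a small object avoids threading an extra variable through the loop, and the guard catches the case where the readings turn out not to be cumulative after all.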

pugantsov commented 11 months ago

Got it, thanks, this is ideal!

pugantsov commented 11 months ago

Apologies for the comment spam, but I need to clarify something about models I have already trained. When I first used CodeCarbon, I was looking for a way to report the total energy consumed at every epoch, which I achieved with the following approach:

from codecarbon import EmissionsTracker


class Emissions(EmissionsTracker):
    def __init__(self):
        super().__init__()

    def get_emissions_data(self):
        # flush() is only a checkpoint: it saves the data so far but does not reset it
        _ = self.flush()
        # Private helper; without delta=True it describes everything since start()
        emissions_data = self._prepare_emissions_data()

        if not emissions_data.emissions:
            raise Exception("No emissions data found.")

        return {
            # Power figures are converted from W to kW
            "cpu_power": emissions_data.cpu_power / 1000,
            "gpu_power": emissions_data.gpu_power / 1000,
            "ram_power": emissions_data.ram_power / 1000,
            # The _total_* counters are running totals since start(), in kWh
            "cpu_energy_consumed": self._total_cpu_energy.kWh,
            "gpu_energy_consumed": self._total_gpu_energy.kWh,
            "ram_energy_consumed": self._total_ram_energy.kWh,
            "total_energy_consumed": self._total_energy.kWh,
            "co2_emissions": emissions_data.emissions,
        }

My graph outputs show an exponential increase in energy consumed right up to the end of the runs. Am I right to assume that ._prepare_emissions_data() provides a total from the very start of training, and that the flush here doesn't reset the emissions tracking? (I'm concerned about the accuracy of the estimates, as I've trained quite a few models.)

[Figure: GPU energy consumed for each run, for reference]

benoit-cty commented 11 months ago

My bad: looking at the code, flush is more of a checkpoint. It does not reset the data; it only stores it, to avoid losing it if there is a crash and to give a view of progress when training for a very long time.

I now remember that we made it for the training of BLOOM.

If you update to 2.3.0, you could use tasks:

from codecarbon import EmissionsTracker


class Emissions(EmissionsTracker):
    def __init__(self):
        super().__init__()
        # In 2.3.0, start() has to be called before start_task() (see the bug reported below)
        super().start_task()

    def get_emissions_data(self):
        # Close the current task and get the EmissionsData measured for it
        emissions_data = self.stop_task()
        if not emissions_data.emissions:
            raise Exception("No emissions data found.")
        # Open a fresh task for the next measurement window
        self.start_task()
        return {
            "cpu_power": emissions_data.cpu_power / 1000,
            "gpu_power": emissions_data.gpu_power / 1000,
            "ram_power": emissions_data.ram_power / 1000,
            "cpu_energy_consumed": self._total_cpu_energy.kWh,
            "gpu_energy_consumed": self._total_gpu_energy.kWh,
            "ram_energy_consumed": self._total_ram_energy.kWh,
            "total_energy_consumed": self._total_energy.kWh,
            "co2_emissions": emissions_data.emissions,
        }

Let me know if it works.

Or you could just add delta=True to your previous code: emissions_data = self._prepare_emissions_data(delta=True). But _prepare_emissions_data is not meant to be called by users.

pugantsov commented 11 months ago

I've tried using delta, and here are my outputs:

Initiating Emissions outside the loop and using delta:

   domain                cpu_energy_consumed  gpu_energy_consumed  total_energy_consumed  co2_emissions
0  business_finance      0.000211548          4.01752e-05          0.000495608            0.000133709
1  computers_internet    0.000412762          7.88342e-05          0.000969199            0.000127769
2  education_reference   0.000613303          0.00011636           0.00143555             0.000125814
3  entertainment_music   0.000829148          0.000148572          0.00187608             0.000118851
4  family_relationships  0.00104987           0.000190075          0.00239169             0.000139104
5  health                0.00124759           0.000227837          0.00285672             0.000125459
6  politics_government   0.00144783           0.000261735          0.0032953              0.000118322
7  science_mathematics   0.00165944           0.000301949          0.00379161             0.000133897
8  society_culture       0.00188305           0.000342077          0.00430129             0.000137505
9  sports                0.00211313           0.000380974          0.00480787             0.00013667

Initiating a new Emissions object at the beginning of each loop iteration:

   domain                cpu_energy_consumed  gpu_energy_consumed  total_energy_consumed  co2_emissions
0  business_finance      0.000196129          3.81157e-05          0.000464391            0.000125287
1  computers_internet    0.000198403          3.80815e-05          0.000467412            0.000126102
2  education_reference   0.00021492           4.25208e-05          0.000513729            0.000138598
3  entertainment_music   0.000202835          3.92338e-05          0.000478709            0.000129149
4  family_relationships  0.000193552          3.74038e-05          0.000456428            0.000123138
5  health                0.000207246          4.04279e-05          0.000492394            0.000132841
6  politics_government   0.000218375          4.21931e-05          0.000516227            0.000139271
7  science_mathematics   0.000196969          3.78047e-05          0.000464599            0.000125343
8  society_culture       0.000194441          3.77904e-05          0.00046263             0.000124812
9  sports                0.000195935          3.75336e-05          0.000462923            0.000124891

As you can see, the latter is much more consistent and does not keep increasing. With delta, the co2_emissions column does seem to reset fairly well, but the energy-consumed columns keep rising.
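The rising energy columns in the first table look like running totals, while co2_emissions there already behaves like a per-domain value. A quick worked check on the numbers posted above, as a sketch (the differences helper is hypothetical): differencing the first table's total_energy_consumed column should give per-domain values in the same range as the second table, where each domain had its own tracker.

```python
# total_energy_consumed (kWh) copied from the first table: one tracker, delta=True
cumulative = [0.000495608, 0.000969199, 0.00143555, 0.00187608, 0.00239169,
              0.00285672, 0.0032953, 0.00379161, 0.00430129, 0.00480787]
# total_energy_consumed (kWh) copied from the second table: new tracker per domain
per_object = [0.000464391, 0.000467412, 0.000513729, 0.000478709, 0.000456428,
              0.000492394, 0.000516227, 0.000464599, 0.00046263, 0.000462923]


def differences(values):
    """Per-step increments of a running total; the first step is measured from zero."""
    previous = [0.0] + values[:-1]
    return [current - prev for current, prev in zip(values, previous)]


per_domain = differences(cumulative)
mean_delta = sum(per_domain) / len(per_domain)
mean_fresh = sum(per_object) / len(per_object)
relative_gap = abs(mean_delta - mean_fresh) / mean_fresh

print(f"differenced: {min(per_domain):.6f} to {max(per_domain):.6f} kWh per domain")
print(f"fresh trackers: {min(per_object):.6f} to {max(per_object):.6f} kWh per domain")
print(f"gap between the means: {relative_gap:.1%}")
```

The two means come out less than 1% apart, which suggests the single-tracker run measured about the same energy per domain and was simply reporting it as a running total.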

I upgraded my codecarbon version to 2.3.0 and tried the code you posted, but I get the following error:

     48 self._lock.acquire()
     49 self._stopped = True
---> 50 self._timer.cancel()
     51 self._lock.release()

AttributeError: 'NoneType' object has no attribute 'cancel'

This also happens when I just initialise a standard EmissionsTracker object as per the official docs:

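import datasets
from codecarbon import EmissionsTracker
from tqdm import tqdm

# args, drop_dataset_splits and run are defined elsewhere in the script
emissions = EmissionsTracker(project_name="bert_inference", measure_power_secs=10)
for domain in tqdm(args.domains, total=len(args.domains)):
    # 2.3.0 bug: without emissions.start() before start_task(), this fails with the AttributeError above
    emissions.start_task()
    args.domain = domain
    dsets = {domain: datasets.load_from_disk(args.data_dir / domain) for domain in args.domains}
    dsets = drop_dataset_splits(dsets, args)
    run(dsets, args)
    task_emissions = emissions.stop_task()  # keep `emissions` bound to the tracker
emissions.stop()
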
benoit-cty commented 11 months ago

Hello, thanks for finding this bug! A workaround is to call super().start() or emissions.start() before start_task(). I will fix this in the next release.

Here is the working example : https://github.com/mlco2/codecarbon/blob/master/examples/task_inference.py
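The workaround can be packaged so that start() always precedes the first start_task(), and stop_task() and stop() still run if one iteration raises. This is a minimal sketch, assuming only that the tracker exposes start(), start_task(), stop_task() and stop() as used in this thread; started and track_each are hypothetical helpers, not part of codecarbon, and whatever stop_task() returns is collected without being interpreted.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterable, Iterator, List


@contextmanager
def started(tracker: Any) -> Iterator[Any]:
    """Call tracker.start() on entry and tracker.stop() on exit, even on error."""
    tracker.start()  # must come before any start_task() call (the 2.3.0 workaround)
    try:
        yield tracker
    finally:
        tracker.stop()


def track_each(tracker: Any, items: Iterable[Any], work: Callable[[Any], None]) -> List[Any]:
    """Run work(item) inside its own task for each item; collect the stop_task() results."""
    results = []
    with started(tracker):
        for item in items:
            tracker.start_task()
            try:
                work(item)
            finally:
                # Always close the task so the next start_task() begins cleanly
                results.append(tracker.stop_task())
    return results
```

With codecarbon this could be called as track_each(EmissionsTracker(project_name="bert_inference", measure_power_secs=10), args.domains, process_domain), where process_domain is a hypothetical function wrapping the per-domain loading and run() calls from the earlier snippet. Passing the tracker in, rather than creating it inside, keeps the single expensive initialisation outside the loop.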

pugantsov commented 11 months ago

Great, this works. Thanks!