Closed pugantsov closed 11 months ago
Hello,
You could try:
emissions = EmissionsTracker()
emissions.start()
for epoch in range(num_epochs):
    # do something
    emissions.flush()
emissions.stop()
We will release a new version this week that implements task monitoring; you can read the doc here: https://github.com/mlco2/codecarbon/pull/355/files#diff-cd895c905cf221ed0a1139844e15165861387c12e2f02753776bbe833f1aae14R45
Do you think this will work for you?
As I understand it, flush just computes the current emissions at a particular time, is that correct?
If so, could I just subtract the previous emissions at each iteration by using flush?
EDIT: Yes, flush is more of a checkpoint: it does not reset the data, it only stores them to avoid losing them if there is a crash, and to give a view when training for a very long time. flush computes the emissions since the last call to flush, or since start if no flush has been made. So to get the entire emissions you have to sum up all the flush results.
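To illustrate the checkpoint semantics described above, here is a minimal self-contained sketch. `ToyTracker` is a hypothetical stand-in, not the real CodeCarbon API: each `flush()` reports only the delta since the previous flush, and summing the deltas recovers the grand total.

```python
class ToyTracker:
    """Toy model of flush-as-checkpoint: each flush() returns the
    emissions accumulated since the previous flush (or since start)."""

    def __init__(self):
        self._total = 0.0          # cumulative emissions since start
        self._last_flushed = 0.0   # cumulative value at the last flush

    def record(self, amount):
        # Stand-in for the background measurement loop accumulating emissions.
        self._total += amount

    def flush(self):
        delta = self._total - self._last_flushed
        self._last_flushed = self._total
        return delta


tracker = ToyTracker()
deltas = []
for amount in [1.0, 2.0, 3.0]:
    tracker.record(amount)
    deltas.append(tracker.flush())

# Each delta is a per-iteration value; their sum is the grand total.
print(deltas)       # [1.0, 2.0, 3.0]
print(sum(deltas))  # 6.0
```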
Got it, thanks, this is ideal!
Apologies for the comment spam but I had to clarify something in models I have already trained. When I first used CodeCarbon, I was looking for a way to report the total energy consumed at every epoch, which I achieved via the following approach:
class Emissions(EmissionsTracker):
    def __init__(self):
        super().__init__()

    def get_emissions_data(self):
        _ = self.flush()
        emissions_data = self._prepare_emissions_data()
        if not emissions_data.emissions:
            raise Exception("No emissions data found.")
        return {
            "cpu_power": emissions_data.cpu_power / 1000,
            "gpu_power": emissions_data.gpu_power / 1000,
            "ram_power": emissions_data.ram_power / 1000,
            "cpu_energy_consumed": self._total_cpu_energy.kWh,
            "gpu_energy_consumed": self._total_gpu_energy.kWh,
            "ram_energy_consumed": self._total_ram_energy.kWh,
            "total_energy_consumed": self._total_energy.kWh,
            "co2_emissions": emissions_data.emissions,
        }
My graph outputs show an exponential increase in energy consumed until the end of the runs. I assume that ._prepare_emissions_data() provides a total from the very start of training, and that the flush here doesn't affect the emissions tracking? (I'm concerned about the accuracy of the estimates here, as I've trained quite a few models.)
GPU energy consumed for runs for reference:
My bad; looking at the code, flush is more of a checkpoint: it does not reset the data, it only stores them to avoid losing them if there is a crash, and to give a view when training for a very long time.
I now remember that we made it for the training of Bloom.
If you update to 2.3.0 you could use tasks:
class Emissions(EmissionsTracker):
    def __init__(self):
        super().__init__()
        super().start_task()

    def get_emissions_data(self):
        emissions_data = self.stop_task()
        if not emissions_data.emissions:
            raise Exception("No emissions data found.")
        self.start_task()
        return {
            "cpu_power": emissions_data.cpu_power / 1000,
            "gpu_power": emissions_data.gpu_power / 1000,
            "ram_power": emissions_data.ram_power / 1000,
            "cpu_energy_consumed": self._total_cpu_energy.kWh,
            "gpu_energy_consumed": self._total_gpu_energy.kWh,
            "ram_energy_consumed": self._total_ram_energy.kWh,
            "total_energy_consumed": self._total_energy.kWh,
            "co2_emissions": emissions_data.emissions,
        }
Let me know if it works.
Or you could just add delta=True in your previous code: emissions_data = self._prepare_emissions_data(delta=True). But _prepare_emissions_data is not supposed to be called by users.
I've tried using delta, and here are my outputs:
Initiating Emissions outside of loop and using delta:
|   | domain | cpu_energy_consumed | gpu_energy_consumed | total_energy_consumed | co2_emissions |
|---|---|---|---|---|---|
| 0 | business_finance | 0.000211548 | 4.01752e-05 | 0.000495608 | 0.000133709 |
| 1 | computers_internet | 0.000412762 | 7.88342e-05 | 0.000969199 | 0.000127769 |
| 2 | education_reference | 0.000613303 | 0.00011636 | 0.00143555 | 0.000125814 |
| 3 | entertainment_music | 0.000829148 | 0.000148572 | 0.00187608 | 0.000118851 |
| 4 | family_relationships | 0.00104987 | 0.000190075 | 0.00239169 | 0.000139104 |
| 5 | health | 0.00124759 | 0.000227837 | 0.00285672 | 0.000125459 |
| 6 | politics_government | 0.00144783 | 0.000261735 | 0.0032953 | 0.000118322 |
| 7 | science_mathematics | 0.00165944 | 0.000301949 | 0.00379161 | 0.000133897 |
| 8 | society_culture | 0.00188305 | 0.000342077 | 0.00430129 | 0.000137505 |
| 9 | sports | 0.00211313 | 0.000380974 | 0.00480787 | 0.00013667 |
Initiating a new Emissions object at the beginning of each loop iteration:
|   | domain | cpu_energy_consumed | gpu_energy_consumed | total_energy_consumed | co2_emissions |
|---|---|---|---|---|---|
| 0 | business_finance | 0.000196129 | 3.81157e-05 | 0.000464391 | 0.000125287 |
| 1 | computers_internet | 0.000198403 | 3.80815e-05 | 0.000467412 | 0.000126102 |
| 2 | education_reference | 0.00021492 | 4.25208e-05 | 0.000513729 | 0.000138598 |
| 3 | entertainment_music | 0.000202835 | 3.92338e-05 | 0.000478709 | 0.000129149 |
| 4 | family_relationships | 0.000193552 | 3.74038e-05 | 0.000456428 | 0.000123138 |
| 5 | health | 0.000207246 | 4.04279e-05 | 0.000492394 | 0.000132841 |
| 6 | politics_government | 0.000218375 | 4.21931e-05 | 0.000516227 | 0.000139271 |
| 7 | science_mathematics | 0.000196969 | 3.78047e-05 | 0.000464599 | 0.000125343 |
| 8 | society_culture | 0.000194441 | 3.77904e-05 | 0.00046263 | 0.000124812 |
| 9 | sports | 0.000195935 | 3.75336e-05 | 0.000462923 | 0.000124891 |
As you can see, the latter is much more consistent and does not appear to be increasing exponentially (the delta approach seems to reset the emissions fairly well, but the energy-consumed columns keep rising).
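Since the energy columns in the first table are cumulative, per-iteration values can also be recovered after the fact by differencing consecutive rows. A minimal sketch, using the first three cpu_energy_consumed values from the first table above:

```python
# Cumulative cpu_energy_consumed values (kWh) from the first table.
cumulative = [0.000211548, 0.000412762, 0.000613303]

# Differencing consecutive rows recovers per-iteration deltas;
# the first row is already a delta (measured from zero).
deltas = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
print(deltas)
```

The resulting per-iteration values (roughly 0.0002 kWh each) line up with the second table, where a fresh tracker was created every iteration.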
I upgraded my codecarbon version to 2.3.0 and tried the code you posted, but I get the following error:
48 self._lock.acquire()
49 self._stopped = True
---> 50 self._timer.cancel()
51 self._lock.release()
AttributeError: 'NoneType' object has no attribute 'cancel'
This also happens when I just initialise a standard EmissionsTracker object as per the official docs:
emissions = EmissionsTracker(project_name="bert_inference", measure_power_secs=10)
for domain in tqdm(args.domains, total=len(args.domains)):
    emissions.start_task()
    args.domain = domain
    dsets = {domain: datasets.load_from_disk(args.data_dir/domain) for domain in args.domains}
    dsets = drop_dataset_splits(dsets, args)
    run(dsets, args)
    emissions_data = emissions.stop_task()
emissions.stop()
Hello,
Thanks for finding this bug: a workaround is to call super().start() or emissions.start() before start_task().
I will fix this in the next release.
Here is the working example : https://github.com/mlco2/codecarbon/blob/master/examples/task_inference.py
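For intuition, the traceback above matches a missing-initialisation pattern: only start() creates the periodic measurement timer, so stopping a task before start() tries to cancel a timer that is still None. Here is a minimal self-contained sketch of that failure mode (a toy stand-in, not CodeCarbon's actual internals):

```python
import threading


class ToyTracker:
    """Toy model: the periodic measurement timer is only created by
    start(); stopping without it reproduces the AttributeError above."""

    def __init__(self):
        self._timer = None

    def start(self):
        # Stand-in for the periodic power-measurement timer.
        self._timer = threading.Timer(10.0, lambda: None)
        self._timer.start()

    def stop(self):
        self._timer.cancel()  # AttributeError if start() was skipped


broken = ToyTracker()
try:
    broken.stop()   # start() never called -> self._timer is None
except AttributeError as e:
    print(e)        # 'NoneType' object has no attribute 'cancel'

fixed = ToyTracker()
fixed.start()       # the workaround: call start() first
fixed.stop()        # now succeeds
```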
Great, this works. Thanks!
Description
I am trying to track the emissions of a lot of fairly small loops, and I'm noticing that initialising the EmissionsTracker at the beginning of every loop causes significant overhead. I might've missed it in the documentation, but is there a way to retain the same EmissionsTracker and effectively "reset" it, so it bypasses all of the initialisation/connection with pynvml etc.? I guess a workaround could be to just initialise a bunch of them before my loop, but I don't think this cuts down on any time.
Currently I'm just using it like