Allow computing emissions and dumping results periodically for other output modes

mlco2 / codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.

MIT License

Currently, in the CSV output mode and the other non-API output modes, emissions are only computed and results only persisted at the end of the run. I would rather have results dumped to disk periodically, the same way the API output mode already computes emissions and persists partial results during the run: https://github.com/mlco2/codecarbon/blob/63c6a55fbfa101297e96e601a9eb68bbaa11d2e9/codecarbon/emissions_tracker.py#L700-L713

This way:

If a run crashes unexpectedly, partial results are still saved.
I can visualize the outputs during training.
I can monitor emissions by reading the outputs and, for example, preempt runs when emissions increase.
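To make the monitoring use case concrete, here is a minimal sketch of a watcher that reads the partial CSV output and decides whether a run should be preempted. The column name `emissions`, the helper names `latest_emissions` and `should_preempt`, and the budget threshold are assumptions made for illustration; they are not part of codecarbon.

```python
import csv
import io
from typing import Optional


def latest_emissions(csv_text: str, column: str = "emissions") -> Optional[float]:
    """Return the value of `column` in the last data row, or None if there are no rows.

    The column name is an assumption about the layout of the CSV output.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return None
    return float(rows[-1][column])


def should_preempt(csv_text: str, budget_kg: float) -> bool:
    """Hypothetical policy: preempt once the latest reported emissions exceed a budget."""
    latest = latest_emissions(csv_text)
    return latest is not None and latest > budget_kg


# Stand-in for the contents of a periodically written emissions file.
sample = "run_id,emissions\nabc,0.010\nabc,0.025\n"
print(should_preempt(sample, budget_kg=0.02))
```

A watcher like this only works if the file is updated while the run is still in progress, which is the behavior requested here.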

I think the easiest way would be to make configurable which output modes are handled by the if statements in the code linked above; ideally, one could also configure a different rate for each output.

Hello,

Yes, it would be nice to write the same data to the CSV as to the API.

We have a flush() method that acts more like a checkpoint: it does not reset the data, it only stores them to avoid losing them if there is a crash and to give a view of progress when training for a very long time.

See https://github.com/mlco2/codecarbon/issues/438#issuecomment-1665573581
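As a rough sketch of how flush() could be used as a periodic checkpoint, the loop below calls it every few steps and always calls stop() at the end, even if the work raises. The helper `run_with_checkpoints` and the `RecordingTracker` stand-in are hypothetical and exist only so the example runs without codecarbon installed; the assumption is that a real tracker exposing start(), flush() and stop() could be passed in instead.

```python
from typing import Callable, List, Protocol


class Tracker(Protocol):
    """Minimal interface assumed by this sketch."""

    def start(self) -> None: ...
    def flush(self) -> None: ...
    def stop(self) -> None: ...


def run_with_checkpoints(
    tracker: Tracker,
    work: Callable[[int], None],
    n_steps: int,
    flush_every: int,
) -> None:
    """Run `work` for n_steps, checkpointing every `flush_every` steps."""
    tracker.start()
    try:
        for step in range(1, n_steps + 1):
            work(step)
            if step % flush_every == 0:
                tracker.flush()  # persist partial results without resetting them
    finally:
        tracker.stop()  # runs on normal exit and on exceptions raised by `work`


class RecordingTracker:
    """Hypothetical stand-in that only records which methods were called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def start(self) -> None:
        self.calls.append("start")

    def flush(self) -> None:
        self.calls.append("flush")

    def stop(self) -> None:
        self.calls.append("stop")


demo = RecordingTracker()
run_with_checkpoints(demo, lambda step: None, n_steps=5, flush_every=2)
print(demo.calls)
```

Note that the finally block does not help when the process is killed outright; in that case only the data written by the last flush() survives, which is why the checkpoint interval matters.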

There is also an on_csv_write parameter:

update: overwrite the existing run_id row (erasing former data)
append: add a new row to the CSV file (default)
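The difference between the two modes can be illustrated with a small stand-alone sketch. This is not codecarbon's implementation: `write_row` and `read_rows` are hypothetical helpers, and the two-column layout is a simplification chosen for the example.

```python
import csv
import os
import tempfile


def read_rows(path):
    """Read all data rows of a CSV file as a list of dicts (values are strings)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def write_row(path, row, mode="append"):
    """Hypothetical helper mimicking the two modes described above.

    append: keep every existing row and add the new one.
    update: drop existing rows with the same run_id, then add the new one.
    """
    if mode not in ("append", "update"):
        raise ValueError(f"unknown mode: {mode!r}")
    rows = read_rows(path) if os.path.exists(path) else []
    if mode == "update":
        rows = [r for r in rows if r["run_id"] != row["run_id"]]
    rows.append(row)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        writer.writeheader()
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "emissions.csv")
    write_row(path, {"run_id": "a", "emissions": 0.1}, mode="append")
    write_row(path, {"run_id": "a", "emissions": 0.2}, mode="append")
    appended = read_rows(path)  # one row per checkpoint
    write_row(path, {"run_id": "a", "emissions": 0.3}, mode="update")
    updated = read_rows(path)  # a single row holding the latest values
```

With periodic checkpoints, an append-style file keeps the history of a run, while an update-style file stays at one row per run_id and is easier to read from a monitoring script.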

So it seems we already have all the parts needed for what you ask. Feel free to propose a PR.

Allow computing emissions and dumping results periodically for other output modes #448