lfwa / carbontracker

Track and predict the energy consumption and carbon footprint of training deep learning models.
MIT License

list index out of range for power_avg #29

Closed: agarwalmanvi closed this issue 3 years ago

agarwalmanvi commented 3 years ago

Thanks for the nice tool! I'm trying to get this to work with GeNN but I get the following error:

Traceback (most recent call last):
  File "/home/manvi/Documents/carbontracker/carbontracker/tracker.py", line 281, in epoch_end
    self.tracker.epoch_end()
  File "/home/manvi/Documents/carbontracker/carbontracker/tracker.py", line 139, in epoch_end
    self._log_epoch_measurements()
  File "/home/manvi/Documents/carbontracker/carbontracker/tracker.py", line 162, in _log_epoch_measurements
    power_avg = np.mean(comp.power_usages[-1], axis=0)
IndexError: list index out of range

To make sure the error is not caused by GeNN, I tried to use a tracker with some plain Python code, and it still gave me the same error. I tried to identify where the error is coming from within the source code of carbontracker, but wasn't successful. Any ideas on how to fix this? I'm using Ubuntu with Intel i3.
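
For context, the last frame of the traceback indexes comp.power_usages[-1], which raises IndexError whenever that list is still empty, before np.mean is ever called. Below is a minimal sketch of that failure mode and one possible guard. It is not carbontracker's actual code: latest_power_avg is a hypothetical helper, a plain list of per-epoch sample lists stands in for comp.power_usages, and statistics.mean replaces np.mean to keep the sketch dependency-free.

```python
from statistics import mean


def latest_power_avg(power_usages):
    """Average the most recent epoch's power samples.

    Hypothetical helper: returns None instead of raising when no
    samples have been recorded yet.
    """
    # Guard both an empty outer list and an empty latest epoch.
    if not power_usages or not power_usages[-1]:
        return None
    return mean(power_usages[-1])


# Unguarded indexing on an empty list is what the traceback reports.
try:
    [][-1]
except IndexError as err:
    print("IndexError:", err)

print(latest_power_avg([]))              # no samples recorded yet
print(latest_power_avg([[40.0, 60.0]]))  # one epoch with two samples
```

Returning None is only one option; raising a descriptive error that mentions the epoch duration would serve equally well.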

lfwa commented 3 years ago

Hi,

We appreciate the feedback. :)

Could you provide the plain Python code that gave you the error as well as some more details of the environment that you are trying to run it in (e.g. Python version, Ubuntu version, and CPU generation)?

agarwalmanvi commented 3 years ago

Thanks for the prompt response! Here's the plain Python code:

from carbontracker.tracker import CarbonTracker

TRIALS = 100
tracker = CarbonTracker(epochs=TRIALS)

for trial in range(TRIALS):

    tracker.epoch_start()
    print("Trial: " + str(trial))
    sum = 1200 + 100
    tracker.epoch_end()

tracker.stop()

I'm running it in a conda environment with Python 3.7.7. My Ubuntu OS is version 18.04 and my CPU is from family 6 model 42.

lfwa commented 3 years ago

Your epoch durations may be too short for carbontracker to be able to collect any measurements. To verify this, could you try replacing the addition sum = 1200 + 100 with time.sleep(5) and decreasing the update_interval argument to 1:

from carbontracker.tracker import CarbonTracker
import time

TRIALS = 100
tracker = CarbonTracker(epochs=TRIALS, update_interval=1)

for trial in range(TRIALS):

    tracker.epoch_start()
    print("Trial: " + str(trial))
    time.sleep(5)
    tracker.epoch_end()

tracker.stop()

If this is the case, I will look into adding a more appropriate error message.

agarwalmanvi commented 3 years ago

That worked but now I get a different error (also to do with out of range index):

Traceback (most recent call last):
  File "/home/manvi/Documents/carbontracker/carbontracker/tracker.py", line 284, in epoch_end
    self._output_actual()
  File "/home/manvi/Documents/carbontracker/carbontracker/tracker.py", line 358, in _output_actual
    _co2eq = self._co2eq(energy)
  File "/home/manvi/Documents/carbontracker/carbontracker/tracker.py", line 381, in _co2eq
    ci = self.intensity_updater.average_carbon_intensity(pred_time_dur)
  File "/home/manvi/Documents/carbontracker/carbontracker/tracker.py", line 55, in average_carbon_intensity
    ci.carbon_intensity = self.carbon_intensities[-1].carbon_intensity
IndexError: list index out of range

Another question: the computations I'm doing with GeNN are very short. For example, I printed the duration from this line for the first epoch and it came to 0:00:00.89. This was probably why I was getting that error with my GeNN script. Would it help in this case to reduce update_interval to a much smaller value (0.1, for example)?

lfwa commented 3 years ago

Hi again,

I was unfortunately not able to replicate your error. I will create a separate issue for it and take a look at it in the very near future. If you can test it with a reduced update_interval on your GeNN script that would be perfect. :)

Regarding your second question: Yes, you should be able to reduce update_interval to 0 or a very small value (the argument specifies the time slept between measurements). If carbontracker cannot gather a measurement for every epoch, it should extrapolate the latest measurement to the previous unmeasured epochs. This, however, assumes that your entire training session is long enough for at least one measurement to be gathered, i.e., if your training session is very short (a few seconds) and no measurement can be gathered in that period, then it may fail.
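
To illustrate the extrapolation idea described above, here is a rough sketch under stated assumptions. It is not carbontracker's implementation: backfill_unmeasured_epochs is a hypothetical name, and each epoch is modelled as a plain list of power samples that is empty when nothing was measured.

```python
def backfill_unmeasured_epochs(epoch_samples):
    """Copy each measurement back onto the unmeasured epochs before it.

    Hypothetical helper. `epoch_samples` is a list with one list of
    power samples per epoch; an empty inner list means "not measured".
    """
    if not any(epoch_samples):
        # Mirrors the caveat above: with no measurements at all,
        # there is nothing to extrapolate from.
        raise ValueError("no measurements were gathered during the session")

    filled = [list(samples) for samples in epoch_samples]
    latest = None
    # Walk backwards so every empty epoch picks up the nearest later measurement.
    for i in range(len(filled) - 1, -1, -1):
        if filled[i]:
            latest = filled[i]
        elif latest is not None:
            filled[i] = list(latest)
    return filled


print(backfill_unmeasured_epochs([[], [], [50.0], [], [60.0]]))
```

Epochs that come after the final measurement stay empty in this sketch, since there is no later measurement to copy back.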

agarwalmanvi commented 3 years ago

Thanks, it would be great to have a working example to understand how carbontracker can be used!

I tried my GeNN script again with update_interval set to 0 and 0.0001. Both gave the first error (the one from _log_epoch_measurements). The entire training session for the GeNN script I'm referring to (without any extra stuff like plotting or printing information) should take tens of minutes on my hardware since I'm doing thousands of trials.

lfwa commented 3 years ago

Okay, that is unfortunate. I will look into the issues with very short epoch durations, but it may take a while before we can release a new version with a fix.

If you are not interested in the predictions but just want to monitor your carbon footprint, you can treat your entire script as a single epoch, similar to this:

from carbontracker.tracker import CarbonTracker

tracker = CarbonTracker(epochs=1, epochs_before_pred=0)
tracker.epoch_start()

# Your script.

tracker.epoch_end()
tracker.stop()

We will soon be updating the README to be more comprehensive, together with a potential release of version 2.0 that makes carbontracker easier to use and set up.

agarwalmanvi commented 3 years ago

Indeed, my first goal was simply to track the carbon footprint. Thanks for your suggestion on treating the whole script as one epoch. With that change, the script got past the _log_epoch_measurements error but hit the same carbon_intensity error described above. I will track issue #30 for a potential resolution.