lfwa / carbontracker

Track and predict the energy consumption and carbon footprint of training deep learning models.
MIT License

Out of index issue and 'os' has no attribute 'EX_SOFTWARE' #33

Closed Princec711 closed 4 years ago

Princec711 commented 4 years ago

Congratulations on the great work.

Issue 1: I ran the test code below in a Jupyter notebook and got the error shown further down.

from carbontracker.tracker import CarbonTracker
import time

TRIALS = 100
tracker = CarbonTracker(epochs=TRIALS, update_interval=1)

for trial in range(TRIALS):

    tracker.epoch_start()
    print("Trial: " + str(trial))
    time.sleep(5)
    tracker.epoch_end()

tracker.stop()

Issue 2: I hit the same error while running a deep learning image classification model.

I hope to hear from you soon. Thanks in advance.

Error:

Trial: 0
CarbonTracker: The following components were found: GPU with device(s) GeForce GTX 1050 Ti.
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\carbontracker\tracker.py", line 274, in epoch_end
    self.tracker.epoch_end()
  File "C:\ProgramData\Anaconda3\lib\site-packages\carbontracker\tracker.py", line 139, in epoch_end
    self._log_epoch_measurements()
  File "C:\ProgramData\Anaconda3\lib\site-packages\carbontracker\tracker.py", line 163, in _log_epoch_measurements
    power_avg = np.mean(comp.power_usages[-2], axis=0)
IndexError: list index out of range

IndexError                                Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\carbontracker\tracker.py in epoch_end(self)
    273         try:
--> 274             self.tracker.epoch_end()
    275

C:\ProgramData\Anaconda3\lib\site-packages\carbontracker\tracker.py in epoch_end(self)
    138         self.epoch_times.append(time.time() - self.cur_epoch_time)
--> 139         self._log_epoch_measurements()
    140

C:\ProgramData\Anaconda3\lib\site-packages\carbontracker\tracker.py in _log_epoch_measurements(self)
    162             if np.isnan(power_avg).all():
--> 163                 power_avg = np.mean(comp.power_usages[-2], axis=0)
    164             self.logger.info(

IndexError: list index out of range

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
     10     print("Trial: " + str(trial))
     11     time.sleep(5)
---> 12     tracker.epoch_end()
     13
     14 tracker.stop()

C:\ProgramData\Anaconda3\lib\site-packages\carbontracker\tracker.py in epoch_end(self)
    285                 self._delete()
    286         except Exception as e:
--> 287             self._handle_error(e)
    288
    289     def stop(self):

C:\ProgramData\Anaconda3\lib\site-packages\carbontracker\tracker.py in _handle_error(self, error)
    326             self._delete()
    327         else:
--> 328             sys.exit(os.EX_SOFTWARE)
    329
    330     def _output_energy(self, description, time, energy, co2eq, conversions):

AttributeError: module 'os' has no attribute 'EX_SOFTWARE'

Princec711 commented 4 years ago

I ran the code above in a Jupyter notebook on Windows. The error "module 'os' has no attribute 'EX_SOFTWARE'" can be worked around by replacing os.EX_SOFTWARE with either 0 or 70.
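
For context, os.EX_SOFTWARE is only defined on Unix-like platforms, which is why the lookup fails on Windows. A minimal sketch of a portable fallback is shown here. The names EXIT_SOFTWARE and exit_with_software_error are hypothetical and not part of carbontracker. It falls back to 70, the value os.EX_SOFTWARE has on Unix, rather than 0, because an exit code of 0 would signal success.

```python
import os
import sys

# os.EX_SOFTWARE exists only on Unix-like platforms. On Windows the
# attribute is missing, so fall back to 70, its conventional Unix value.
EXIT_SOFTWARE = getattr(os, "EX_SOFTWARE", 70)


def exit_with_software_error():
    """Hypothetical helper: exit with a portable 'internal software error' code."""
    sys.exit(EXIT_SOFTWARE)
```

In a Jupyter notebook, sys.exit raises SystemExit inside the kernel, so the notebook reports the exit code instead of closing.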

Can you please help me with the index-out-of-range error? Thanks in advance.

lfwa commented 4 years ago

Hi Princec,

Sorry for the delayed response. This error may be related to #29. I will leave this issue up for now and do some further testing to see if this is the case. Hopefully, a fix should be out by this week. I will make sure to update the issues.

Princec711 commented 4 years ago

Thanks for the quick reply.

When I run the code above in Google Colab, it works fine. When I run it in a Jupyter notebook on my local machine (Windows with a GPU), I get the error.

Please let me know if any action or change is required on my side.

I have one more question, about using it with an image classification model:

for epoch in range(10):
    tracker.epoch_start()

    model.fit(train_images, train_labels, epochs=epoch)

    tracker.epoch_end()

Please let me know whether this format is correct.

I hope to hear from you soon.

kanding commented 4 years ago

Hello Princec!

Yes - that is exactly the correct format :)

Princec711 commented 4 years ago

Hello Kanding,

Thank you so much for your help.

I have one question about the case below. If I want to see the output after epoch 4, I get output like this:

tracker = CarbonTracker(epochs=10, epochs_before_pred=4, monitor_epochs=4, update_interval=1, stop_and_confirm=True)

1875/1875 [==============================] - 3s 2ms/step - loss: 1.7441 - accuracy: 0.7232
Epoch 1/2
1875/1875 [==============================] - 3s 2ms/step - loss: 1.7167 - accuracy: 0.7515
Epoch 2/2
1875/1875 [==============================] - 3s 2ms/step - loss: 1.6734 - accuracy: 0.7894
Epoch 1/3
1875/1875 [==============================] - 3s 2ms/step - loss: 1.6934 - accuracy: 0.7760
Epoch 2/3
1875/1875 [==============================] - 3s 2ms/step - loss: 1.6214 - accuracy: 0.8423
Epoch 3/3
1875/1875 [==============================] - 3s 2ms/step - loss: 1.6070 - accuracy: 0.8558

CarbonTracker:
Actual consumption for 4 epoch(s):
    Time:   0:00:21
    Energy: 0.000254 kWh
    CO2eq:  0.074713 g
    This is equivalent to:
    0.000621 km travelled by car
CarbonTracker:
Predicted consumption for 10 epoch(s):
    Time:   0:00:52
    Energy: 0.000635 kWh
    CO2eq:  0.186782 g
    This is equivalent to:
    0.001551 km travelled by car

But I think the output should look like this:

Epoch 1/3
1875/1875 [==============================] - 3s 2ms/step - loss: 1.6934 - accuracy: 0.7760
Epoch 2/3
1875/1875 [==============================] - 3s 2ms/step - loss: 1.6214 - accuracy: 0.8423
Epoch 3/3
1875/1875 [==============================] - 3s 2ms/step - loss: 1.6070 - accuracy: 0.8558

followed by the other outputs.

Please correct me if my understanding is wrong. Thank you again for your help.

lfwa commented 4 years ago

Hi again,

What framework are you using for your image classification model?

This may be a problem with your for-loop and the way model.fit() is meant to be used. In your case, model.fit(epochs=10) may only need to be executed once to train your model for 10 epochs. The way you have set it up now, it looks like the model is trained again on every iteration for an increasing number of epochs: the first iteration trains for a single epoch, the next for 2 epochs (while "erasing" the previous training), then for 3 epochs, and so on. If this is the case, then our prediction feature may not be compatible with the framework that you are using. However, if you simply want to track your carbon footprint, you could do something like:

from carbontracker.tracker import CarbonTracker

tracker = CarbonTracker(epochs=1, epochs_before_pred=0)

tracker.epoch_start()
model.fit(train_images, train_labels, epochs=3)
tracker.epoch_end()

The output should be something like:

Epoch 1/3
1875/1875 [==============================] - 3s 2ms/step - loss: 1.6934 - accuracy: 0.7760
Epoch 2/3
1875/1875 [==============================] - 3s 2ms/step - loss: 1.6214 - accuracy: 0.8423
Epoch 3/3
1875/1875 [==============================] - 3s 2ms/step - loss: 1.6070 - accuracy: 0.8558

CarbonTracker: 
Actual consumption for 1 epoch(s):
        Time:   0:00:10
        Energy: 0.000041 kWh
        CO2eq:  0.003357 g
        This is equivalent to:
        0.000028 km travelled by car

This treats your entire training session as a single "epoch" or iteration, while actually measuring the entire consumption (in this case the 3 epochs).

Hope this helps.

Princec711 commented 4 years ago

Hello,

I am using the TensorFlow framework.

It worked for me with the code below:

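import tensorflow as tf
from tensorflow import keras
from carbontracker.tracker import CarbonTracker

tracker = CarbonTracker(epochs=20, epochs_before_pred=1, monitor_epochs=1, stop_and_confirm=True)
tracker.epoch_start()

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(10),
    keras.layers.Softmax(),
])

# The model ends in a Softmax layer, so the loss receives probabilities, not logits.
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])

# train_images and train_labels are assumed to be loaded earlier in the notebook.
model.fit(train_images, train_labels, epochs=3)

tracker.epoch_end()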

But when I change epochs_before_pred and monitor_epochs to other values (for example, 2), no output is generated.

The case below works. I need output similar to this:

from carbontracker.tracker import CarbonTracker
import time

TRIALS = 17
tracker = CarbonTracker(epochs=TRIALS, epochs_before_pred=13, monitor_epochs=13,
                        update_interval=1, stop_and_confirm=True)

for trial in range(TRIALS):

    tracker.epoch_start()
    print("Trial: " + str(trial))
    time.sleep(5)
    tracker.epoch_end()

tracker.stop()

Output:
Trial: 0
Trial: 1
Trial: 2
Trial: 3
Trial: 4
Trial: 5
Trial: 6
Trial: 7
Trial: 8
Trial: 9
Trial: 10
Trial: 11
Trial: 12
CarbonTracker: 
Actual consumption for 13 epoch(s):
    Time:   0:01:05
    Energy: 0.000298 kWh
    CO2eq:  0.087687 g
    This is equivalent to:
    0.000728 km travelled by car
CarbonTracker: 
Predicted consumption for 17 epoch(s):
    Time:   0:01:25
    Energy: 0.000390 kWh
    CO2eq:  0.114667 g
    This is equivalent to:
    0.000952 km travelled by car
CarbonTracker: Continue training (y/n)?
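
The predicted figures in this output are consistent with simple linear scaling of the measured values, from 13 measured epochs to 17 in total. The sketch below reproduces that arithmetic. It is an assumption that carbontracker computes its prediction exactly this way, and the function name extrapolate is hypothetical.

```python
def extrapolate(measured, measured_epochs, total_epochs):
    """Scale a value measured over `measured_epochs` linearly to `total_epochs`."""
    if measured_epochs <= 0:
        raise ValueError("measured_epochs must be positive")
    return measured * total_epochs / measured_epochs


# Figures taken from the "Actual consumption for 13 epoch(s)" output.
predicted_seconds = extrapolate(65, 13, 17)            # 0:01:05 is 65 s
predicted_energy_kwh = extrapolate(0.000298, 13, 17)   # displayed value, already rounded

print(f"Time:   {predicted_seconds:.0f} s")
print(f"Energy: {predicted_energy_kwh:.6f} kWh")
```

This prints 85 s (0:01:25) and 0.000390 kWh, matching the "Predicted consumption for 17 epoch(s)" block. Because the displayed inputs are already rounded, the last digit of other quantities may differ slightly.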

Please let me know if you need any clarification from my side.

I hope to hear from you soon.

lfwa commented 4 years ago

Hi,

Since you are using model.fit(epochs=EPOCHS) instead of writing your own training loop, you cannot use CarbonTracker with more than epochs=1, as per my previous comment.

You can instead write your own training loop from scratch (see tensorflow.org/guide/keras/writing_a_training_loop_from_scratch) and then use the basic setup (see README.md) as you tried previously.
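
As a rough illustration of the loop structure the basic setup expects, here is a minimal sketch in which each iteration performs exactly one epoch of work between epoch_start() and epoch_end(). RecordingTracker and train_with_tracking are hypothetical names: the first is a stand-in that only records calls so the example runs without carbontracker or a GPU, and a real CarbonTracker instance would be passed in its place. The comment about calling model.fit with epochs=1 inside the loop assumes that repeated fit calls continue from the current weights, which is how Keras models normally behave.

```python
class RecordingTracker:
    """Hypothetical stand-in exposing the three tracker methods used in this thread."""

    def __init__(self):
        self.events = []

    def epoch_start(self):
        self.events.append("start")

    def epoch_end(self):
        self.events.append("end")

    def stop(self):
        self.events.append("stop")


def train_with_tracking(tracker, epochs, train_one_epoch):
    """Run `train_one_epoch` once per iteration, bracketed by the tracker calls."""
    try:
        for epoch in range(epochs):
            tracker.epoch_start()
            # With Keras this step could be, for example:
            #     model.fit(train_images, train_labels, epochs=1)
            # so that one loop iteration corresponds to one training epoch.
            train_one_epoch(epoch)
            tracker.epoch_end()
    finally:
        # Always stop the tracker, even if training raises.
        tracker.stop()


tracker = RecordingTracker()
trained_epochs = []
train_with_tracking(tracker, epochs=3, train_one_epoch=trained_epochs.append)
print(tracker.events)
```

The key point is that the epochs value given to the tracker matches the number of loop iterations, which is what the prediction feature relies on.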

Princec711 commented 4 years ago

Hello,

Thank you so much for your help. I will try that.

Could you add support for model.fit(epochs=EPOCHS) as an enhancement? That is what we use most of the time.

Please also look into the index-out-of-range error in a Jupyter notebook (Windows machine with a GPU): the same code runs in Google Colab but fails on my local machine.

Thanks and regards, Prince Chaturvedi

lfwa commented 4 years ago

It may not be possible to add compatibility with non-loop structures like model.fit() for the prediction feature. You can create a separate issue labelled as a feature request/enhancement if you would like this in future versions of the tool.

Thanks for your feedback. We appreciate it. :)

Princec711 commented 4 years ago

Sure, I will create a separate issue labelled as a feature request/enhancement.

Thanks

Princec711 commented 4 years ago

I have tried the new version of the code and I am still facing the same issue:

Trial: 0
CarbonTracker: The following components were found: GPU with device(s) GeForce GTX 1050 Ti.
CarbonTracker: CRITICAL - Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\carbontracker\tracker.py", line 302, in epoch_end
    self.tracker.epoch_end()
  File "C:\ProgramData\Anaconda3\lib\site-packages\carbontracker\tracker.py", line 160, in epoch_end
    self._log_epoch_measurements()
  File "C:\ProgramData\Anaconda3\lib\site-packages\carbontracker\tracker.py", line 185, in _log_epoch_measurements
    power_avg = np.mean(comp.power_usages[-2], axis=0)
IndexError: list index out of range

CarbonTracker: Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\carbontracker\tracker.py", line 302, in epoch_end
    self.tracker.epoch_end()
  File "C:\ProgramData\Anaconda3\lib\site-packages\carbontracker\tracker.py", line 160, in epoch_end
    self._log_epoch_measurements()
  File "C:\ProgramData\Anaconda3\lib\site-packages\carbontracker\tracker.py", line 185, in _log_epoch_measurements
    power_avg = np.mean(comp.power_usages[-2], axis=0)
IndexError: list index out of range

An exception has occurred, use %tb to see the full traceback.

SystemExit: 70

Please help me with this. Thanks.
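
To illustrate why the traceback ends in an IndexError: indexing a list with [-2] fails whenever the list holds fewer than two entries, which is the case during the first epoch. The sketch below shows one defensive way to pick the most recent epoch that has readings. It is not the code from carbontracker or from any of its commits, the function name latest_average_power is hypothetical, and it assumes power_usages is a list with one list of readings (in watts) per epoch.

```python
from statistics import mean


def latest_average_power(power_usages):
    """Average the newest epoch that has readings, or return None if none has any."""
    for readings in reversed(power_usages):
        if readings:
            return mean(readings)
    return None


# First epoch with no readings yet: [-2] would raise IndexError here.
print(latest_average_power([[]]))
# Second epoch is empty, so the first epoch's readings are used instead.
print(latest_average_power([[30.0, 32.0], []]))
```

Returning None instead of raising lets the caller decide how to report an epoch without measurements.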

lfwa commented 4 years ago

Sorry about that. It looks like my last fix didn't quite do the trick. The latest commit 02a4a73 should fix this issue.

You can test it by installing the package through the git repository:

  1. Remove current version of carbontracker: pip uninstall carbontracker
  2. Install from git (this command may require git to be installed): pip install git+https://github.com/lfwa/carbontracker.git#egg=carbontracker

Princec711 commented 4 years ago

Hello again,

It runs now, but it reports zero for everything.

I tried the test code you mentioned:

from carbontracker.tracker import CarbonTracker
import time

TRIALS = 100
tracker = CarbonTracker(epochs=TRIALS, update_interval=1)

for trial in range(TRIALS):

    tracker.epoch_start()
    print("Trial: " + str(trial))
    time.sleep(5)
    tracker.epoch_end()

tracker.stop()

Output of the code:

Trial: 0
CarbonTracker: The following components were found: GPU with device(s) GeForce GTX 1050 Ti.
CarbonTracker: 
Actual consumption for 1 epoch(s):
    Time:   0:00:05
    Energy: 0.000000 kWh
    CO2eq:  0.000000 g
    This is equivalent to:
    0.000000 km travelled by car
CarbonTracker: 
Predicted consumption for 100 epoch(s):
    Time:   0:08:20
    Energy: 0.000000 kWh
    CO2eq:  0.000000 g
    This is equivalent to:
    0.000000 km travelled by car
CarbonTracker: Finished monitoring.
Trial: 1
Trial: 2
... and so on

Please let me know if you need more information from my side.

lfwa commented 4 years ago

Could you try again and supply the argument log_dir="./logs/" to the CarbonTracker class and show me the output of the ..._carbontracker.log file?

You can replace the "./logs/" with where you want the log files to be stored and 5 trials should be sufficient.
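
If it helps to collect the log contents for pasting into the issue, a small sketch like the following prints every .log file found in the log directory. The helper names read_logs and print_logs are hypothetical, and the only assumption about carbontracker is that it writes files ending in .log into the directory passed as log_dir.

```python
from pathlib import Path


def read_logs(log_dir):
    """Return {file name: text} for every .log file directly inside log_dir."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in sorted(Path(log_dir).glob("*.log"))
    }


def print_logs(log_dir="./logs/"):
    """Print each log file under a header line with its file name."""
    for name, text in read_logs(log_dir).items():
        print(f"===== {name} =====")
        print(text)
```

Calling print_logs("./logs/") after the run prints all log files in one go, which avoids missing one of them.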

We really appreciate the feedback!

lfwa commented 4 years ago

Hi again Princec,

There should be two log files. Could you also send the other one?

Princec711 commented 4 years ago

Hello, sorry about that.

The updated output is:

2020-08-24 19:26:04 - carbontracker version 1.1.3
2020-08-24 19:26:04 - Only predicted and actual consumptions are multiplied by a PUE coefficient of 1.67 (Rhonda Ascierto, 2019, Uptime Institute Global Data Center Survey).
2020-08-24 19:26:04 - The following components were found: GPU with device(s) GeForce GTX 1050 Ti.
2020-08-24 19:26:04 - Monitoring thread started.
2020-08-24 19:26:09 - Epoch 1:
2020-08-24 19:26:09 - Duration: 0:00:05.00
2020-08-24 19:26:09 - Average power usage (W) for gpu: None
2020-08-24 19:26:11 - Monitoring thread ended.

lfwa commented 4 years ago

I am not sure what is causing this behavior. What operating system and Python version are you using?

Could you also try running these code snippets and show me the output and log files?

import pynvml

pynvml.nvmlInit()

device_indices = range(pynvml.nvmlDeviceGetCount())
handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in device_indices]

for handle in handles:
    name = pynvml.nvmlDeviceGetName(handle)
    device = name.decode("utf-8")
    power_usage = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000
    print(f"{device} uses {power_usage} W")

pynvml.nvmlShutdown()

and

from carbontracker.tracker import CarbonTracker
import time

max_epochs = 3

tracker = CarbonTracker(epochs=3,
                        epochs_before_pred=2,
                        monitor_epochs=-1,
                        verbose=5,
                        log_dir="./logs/")

for epoch in range(max_epochs):
    print(f"Epoch {epoch} started")
    tracker.epoch_start()

    time.sleep(2)

    tracker.epoch_end()
    print(f"Epoch {epoch} ended")

tracker.stop()

Princec711 commented 4 years ago

I am using Python 3.7.4 on Windows 10.

import pynvml

pynvml.nvmlInit()

device_indices = range(pynvml.nvmlDeviceGetCount())
handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in device_indices]

for handle in handles:
    name = pynvml.nvmlDeviceGetName(handle)
    device = name.decode("utf-8")
    power_usage = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000
    print(f"{device} uses {power_usage} W")

pynvml.nvmlShutdown()

---------------------------------------------------------------------------
NVMLError_NotSupported                    Traceback (most recent call last)
<ipython-input-2-7e19c443106e> in <module>
      9     name = pynvml.nvmlDeviceGetName(handle)
     10     device = name.decode("utf-8")
---> 11     power_usage = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000
     12     print(f"{device} uses {power_usage} W")
     13 

C:\ProgramData\Anaconda3\lib\site-packages\pynvml\nvml.py in nvmlDeviceGetPowerUsage(handle)
   1243     fn = get_func_pointer("nvmlDeviceGetPowerUsage")
   1244     ret = fn(handle, byref(c_mWatts))
-> 1245     check_return(ret)
   1246     return c_mWatts.value
   1247 

C:\ProgramData\Anaconda3\lib\site-packages\pynvml\nvml.py in check_return(ret)
    364 def check_return(ret):
    365     if (ret != NVML_SUCCESS):
--> 366         raise NVMLError(ret)
    367     return ret
    368 

NVMLError_NotSupported: Not Supported

lfwa commented 4 years ago

It looks like NVML (we rely on pynvml) does not support the retrieval of power measurements on your specific GPU model. Unfortunately, this means that carbontracker cannot be used on your system. I will create another issue to add a descriptive error message when this is the case.
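
A check along the following lines could be run up front to see which GPUs can report power draw. This is only a sketch and not carbontracker's own code. The function names supports_power_readings and report_power_support are hypothetical, and the NVML module is passed in as a parameter (in practice, pynvml) so the logic can be exercised without a GPU. It relies only on pynvml calls already used in this thread plus the pynvml.NVMLError base exception.

```python
def supports_power_readings(nvml, handle):
    """Return True if NVML can report power draw for this device handle."""
    try:
        nvml.nvmlDeviceGetPowerUsage(handle)
    except nvml.NVMLError:
        # Any NVML error, including "Not Supported", is treated as unavailable.
        return False
    return True


def report_power_support(nvml):
    """Map each GPU index to whether power readings are available."""
    nvml.nvmlInit()
    try:
        return {
            index: supports_power_readings(nvml, nvml.nvmlDeviceGetHandleByIndex(index))
            for index in range(nvml.nvmlDeviceGetCount())
        }
    finally:
        # Release NVML even if a query fails unexpectedly.
        nvml.nvmlShutdown()
```

With pynvml installed, the call would be report_power_support(pynvml) after import pynvml. A result of False for a device indicates the situation described above.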

I apologize that we did not see this sooner.

Princec711 commented 4 years ago

Thanks for the information. Is there any way I can use the tool on my system? Please let me know if it is possible.

lfwa commented 4 years ago

At the moment there are no workarounds to retrieve power measurements from your GPU, so unfortunately it is not possible. This is also unlikely to change unless NVIDIA introduces support for your specific GPU model in NVML.