lfwa / carbontracker

Track and predict the energy consumption and carbon footprint of training deep learning models.

MIT License

Filling np.array very slow because of Carbontracker #41

Closed nfurnon closed 1 year ago

nfurnon commented 3 years ago

Filling a pre-allocated array is slowed down by a factor of ~70 when using carbontracker. See the minimal code below. Am I doing anything wrong? How can we avoid this?

import time
import numpy as np
from carbontracker.tracker import CarbonTracker

def load_data(length, data_shape):
    data = np.zeros((length, *data_shape))
    for i in range(length):
        data[i] = np.random.random(data_shape)
    return data

if __name__ == '__main__':
    l = 10000
    shape = (16000, )
    tt = time.time()
    data = load_data(l, shape)
    print(f'Without CT : {time.time() - tt} seconds')

    tracker = CarbonTracker(epochs=1, monitor_epochs=1, log_dir='./')
    tt = time.time()
    data = load_data(l, shape)
    print(f'With CT : {time.time() - tt} seconds')

lfwa commented 3 years ago

Hi nfurnon,

Thanks for your feedback, we appreciate it!

This looks like a bug that happens when CarbonTracker is instantiated and tracker.epoch_start() is not called directly afterwards. I have not yet found the problem within the code base. However, a workaround is to start the tracker immediately after instantiating it, e.g.:

import time
import numpy as np
from carbontracker.tracker import CarbonTracker

def load_data(length, data_shape):
    data = np.zeros((length, *data_shape))
    for i in range(length):
        data[i] = np.random.random(data_shape)
    return data

if __name__ == '__main__':
    l = 10000
    shape = (16000, )
    tt = time.time()
    data = load_data(l, shape)
    print(f'Without CT : {time.time() - tt} seconds')

    tracker = CarbonTracker(epochs=1, monitor_epochs=1, log_dir='./')
    tracker.epoch_start()
    tt = time.time()
    data = load_data(l, shape)
    tracker.epoch_end()
    print(f'With CT : {time.time() - tt} seconds')

Let me know if this helps.

nfurnon commented 3 years ago

Thank you for your answer. The time-consuming task was instantiating a PyTorch Dataset class, so I guess I can just instantiate CarbonTracker after it. However, instantiating it right before calling tracker.epoch_start() does not seem possible, since the for i_epoch in ... line will come in between. Unless I track the whole training process as one single epoch...

lfwa commented 3 years ago

It should be fine to have a for-loop after instantiation, as shown in the example in the README.md. The slowdown most likely occurs when compute-intensive operations run after the tracker is instantiated but before it is started.
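The ordering discussed above (heavy setup first, then the tracker, then the epoch loop) could be sketched as follows. This is only an illustration: RecordingTracker and run_training are hypothetical names, and the stand-in merely mimics the epoch_start(), epoch_end() and stop() methods so that the sketch runs without carbontracker installed. With the real library, make_tracker could be something like lambda: CarbonTracker(epochs=2).

```python
class RecordingTracker:
    """Hypothetical stand-in that mimics the tracker's epoch methods."""

    def __init__(self, events):
        self.events = events
        self.events.append("tracker created")

    def epoch_start(self):
        self.events.append("epoch_start")

    def epoch_end(self):
        self.events.append("epoch_end")

    def stop(self):
        self.events.append("stop")


def run_training(build_dataset, make_tracker, train_one_epoch, epochs):
    # Do the expensive setup first, while no tracker exists yet.
    dataset = build_dataset()
    # Create the tracker only once the epoch loop is about to begin.
    tracker = make_tracker()
    try:
        for epoch in range(epochs):
            tracker.epoch_start()
            train_one_epoch(dataset, epoch)
            tracker.epoch_end()
    finally:
        # Stop the tracker even if training raises.
        tracker.stop()
    return dataset


if __name__ == "__main__":
    events = []

    def build_dataset():
        events.append("dataset built")
        return list(range(4))

    run_training(
        build_dataset,
        lambda: RecordingTracker(events),
        lambda data, epoch: events.append(f"train {epoch}"),
        epochs=2,
    )
    print(events)
```

The for-loop line between instantiation and epoch_start() is cheap, so the idle period of the tracker stays negligible with this ordering.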

nfurnon commented 3 years ago

OK, thank you!

lfwa commented 3 years ago

Reopening this issue as a reminder that instantiating CarbonTracker and not starting the tracker using tracker.epoch_start() will slow down other code.

The issue will be closed once it is fixed.

rhosch97 commented 2 years ago

Adding a time.sleep() with a small but sufficient duration in CarbonTrackerThread() (see below) solved the problem for me. The best value is still to be determined; I use a fairly long 1 ms, but I think it could be shorter. It probably avoids clogging up the CPU with billions of accesses to the self.measuring attribute :) I'm not sure whether this solution is scalable and fault-proof for the whole tool, but it could be a hint!

def run(self):
    """Thread's activity."""
    try:
        self.begin()
        while self.running:
            if not self.measuring:
                time.sleep(0.001)
                continue
            self._collect_measurements()
            time.sleep(self.update_interval)

        # Shutdown in thread's activity instead of epoch_end() to ensure
        # that we only shutdown after last measurement.
        self._components_shutdown()
    except Exception as e:
        self._handle_error(e)

edit: replaced screen capture with code
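To illustrate why the short sleep helps, here is a minimal standard-library sketch that compares an idle thread that busy-waits on a flag with one that sleeps 1 ms per idle iteration. PollingWorker, cpu_work and time_main_thread are hypothetical names and none of this is carbontracker code. It assumes CPython, where a spinning thread competes with the main thread for the interpreter lock, so the measured timings will vary by machine.

```python
import threading
import time


class PollingWorker(threading.Thread):
    """Hypothetical stand-in for an idle monitoring thread."""

    def __init__(self, idle_sleep):
        super().__init__(daemon=True)
        self.idle_sleep = idle_sleep  # seconds per idle iteration; None = busy-wait
        self.running = True
        self.measuring = False  # never set here, so the thread stays idle
        self.idle_iterations = 0

    def run(self):
        while self.running:
            if not self.measuring:
                self.idle_iterations += 1
                if self.idle_sleep is not None:
                    # Sleeping lets the main thread run undisturbed.
                    time.sleep(self.idle_sleep)
                continue
            time.sleep(0.01)  # placeholder for taking a measurement


def cpu_work(n=2_000_000):
    """Pure-Python loop standing in for the main thread's workload."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_main_thread(idle_sleep):
    """Time cpu_work() while an idle PollingWorker runs in the background."""
    worker = PollingWorker(idle_sleep)
    worker.start()
    start = time.perf_counter()
    cpu_work()
    elapsed = time.perf_counter() - start
    worker.running = False
    worker.join()
    return elapsed, worker.idle_iterations


if __name__ == "__main__":
    busy_time, busy_iters = time_main_thread(None)
    sleepy_time, sleepy_iters = time_main_thread(0.001)
    print(f"busy-wait  : {busy_time:.3f} s, {busy_iters} idle iterations")
    print(f"1 ms sleep : {sleepy_time:.3f} s, {sleepy_iters} idle iterations")
```

The idle-iteration counts show how often the flag is polled in each mode; the sleeping variant polls at most about a thousand times per second.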

PedramBakh commented 1 year ago

In response to feedback about performance slowdowns due to busy-waiting in the CarbonTrackerThread(), we've implemented changes in Release v1.2.0. We've transitioned to an event-based approach, enhancing performance. Thank you for drawing our attention to this matter.
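For readers unfamiliar with the idea, an event-based wait can be sketched with threading.Event from the standard library. This is an assumption-laden illustration and not the actual v1.2.0 implementation: EventDrivenWorker and its attributes are hypothetical names, and the sample counter stands in for real measurements.

```python
import threading
import time


class EventDrivenWorker(threading.Thread):
    """Hypothetical sketch of a monitoring thread that blocks while idle."""

    def __init__(self, update_interval=0.01):
        super().__init__(daemon=True)
        self.update_interval = update_interval
        self.measuring = threading.Event()  # set between epoch_start and epoch_end
        self.stopping = threading.Event()   # set once to shut the thread down
        self.samples = 0

    def run(self):
        while not self.stopping.is_set():
            # Block here without polling until measuring is requested.
            self.measuring.wait()
            if self.stopping.is_set():
                break
            self.samples += 1  # placeholder for collecting a measurement
            # Wait for the next interval, but wake up early on shutdown.
            self.stopping.wait(self.update_interval)

    def epoch_start(self):
        self.measuring.set()

    def epoch_end(self):
        self.measuring.clear()

    def stop(self):
        self.stopping.set()
        self.measuring.set()  # unblock run() if it is waiting while idle
        self.join()


if __name__ == "__main__":
    worker = EventDrivenWorker(update_interval=0.005)
    worker.start()
    time.sleep(0.05)  # idle phase: the thread is blocked, not spinning
    idle_samples = worker.samples
    worker.epoch_start()
    time.sleep(0.05)  # measuring phase
    worker.epoch_end()
    worker.stop()
    print(f"samples while idle: {idle_samples}, total samples: {worker.samples}")
```

Compared with polling a flag, the blocked thread is only woken when the event is set, so no idle sleep duration has to be tuned.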