microsoft / ELL

Embedded Learning Library
https://microsoft.github.io/ELL

image classifier - memory leak #137

Closed CalderaSrv closed 6 years ago

CalderaSrv commented 6 years ago

I have been getting out-of-memory errors when running a long-running Python script based on the "Repurposing a pretrained image classifier" tutorial. Using the gc garbage-collector debugging helpers, I have narrowed it down to the 'predictions = model.predict(input_data)' call.

My system: RPi3b set up per the tutorial instructions, except I have OpenCV 3.4.0 compiled from source. I have retrained the model on 5 classes. Edit: the ELL version is a Git clone from 3/27/18.

Code excerpt from the classifier loop:

```python
input_data = self.prepare_image_for_model(
    image, self.input_shape.columns, self.input_shape.rows)
predictions = model.predict(input_data)
# predictions = [.5, .1, .1, .1, .2]

gc.collect()
msg = 'len(gc.get_objects()) = {} '.format(len(gc.get_objects()))
msg = msg + 'gc.get_count() = {} \n'.format(gc.get_count())
logging.debug(msg)
```

When I swap which 'predictions = ...' line is commented out, I see a steady increase in garbage objects when calling model.predict. Examples below:

Example logfile output with 'predictions = model.predict(input_data)':

```
2018-03-30 13:24:48,812 : len(gc.get_objects()) = 41428 gc.get_count() = (4, 0, 0)
2018-03-30 13:24:49,382 : len(gc.get_objects()) = 41482 gc.get_count() = (4, 0, 0)
2018-03-30 13:24:49,949 : len(gc.get_objects()) = 41535 gc.get_count() = (4, 0, 0)
2018-03-30 13:24:50,515 : len(gc.get_objects()) = 41588 gc.get_count() = (4, 0, 0)
.....
```

Example logfile output with 'predictions = [.5, .1, .1, .1, .2]':

```
2018-03-30 13:23:40,937 : len(gc.get_objects()) = 40738 gc.get_count() = (4, 0, 0)
2018-03-30 13:23:41,131 : len(gc.get_objects()) = 40738 gc.get_count() = (4, 0, 0)
2018-03-30 13:23:41,313 : len(gc.get_objects()) = 40738 gc.get_count() = (5, 0, 0)
2018-03-30 13:23:41,492 : len(gc.get_objects()) = 40738 gc.get_count() = (5, 0, 0)
.....
```
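For reference, here is a minimal, self-contained sketch of the same gc-based check, independent of the tutorial code. The `check_for_leak` helper and the stand-in lambdas are illustrative only, not part of the ELL API; substitute the wrapped model's `predict` and the prepared image input from the tutorial.

```python
import gc
import logging
import random

logging.basicConfig(level=logging.DEBUG)


def check_for_leak(predict, make_input, iterations=100):
    """Repeatedly call predict() and log the number of gc-tracked objects.

    A count that grows steadily across iterations suggests objects created
    by each call are never being released.
    """
    gc.collect()
    baseline = len(gc.get_objects())
    for i in range(iterations):
        predict(make_input())
        gc.collect()
        logging.debug('iteration %d: len(gc.get_objects()) = %d (baseline %d)',
                      i, len(gc.get_objects()), baseline)


if __name__ == '__main__':
    # Stand-ins for demonstration; replace with the wrapped ELL model's
    # predict and the output of prepare_image_for_model from the tutorial.
    dummy_predict = lambda data: [0.5, 0.1, 0.1, 0.1, 0.2]
    dummy_input = lambda: [random.random()] * 10
    check_for_leak(dummy_predict, dummy_input, iterations=5)
```

Calling gc.collect() before each count rules out objects that are merely awaiting cyclic collection, so a monotonic increase points at objects that are never freed.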

I'm happy to help fix this if you can point me in the right direction.

Thanks!

braca51e commented 6 years ago

I have a similar problem when I run the example script for a long period of time (more than 1 hour).

kernhanda commented 6 years ago

Thanks for bringing this to our attention! We'll investigate further and update this thread with our findings.

lovettchris commented 6 years ago

A fix is on the way. The leak is related to our callback system; if you build a model without source/sink nodes, it will not leak.

kernhanda commented 6 years ago

This is fixed in release 2.3.2.

CalderaSrv commented 6 years ago

Awesome. Thanks!