symisc / sod

An Embedded Computer Vision & Machine Learning Library (CPU Optimized & IoT Capable)
https://sod.pixlab.io

forward_convolutional_layer may have a bug? #24

Closed zjp0317 closed 4 years ago

zjp0317 commented 4 years ago

In the forward_convolutional_layer function (sod.c, line 4886), is it supposed to be `b * l.n` rather than `b * n`?

symisc commented 4 years ago

It's `b` multiplied by `n`.

zjp0317 commented 4 years ago

Thanks for your reply. Wouldn't `b` multiplied by `n` cause an overflow on `l.output`? It looks like `b` multiplied by `l.n` makes more sense.

symisc commented 4 years ago

Did you roll your own tests to make those assumptions?

zjp0317 commented 4 years ago

The index calculated for l.output on line 4886 does not match the layout of l.output: in theory, the index can be larger than the array size allocated on line 5067.

However, l.batch is reset to 1 on line 8885, so `b` is always 0 during execution and no overflow is ever triggered. This is related to another issue: the face_cnn model allocates memory for 16 batches, but only uses the memory of a single batch during prediction.

symisc commented 4 years ago

@zjp0317 Your assumptions are correct. We are testing the configuration you mentioned and yes, in certain, although rare, cases (never faced in a real deployment), the index can grow much bigger than the allocated array. Things would be different if we used modern C++ instead of plain old C. We will issue a fix in the next version of the library. Thank you. Do you have anything to add?

zjp0317 commented 4 years ago

I'm still confused about the memory allocation mentioned above.

Given that l.batch is reset to 1 (line 8885), I'm not sure why sod allocates memory with l.batch > 1. E.g., with the face_cnn model, l.batch on line 5044 is 16, which inflates the subsequent allocations (lines 5067, 5068, etc.).

In other words, it looks like sod allocates a lot of memory that it never actually uses during prediction, which adds unnecessary virtual memory overhead.

These allocations use calloc to request relatively large blocks (several MBytes or more), so they are typically served by mmap. As a result, the redundantly allocated memory does not consume physical memory. E.g., in my test of face_cnn, sod consumes around 800MB of virtual memory but only around 70~80MB of physical memory. When I manually changed the batch to 1 before the allocations, the virtual memory usage dropped to ~80MB without compromising prediction.

This may not be a big deal, though. As long as the allocation size is larger than a few MBytes, the redundant virtual memory from calloc typically won't consume physical memory. But if an allocation falls below that threshold, sod would pay an unnecessary physical memory overhead.

symisc commented 4 years ago

You are assuming that the CNN layers will deal only with single-class objects. This is true for the face detector, which comprises a single class only (i.e. face), but certainly not the case for the VOC model, which can handle up to 30 classes, or the COCO model, which can detect up to 60 classes of different objects. See the previous comment: if SOD were written in modern C++, memory management would be far easier than in plain C.