Closed zjp0317 closed 4 years ago
It's bx multiplied by n
Thanks for your reply. Would `bx*n` cause an overflow on l.output? It looks like `bx*l.n` makes more sense.
Did you roll your own tests to make those assumptions?
The index calculated for l.output on line 4886 does not match the layout of l.output: the index can theoretically exceed the allocated array size (line 5067).
However, l.batch is set to 1 on line 8885, so "bx" will be 0 during execution and no overflow is triggered. This is related to another issue: face_cnn allocates memory for 16 batches but only uses 1 batch's worth of memory during prediction.
@zjp0317 Your assumptions are correct. We are testing the configuration you mentioned and yes, in certain rare cases (never encountered in real deployments), the index can grow much larger than the allocated array. Things would be different if we used modern C++ instead of plain old C. We will issue a fix in the next version of the library. Thank you. Do you have anything to add?
I'm confused about the memory allocation as mentioned above
Given that l.batch is reset to 1 (line 8885), I'm not sure why sod allocates memory with l.batch > 1. E.g., with the face_cnn model, l.batch on line 5044 is 16, which affects the subsequent allocations (lines 5067, 5068, etc.).
In other words, it looks like sod allocates a lot of memory that it won't actually use during prediction, which adds unnecessary virtual memory overhead.
These allocations use calloc to allocate relatively large blocks (several MBytes or more), so they are typically serviced by mmap. As a result, the redundantly allocated memory does not consume physical memory. E.g., in my test of face_cnn, sod consumed around 800MB of virtual memory but only around 70~80MB of physical memory. I manually changed the batch to 1 before the allocations, and the virtual memory usage dropped to ~80MB without affecting prediction.
This may not be a big deal, though. Unless an allocation is smaller than a few MBytes (and is therefore served from the heap instead of mmap), the redundant virtual memory from calloc typically won't consume physical memory. But when that does happen, sod incurs unnecessary physical memory overhead.
You are assuming that the CNN layer deals only with single-class objects. This is true for the face detector, which comprises a single class (i.e. face), but certainly not for the VOC model, which can handle up to 30 classes, or the COCO model, which can detect up to 60 classes of different objects. As noted in the previous comment, if SOD were written in modern C++, memory management would be far easier than in plain C.
In the forward_convolutional_layer function (sod.c, line 4886), is it supposed to be `bx*l.n` rather than `bx*n`?