victoriamazo opened this issue 8 years ago
You can think of it as approximating a Gaussian distribution for adjusting the prior box, or as scaling the localization gradient. Variance is also used in the original MultiBox and in Fast(er) R-CNN.
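To make the "scaling the gradient" reading concrete, here is a minimal sketch of the SSD-style center/size encoding in which each offset is divided by its variance. The struct, function name, and hard-coded boxes are my own illustrations, not the Caffe API; the formulas mirror the usual CENTER_SIZE parameterization.

```cpp
#include <cmath>

// Illustrative sketch (not the Caffe API): encode a ground-truth box
// against a prior box. Dividing each raw offset by its variance
// enlarges the regression target, and with an L1-like loss that
// proportionally enlarges the localization gradient.
struct Box { float xmin, ymin, xmax, ymax; };

void EncodeOffsets(const Box& prior, const Box& gt,
                   const float variance[4], float out[4]) {
  float pw = prior.xmax - prior.xmin, ph = prior.ymax - prior.ymin;
  float pcx = prior.xmin + 0.5f * pw, pcy = prior.ymin + 0.5f * ph;
  float gw = gt.xmax - gt.xmin, gh = gt.ymax - gt.ymin;
  float gcx = gt.xmin + 0.5f * gw, gcy = gt.ymin + 0.5f * gh;
  out[0] = (gcx - pcx) / pw / variance[0];   // center x offset
  out[1] = (gcy - pcy) / ph / variance[1];   // center y offset
  out[2] = std::log(gw / pw) / variance[2];  // log width ratio
  out[3] = std::log(gh / ph) / variance[3];  // log height ratio
}
```

With the default variances (0.1 for centers, 0.2 for sizes), a ground truth shifted by 10% of the prior width produces a target of 1.0 instead of 0.1.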
I have a couple of follow up questions.
} else {
  int count = 0;
  for (int h = 0; h < layer_height; ++h) {
    for (int w = 0; w < layer_width; ++w) {
      for (int i = 0; i < num_priors_; ++i) {
        for (int j = 0; j < 4; ++j) {
          top_data[count] = variance_[j];
          ++count;
        }
      }
    }
  }
}
Sorry, but I still don't understand. After you calculate the prior box coordinates and add the offset to top_data with
top_data += top[0]->offset(0, 1);
all the top_data values are replaced with the constant variance values, not multiplied by them!
top_data[count] = variance_[j];
That is, the values of the top layer (top[0]->mutable_cpu_data()) are left constant and equal to the variance, and all the previous calculation of the prior box coordinates is lost!
@victoriamazo , top_data is a pointer,
Dtype* top_data = top[0]->mutable_cpu_data();
so top_data += top[0]->offset(0, 1);
advances the pointer; you are not replacing the bbox values that were already calculated.
Check the offset() function in blob.hpp.
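A simplified, self-contained illustration of that pointer pattern (my own sketch, with a plain vector standing in for the blob and made-up sizes): advancing the pointer by the channel offset moves the write position into the second channel, so the variance loop never touches the coordinates written earlier.

```cpp
#include <vector>

// Returns a two-channel buffer: the first `dim` entries hold (fake)
// prior coordinates, the next `dim` entries hold the variance. In the
// real layer, dim corresponds to num_priors_ * 4.
std::vector<float> FillTwoChannels(int dim, float variance) {
  std::vector<float> top(2 * dim);
  float* top_data = top.data();
  for (int i = 0; i < dim; ++i) top_data[i] = 0.5f * i;  // pretend coords
  // Like top_data += top[0]->offset(0, 1): move the pointer to the
  // second channel. The first channel's values are NOT overwritten.
  top_data += dim;
  for (int i = 0; i < dim; ++i) top_data[i] = variance;
  return top;
}
```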
The variances are used in the functions EncodeBBox and DecodeBBox. What remains unclear is why we divide by the variance when calculating the ground-truth regression target. @weiliu89 can you kindly explain more clearly? Also, I cannot find the 0.1 for *_center and 0.2 for width/height in the Fast R-CNN paper, nor in the fast-rcnn code. Do you know what 0.1 and 0.2 mean here?
Thank you @siddharthm83 and @weiliu89!
Thanks for the explanation, but I am still a bit confused. The comment on the Reshape function in prior_box_layer.cpp says that
"//2 channels. First channel stores the mean of each prior coordinate. // Second channel stores the variance of each prior coordinate."
So are xmin, xmax, ymin, ymax the 'mean' values meant here, and will the next channel store the variances, with a length of 'dim', as the code later shows?
Thank you so much!
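A small sketch of how I read that Reshape comment (the struct and accessor names are my own, not Caffe's): the top blob has shape [1, 2, dim] with dim = num_priors * 4, channel 0 holding the prior coordinates (the "means") and channel 1 holding the variances.

```cpp
#include <vector>

// Illustrative layout of the priorbox top blob described in the
// Reshape comment: two channels of length dim = num_priors * 4.
struct PriorBoxTop {
  int dim;                  // num_priors * 4
  std::vector<float> data;  // size 2 * dim
  float* coords() { return data.data(); }          // channel 0: "means"
  float* variances() { return data.data() + dim; } // channel 1: variances
};
```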
Has anyone figured out why the division by variance?
When I test the assign-boxes function, I get negative coordinates for the assigned bounding boxes; this is caused by the encoding performed after the IoU step inside the encode_box function.
Variances are coefficients for encoding/decoding the locations of bounding boxes. The first value is used to encode/decode the coordinates of the centers; the second is used to encode/decode the sizes of the bounding boxes.
Why is the encoding/decoding needed? Is it for using fewer parameters while training (2 instead of 4)? For faster optimization? Fewer calculations?
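For what it's worth, here is a decoding sketch that mirrors the encoding in spirit (my own illustration, not the actual DecodeBBox code): the predicted offsets are multiplied back by the variances and applied to the prior. Note the encoding does not reduce the parameter count; there are still 4 values per box. It normalizes the targets so each coordinate is roughly unit-scale, which tends to make the regression easier to optimize.

```cpp
#include <cmath>

// Illustrative decode (names are my own): invert the center/size
// encoding by multiplying predictions by the variances before
// applying them to the prior box.
struct Box { float xmin, ymin, xmax, ymax; };

Box DecodeOffsets(const Box& prior, const float pred[4],
                  const float variance[4]) {
  float pw = prior.xmax - prior.xmin, ph = prior.ymax - prior.ymin;
  float pcx = prior.xmin + 0.5f * pw, pcy = prior.ymin + 0.5f * ph;
  float cx = pcx + pred[0] * variance[0] * pw;   // undo center-x encode
  float cy = pcy + pred[1] * variance[1] * ph;   // undo center-y encode
  float w = pw * std::exp(pred[2] * variance[2]); // undo width encode
  float h = ph * std::exp(pred[3] * variance[3]); // undo height encode
  return {cx - 0.5f * w, cy - 0.5f * h, cx + 0.5f * w, cy + 0.5f * h};
}
```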
What is the meaning of the variance in the priorbox layer? Why are all the calculated coordinates in top_data simply overwritten with the constant variance value? top_data[count] = variance_[j];