variance in priorbox layer

weiliu89 / caffe

Caffe: a fast open framework for deep learning.

http://caffe.berkeleyvision.org/

Other

4.77k stars 1.67k forks source link

variance in priorbox layer #155

Open victoriamazo opened 8 years ago

victoriamazo commented 8 years ago

What is the meaning of variance in priorbox layer? Why all calculated coordinates in top_data are simply changed to const variance value? 154 topdata[count] = variance[j];

weiliu89 commented 8 years ago

You can think of it as approximating a gaussian distribution for adjusting the prior box. Or you can think of it as scaling the localization gradient. Variance is also used in original MultiBox and Fast(er) R-CNN.

siddharthm83 commented 8 years ago

I have a couple of follow up questions.

The first question is specific to the implementation of variance. It appears from the code that xmin, ymin, xmax, ymax are replaced with the variance. This is after topdata is calculated (so, its calculated and then replaced?) `if (variance.size() == 1) { caffeset(dim, Dtype(variance[0]), top_data); } else { int count = 0; for (int h = 0; h < layer_height; ++h) { for (int w = 0; w < layer_width; ++w) { for (int i = 0; i < numpriors; ++i) { for (int j = 0; j < 4; ++j) { topdata[count] = variance[j]; ++count; } } } }`
The second question is on choice of settings for variance. Why is the variance for xmin, ymin = 0.1 and xmax, ymax = 0.2?

weiliu89 commented 8 years ago

Check out this.
I use the [x_center, y_center, width, height] code type (same as RPN in Faster R-CNN). For convenience, they are represented as [xmin, ymin, xmax, ymax]. 0.1 for *_center and 0.2 for width/height are taken from Faster R-CNN. Check out this for more details.

victoriamazo commented 8 years ago

Sorry, but I still don't understand. After you calculate of prior box coordinates and add the offsets to top_data top_data += top[0]->offset(0, 1); all top_data values are replaced with the constant variance and not multiplied by it! top_data[count] = variance_[j]; That is the values of the top layer (top[0]->mutable_cpu_data()) are left constant equal to variance and all the previous calculation of prior box coordinates is lost!

siddharthm83 commented 8 years ago

@victoriamazo , top_data is a pointer, Dtype* top_data = top[0]->mutable_cpu_data(); so top_data += top[0]->offset(0, 1); changes the pointer, you are not replacing the bbox values that were calculated. check the offset function in blob.hpp

UKeyboard commented 8 years ago

The variances are used in function EncodeBBox and DecodeBBox. What remains unclear is why we should divide variance when calculating ground truth regression target. @weiliu89 can you kindly explain more clearly? And, I cannot find the 0.1 for *_center and 0.2 for width/height setting in the Fast-RCNN paper, nor in the fast-rcnn code. Do you know what 0.1 or 0.2 means here?

victoriamazo commented 8 years ago

Thank you @siddharthm83 and @weiliu89!

kuonangzhe commented 7 years ago

Thanks for the explanation, while I am still a bit confused about this. It said in the comment of function Rashape in 'prior_box_layer.cpp' that

"//2 channels. First channel stores the mean of each prior coordinate. // Second channel stores the variance of each prior coordinate."

So are the xmin, xmax, ymin, ymax the 'mean' values meaning here, and the next channel will store the variances of a length of 'dim' as the code later shows?

Thank you so much!

sfraczek commented 7 years ago

Has anyone figured out why the division by variance?

stavBodik commented 6 years ago

When I am testing the assign boxes function I get negative coordinates for the assigned bounding boxes which is caused due to the encoding after the IOU step inside encode_box function.

variances are coefficients for encoding/decoding the locations of bounding boxes. The first value is used to encode/decode coordinates of the centers. The second value is used to encode/decode the sizes of bounding boxes.

Why the encoding/decoding is needed ? is it for using less parameters while training (2 instead of 4 )? for faster optimization ? less calculations ?