weiliu89 / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Questions on Priorboxes during prediction #408

Open leonardoaraujosantos opened 7 years ago

leonardoaraujosantos commented 7 years ago

Introduction

Hi @weiliu89, considering this picture from the paper and thinking about the prediction phase:

[image: priorboxespred]

Some of my insights (could be wrong):

  1. Each cell of the feature map will hold a classification score for each class (including the background class), and another feature map will hold the box adjustment offsets (cx, cy, w, h).
  2. The non-maximum suppression (Detection Output layer) will filter out boxes with a low classification score, as well as boxes that overlap heavily with a higher-scoring box of the same class.
  3. Each cell has 4 boxes with different aspect ratios. The cell does not "care" about the boxes; it is only an "anchor" for the position of the prior box.
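
The filtering described in point 2 can be sketched roughly as follows. This is a minimal pure-Python illustration of per-class non-maximum suppression, not the code of the Detection Output layer; the function names and the thresholds `conf_thresh=0.5` and `iou_thresh=0.45` are assumptions chosen for the example.

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_one_class(boxes, scores, conf_thresh=0.5, iou_thresh=0.45):
    """Return the indices of the boxes kept for a single class."""
    # Step 1: drop boxes whose score for this class is too low.
    order = [i for i in range(len(boxes)) if scores[i] >= conf_thresh]
    # Step 2: visit the survivors from highest to lowest score.
    order.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


# Illustrative data (assumed): two near-duplicates, one separate box, one weak box.
boxes = [
    (0.10, 0.10, 0.50, 0.50),
    (0.12, 0.12, 0.52, 0.52),
    (0.60, 0.60, 0.90, 0.90),
    (0.00, 0.00, 0.20, 0.20),
]
scores = [0.90, 0.80, 0.70, 0.10]
kept = nms_one_class(boxes, scores)
print(kept)
```

Here the second box is suppressed because it overlaps the first, higher-scoring box, and the fourth is dropped for its low score. In a multi-class setting the same routine would be run once per non-background class.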

[image: detectionphase]

Questions

Zooming in on a small activation map: [image: cellzoom]

Again, thanks for the help.

weiliu89 commented 7 years ago
  1. Yes, although there will be classification scores and bbox offset adjustments for each default box (i.e. 4 in this case).

  2. Yes.

  3. I don't quite understand the question. The net's predictions will be related to each default box.

Each box will have slightly different classification scores and offset predictions. See answer 3 above.

leonardoaraujosantos commented 7 years ago

Thanks @weiliu89, and sorry if the last question was not that clear.

So, considering the activation at conv_9_2 in the diagram, the size will be 5x5x256: [image: blockdiagrampaper]

Spatially, the prior boxes will be resized to fit this 5x5 window, correct? So each cell will have 4 boxes, but the boxes do not completely cover all the cells: [image: cellzoom]

In this case, which cells should I consider for the red box?
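
To make the 5x5 layout concrete, here is a rough sketch of how prior boxes could be laid out on such a grid. It is only an illustration: the function `prior_boxes`, the `scale` of 0.5 and the four aspect ratios are assumptions for the example, not the values used by the PriorBox layer.

```python
import math


def prior_boxes(grid=5, scale=0.5, aspect_ratios=(1.0, 2.0, 0.5, 3.0)):
    """Return {(row, col): [(cx, cy, w, h), ...]} in normalized [0, 1] image coordinates."""
    priors = {}
    for row in range(grid):
        for col in range(grid):
            # Every box of a cell shares the center of that cell.
            cx = (col + 0.5) / grid
            cy = (row + 0.5) / grid
            # Only width and height change with the aspect ratio.
            priors[(row, col)] = [
                (cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar))
                for ar in aspect_ratios
            ]
    return priors


priors = prior_boxes()
center_cell = priors[(2, 2)]
total_boxes = sum(len(cell_boxes) for cell_boxes in priors.values())
print(total_boxes)      # 5 * 5 cells, 4 boxes each
print(center_cell[1])   # the aspect-ratio-2 box of the middle cell
```

In this sketch each box belongs to exactly one cell (the one whose center it shares), even though its width can be much larger than the 0.2 width of a single cell.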

marksunpeng commented 7 years ago

I think it's the opposite: the red box (the response for the red box) comes with cell 5, as the prior boxes are 'embedded' in each cell.

@weiliu89, am I right or wrong?

marksunpeng commented 7 years ago

I think Fig. 1 in the original paper is both helpful and a bit confusing (if you use it to think about the forward pass).

leonardoaraujosantos commented 7 years ago

@weiliu89, please check whether I understand...

During forward propagation (prediction):

  1. Each cell will have 4 boxes, and each one of those boxes will have a score vector over all classes plus the 4 coordinates (cx, cy, width, height).
  2. The classification score of each box is learned when the MultiBox loss calculates the Jaccard distance with the ground truth (during training).
  3. During prediction, we will choose the box from each cell that has the highest classification score, then adjust this box accordingly.
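
The "adjust this box accordingly" step can be sketched as below, assuming the center-size offset encoding described in the SSD paper (center shifts relative to the prior's size, log-scaled width and height). The function name `decode` and the `variances` argument are hypothetical; some implementations rescale the offsets by constants, which is modelled here as an optional argument that defaults to no rescaling.

```python
import math


def decode(prior, offsets, variances=(1.0, 1.0, 1.0, 1.0)):
    """Apply predicted offsets to one prior box; both are (cx, cy, w, h)."""
    pcx, pcy, pw, ph = prior
    dcx, dcy, dw, dh = offsets
    # Center shifts are expressed relative to the prior's width and height.
    cx = pcx + dcx * variances[0] * pw
    cy = pcy + dcy * variances[1] * ph
    # Width and height are scaled in log space, so they always stay positive.
    w = pw * math.exp(dw * variances[2])
    h = ph * math.exp(dh * variances[3])
    return (cx, cy, w, h)


prior = (0.5, 0.5, 0.2, 0.2)
unchanged = decode(prior, (0.0, 0.0, 0.0, 0.0))   # zero offsets return the prior itself
shifted = decode(prior, (1.0, 0.0, 0.0, 0.0))     # center moves right by one prior width
print(unchanged)
print(shifted)
```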

So, in summary, there is no crazy calculation over the scores of all the cells that lie below a particular region (for example, the red box) in order to do the non-maximum suppression.

The non-maximum suppression will work on each box of each cell (at the end of the prediction, after merging all the boxes).

So each cell will have (c + 4)k filters, following the formula on page 4 of the paper, where:

  c: number of classes + background class
  4: (cx, cy, width, height)
  k: number of boxes

@marksunpeng, what do you think?
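
As a quick worked example of the (c + 4)k count, the sketch below assumes c = 21 (20 object classes plus background, an assumed value) and k = 4 boxes per cell on a 5x5 feature map. The helper names are hypothetical.

```python
def head_filters(num_classes_with_bg, boxes_per_cell):
    """Filters applied at each cell: (c + 4) * k."""
    return (num_classes_with_bg + 4) * boxes_per_cell


def head_outputs(rows, cols, num_classes_with_bg, boxes_per_cell):
    """Total predicted values for a rows x cols feature map: (c + 4) * k * rows * cols."""
    return head_filters(num_classes_with_bg, boxes_per_cell) * rows * cols


c, k = 21, 4  # assumed: 20 classes + background, 4 boxes per cell
per_cell = head_filters(c, k)
per_map = head_outputs(5, 5, c, k)
print(per_cell)  # (21 + 4) * 4 = 100
print(per_map)   # 100 * 5 * 5 = 2500
```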

leonardoaraujosantos commented 7 years ago

Sorry @weiliu89, could you confirm whether I understand the following points correctly?

During forward propagation (prediction):

  1. Each cell will have 4 boxes, and each one of those boxes will have a score vector over all classes plus the 4 coordinates (cx, cy, width, height).
  2. The classification score of each box is learned when the MultiBox loss calculates the Jaccard distance with the ground truth (during training).
  3. During prediction, we will choose the box from each cell that has the highest classification score, then adjust this box accordingly.

The non-maximum suppression will work on each box of each cell (at the end of the prediction, after merging all the boxes).

So each cell will have (c + 4)k filters, following the formula on page 4 of the paper, where:

  c: number of classes + background class
  4: (cx, cy, width, height)
  k: number of boxes

Thanks again.