Open Walid-Ahmed opened 7 years ago
You need to consider two different scales for the 1:1 aspect ratio. Then you get 5776 from 38x38, 2166 from 19x19, 600 from 10x10, 150 from 5x5, 36 from 3x3, and 4 from 1x1, for a total of 8732.
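For anyone who wants to check the arithmetic, here is a minimal sketch. The boxes-per-cell values are inferred from the totals quoted above (e.g. 5776 / (38 x 38) = 4), and the names `FEATURE_MAPS` and `count_default_boxes` are made up for illustration; they are not taken from the SSD code.

```python
# (grid size, default boxes per cell) for each prediction layer.
# Assumption: per-cell counts are inferred from the totals quoted above.
FEATURE_MAPS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def count_default_boxes(feature_maps):
    """Return the per-layer default box counts and their total."""
    per_layer = [size * size * boxes for size, boxes in feature_maps]
    return per_layer, sum(per_layer)


per_layer, total = count_default_boxes(FEATURE_MAPS)
print(per_layer)  # [5776, 2166, 600, 150, 36, 4]
print(total)      # 8732
```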
@ByeonghakYim Thanks a lot. When you mention 38x38, is this the grid size? If so, does this mean that the 8 by 8 and 4 by 4 grids mentioned in the paper are only examples, and that the real implementation uses 38x38, 19x19, 10x10, 5x5, 3x3, and 1x1 grids? I am sorry if I am missing how it really works!
Walid
38x38 is the grid size. The 8x8 and 4x4 grids in Figure 1 are for illustration purposes only.
@ByeonghakYim, @weiliu89 thank you for the clarification. Why do the 38x38, 3x3, and 1x1 feature maps only have 4 anchor boxes per feature map cell, when the paper implies that all layers should have 6?
@villanuevab we have a similar question at https://github.com/weiliu89/caffe/issues/316.
@wk910930 yes, the reasoning in that answer (given by @weiliu89):
conv4_3 is much larger than other layers, using 4 on conv4_3 is to avoid having too many default bboxes
makes sense for conv4_3, since it is the largest feature map used for prediction and would therefore have the most default bboxes. But what about the 3x3 and 1x1 feature maps? Perhaps at this scale it does not make sense to have too many default bboxes either, since the features would be of such high dimension that the extra 2 aspect ratios would make minimal difference, i.e., would not add much in terms of capturing additional features.
@weiliu89 is this intuition correct?
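To put rough numbers on the question, here is a small sketch that counts how many extra default boxes each of the three layers would get with 6 boxes per cell instead of 4. It only does the counting and says nothing about accuracy; the helper name `extra_boxes` is hypothetical.

```python
def extra_boxes(grid_size, small=4, large=6):
    """Extra default boxes on a square grid when using `large` instead of `small` boxes per cell."""
    cells = grid_size * grid_size
    return cells * (large - small)


# The three layers that use 4 boxes per cell: 38x38 (conv4_3), 3x3, and 1x1.
extra = {size: extra_boxes(size) for size in (38, 3, 1)}
print(extra)  # {38: 2888, 3: 18, 1: 2}
```

By this count, using 4 instead of 6 removes 2888 boxes on conv4_3 but only 20 in total on the 3x3 and 1x1 maps, so the "too many default bboxes" argument applies mainly to conv4_3.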
How does the number of objects per class in a single image come to 8732? I understand we have 4 aspect ratios in the 8 by 8 grid and 4 aspect ratios in the 4 by 4 grid.
So I calculated the number as 8x8x4 + 4x4x4 = 320.
Walid