Hello, and thanks for the great work!
In the AttentionRPN code, I noticed that the out_channel of the bbox_head classification branch is 2 rather than the total number of classes (21 in VOC). This means the bbox_head only needs to distinguish foreground from background for each ROI feature, instead of predicting the class of the ROI feature.
I would like to ask about the reason for this design. Are there experiments showing that the 2 out_channel design performs better than the 21 out_channel design?
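
To make the question concrete, here is a minimal sketch of the two designs I am comparing. It is plain Python rather than the repository's actual code, and `make_cls_head`, `cls_scores`, and the feature size of 16 are hypothetical names and values chosen only for illustration.

```python
import math
import random


def make_cls_head(in_features, out_channels, seed=0):
    """Hypothetical linear classification head: returns (weights, biases)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.01) for _ in range(in_features)]
               for _ in range(out_channels)]
    biases = [0.0] * out_channels
    return weights, biases


def cls_scores(head, roi_feature):
    """Apply the head to one ROI feature vector and return softmax probabilities."""
    weights, biases = head
    logits = [sum(w * x for w, x in zip(row, roi_feature)) + b
              for row, b in zip(weights, biases)]
    peak = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


IN_FEATURES = 16      # assumed ROI feature size, for illustration only
NUM_VOC_CLASSES = 21  # the class count mentioned in the question

rng = random.Random(1)
roi_feature = [rng.random() for _ in range(IN_FEATURES)]

# Design A: out_channel = 2, i.e. foreground vs. background only.
binary_head = make_cls_head(IN_FEATURES, 2)
# Design B: out_channel = 21, i.e. one score per class.
multi_head = make_cls_head(IN_FEATURES, NUM_VOC_CLASSES)

binary_probs = cls_scores(binary_head, roi_feature)
multi_probs = cls_scores(multi_head, roi_feature)
print(len(binary_probs), len(multi_probs))
```

With design A the head yields two scores per ROI, so the category would have to be determined somewhere else in the model; with design B the head itself yields one score per class.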