jackroos / VL-BERT

Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
MIT License

Some questions about the code in ./common/faster_rcnn.py #44

Closed SnoopyMark closed 4 years ago

SnoopyMark commented 4 years ago

Hi there. I read some of your code and have a question about the class FastRCNN in './common/faster_rcnn.py'. Its forward method takes the arguments (self, images, boxes, box_mask, im_info, classes=None, segms=None, mvrc_ops=None, mask_visual_embed=None). Isn't it Faster RCNN's job to produce the RoIs? If so, why are there parameters such as 'boxes'?

jackroos commented 4 years ago

Actually, we drop the RPN branch of Faster RCNN in VL-BERT, since tuning it is non-trivial in our case. We use Fast RCNN instead of Faster RCNN. The difference is that Fast RCNN has no RPN, so we need to feed in boxes precomputed by a pre-trained Faster RCNN.
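
To illustrate the point, here is a minimal pure-Python sketch of a Fast RCNN style feature extractor that pools features for boxes passed in from outside, with no RPN involved. It is not the repository's implementation: the function names `roi_max_pool` and `extract_box_features` are hypothetical, the feature map is a single-channel toy grid, the boxes are assumed to be given in feature-map cells, and `box_mask` is assumed to flag real boxes versus padding.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2), half-open, in feature-map cells.
Box = Tuple[int, int, int, int]


def roi_max_pool(feature_map: Sequence[Sequence[float]],
                 box: Box,
                 out_size: int = 2) -> List[List[float]]:
    """Max-pool the cells covered by `box` into an out_size x out_size grid."""
    x1, y1, x2, y2 = box
    height, width = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_size):
        # Row range of bin i; each bin covers at least one cell.
        r0 = y1 + (i * height) // out_size
        r1 = max(y1 + ((i + 1) * height) // out_size, r0 + 1)
        row = []
        for j in range(out_size):
            # Column range of bin j, built the same way.
            c0 = x1 + (j * width) // out_size
            c1 = max(x1 + ((j + 1) * width) // out_size, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


def extract_box_features(feature_map: Sequence[Sequence[float]],
                         boxes: Sequence[Box],
                         box_mask: Sequence[bool],
                         out_size: int = 2) -> List[List[List[float]]]:
    """Pool one feature grid per precomputed box; padded boxes get zeros.

    No region proposals are generated here: the boxes are an input,
    which is why a Fast RCNN style forward pass needs them as arguments.
    """
    features = []
    for box, valid in zip(boxes, box_mask):
        if valid:
            features.append(roi_max_pool(feature_map, box, out_size))
        else:
            features.append([[0.0] * out_size for _ in range(out_size)])
    return features


# Toy 4x4 feature map with values 0.0 .. 15.0 in row-major order.
feature_map = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
# Two real boxes plus one padding entry, as a detector's output might be batched.
boxes = [(0, 0, 4, 4), (1, 1, 3, 3), (-1, -1, -1, -1)]
box_mask = [True, True, False]
features = extract_box_features(feature_map, boxes, box_mask)
```

In this sketch the detector that produced `boxes` is a separate step run beforehand, which mirrors the answer above: the proposals come from a pre-trained Faster RCNN, and the model only pools features for them.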