Do you only use bbox or embedding of predicted classes to train a VQA model?

hengyuan-hu / bottom-up-attention-vqa

An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.

GNU General Public License v3.0


Do you only use bbox or embedding of predicted classes to train a VQA model? #15

Closed cengzy14 closed 6 years ago

cengzy14 commented 6 years ago

The image_id, image_h, image_w, num_boxes, boxes, and features fields are extracted and saved. However, it seems that only features is used to represent the image. Do you also use the embedding of the predicted classes, or the bbox, to train the VQA model?

ZhuFengdaaa commented 6 years ago

No. According to this paper, the features extracted by Faster R-CNN are used as hard attention over the spatial visual features.

hengyuan-hu commented 6 years ago

No, we don't use the embedding of predicted classes. In fact, we tried using it but saw no improvement. We do, of course, use the detected image features.
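
To make the answers above concrete, here is a minimal sketch of the idea, assuming a record with the fields named in the question. The helper names (`region_features`, `attend`), the toy 2-dimensional features, and the dot-product scoring are all illustrative assumptions, not this repository's code: only the per-region feature vectors reach the attention step, while the boxes and image size are carried along but unused.

```python
import math

def region_features(record):
    """Return only the per-region feature vectors (hypothetical helper).

    The boxes, image_h and image_w fields are deliberately ignored.
    """
    feats = record["features"]
    assert len(feats) == record["num_boxes"]
    return feats

def attend(features, question_vec):
    """Toy dot-product soft attention over regions (an assumption, not the repo's module)."""
    # One relevance score per region.
    scores = [sum(f * q for f, q in zip(feat, question_vec)) for feat in features]
    # Numerically stable softmax over the regions.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted sum of region features.
    dim = len(features[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
    return weights, pooled

# Made-up record with the saved fields; real features are much higher-dimensional.
record = {
    "image_id": 1,
    "image_h": 480,
    "image_w": 640,
    "num_boxes": 2,
    "boxes": [[0, 0, 100, 100], [50, 50, 200, 200]],
    "features": [[1.0, 0.0], [0.0, 1.0]],
}
weights, pooled = attend(region_features(record), question_vec=[2.0, 0.0])
```

Here the first region matches the toy question vector better, so it receives the larger attention weight; neither the box coordinates nor any class embedding enters the computation.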