jackroos / VL-BERT

Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
MIT License

where can I find the category outputs for bottom-up features? #38

Closed: homelifes closed this issue 4 years ago

homelifes commented 4 years ago

Hello. Thanks for sharing your code. I downloaded the trainval2014_resnet101_faster_rcnn_genome file from the Google Drive link. I could only find image_id, image_w, image_h, num_boxes, boxes, and features inside it. I can't find the predicted category for each box, which you use in your masked RoI classification. Could you tell me where these are located?
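For reference, a minimal sketch of how the fields of such a file could be inspected. It assumes the file is tab-separated with the six columns listed above, and that `boxes` and `features` are base64-encoded float32 arrays, as in the commonly used bottom-up-attention feature files; that layout is an assumption and is not confirmed in this thread. The helper names (`decode_floats`, `read_rows`, `make_synthetic_line`) are hypothetical, and the row is synthetic so the snippet runs without the real download.

```python
import base64
import csv
import io
import struct

# Column names as listed in the question; the column order and the
# base64/float32 encoding are assumptions, not confirmed in this thread.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]

# Real feature rows are very long, so raise the csv field size limit.
csv.field_size_limit(2**31 - 1)


def decode_floats(b64_text):
    """Decode a base64 string into a flat list of little-endian float32 values."""
    raw = base64.b64decode(b64_text)
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))


def read_rows(handle):
    """Yield one summary dict per image from a tab-separated text handle."""
    reader = csv.DictReader(handle, delimiter="\t", fieldnames=FIELDNAMES)
    for row in reader:
        num_boxes = int(row["num_boxes"])
        boxes = decode_floats(row["boxes"])
        features = decode_floats(row["features"])
        yield {
            "image_id": int(row["image_id"]),
            "image_w": int(row["image_w"]),
            "image_h": int(row["image_h"]),
            "num_boxes": num_boxes,
            # Split the flat box list into num_boxes rows of 4 coordinates.
            "boxes": [boxes[i * 4:(i + 1) * 4] for i in range(num_boxes)],
            "feature_dim": len(features) // num_boxes if num_boxes else 0,
        }


def make_synthetic_line(image_id, num_boxes, feature_dim):
    """Build one fake TSV line in the assumed layout (for illustration only)."""
    box_vals = [0.0, 0.0, 10.0, 10.0] * num_boxes
    feat_vals = [0.5] * (num_boxes * feature_dim)
    boxes = base64.b64encode(struct.pack("<%df" % len(box_vals), *box_vals))
    feats = base64.b64encode(struct.pack("<%df" % len(feat_vals), *feat_vals))
    cols = [str(image_id), "640", "480", str(num_boxes),
            boxes.decode("ascii"), feats.decode("ascii")]
    return "\t".join(cols)


line = make_synthetic_line(image_id=1, num_boxes=2, feature_dim=8)
rows = list(read_rows(io.StringIO(line + "\n")))
print(sorted(rows[0].keys()))
```

With a real file, `read_rows` would be given an open text-mode file handle instead of the `io.StringIO` wrapper. Under this assumed layout there is no column for a predicted category.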

jackroos commented 4 years ago

The trainval2014_resnet101_faster_rcnn_genome file is for VQA fine-tuning, not for pre-training, so we don't need masked RoI classification for it. Our pre-training is conducted on the Conceptual Captions dataset, together with a text-only corpus.

homelifes commented 4 years ago

Sorry, I missed that! Thanks a lot for your reply.