uclanlp / visualbert

Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

COCO features #22

Closed by qinzzz 3 years ago

qinzzz commented 3 years ago

Hi! Thank you for your excellent work. I noticed that we downloaded COCO features separately for NLVR, VQA and VCR. What is the difference between the features? Are they from different models of detectron2? By the way, could you please provide the script for generating Flickr30k features?

liunian-harold-li commented 3 years ago

Yeah, when I did the project, I tried to follow the settings of previous work, and each of these tasks used a different visual model for extracting features. My apologies for the delay; I will update the Flickr30K part soon.

bigbrother001 commented 2 years ago

I notice that the model (e2e_mask_rcnn_R-101-FPN_2x, model_id: 35861858) from Detectron is trained on the COCO dataset. Is it fine to use these weights to extract image features for the images in the NLVR2 dataset? I guess that is what you did.