Closed qinzzz closed 3 years ago
Yes. When I did the project, I tried to follow the settings used in previous work, and each of these tasks used a different visual model for extracting features. My apologies for the delay; I will update the Flickr30K part soon.
I notice that the Detectron model (e2e_mask_rcnn_R-101-FPN_2x, model_id: 35861858) was trained on the COCO dataset. Is it fine to use these weights to extract image features for the images in the NLVR2 dataset? I guess that is what you did.
Hi! Thank you for your excellent work. I noticed that we downloaded COCO features separately for NLVR, VQA and VCR. What is the difference between the features? Are they from different models of detectron2? By the way, could you please provide the script for generating Flickr30k features?