uclanlp / visualbert

Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

Number of ROIs #11

Closed: e-bug closed this issue 4 years ago

e-bug commented 4 years ago

Hi and thanks for the nice repo!

  1. I couldn't find in the paper how many proposals you used for pre-training and fine-tuning in each dataset (except for NLVR, where you use 144).
  2. Also, could it be that you do the "Task-Agnostic Pre-Training" on COCO separately for each task? (Given that you use different detectors for each task)

Thanks a lot! And congrats on the ACL follow-up paper

liunian-harold-li commented 4 years ago

Hi, thank you for your interest!

  1. We follow past conventions when preparing the image features for each task. For VQA, we follow Pythia 1.0 and use 100 boxes per image. For VCR, we follow R2C (the model released with the VCR paper) and use the boxes provided with the dataset. For Flickr30K, we follow BAN (https://arxiv.org/pdf/1805.07932.pdf) and keep 10-100 boxes per image, selected with a confidence threshold (see the sketch after this list).

  2. Yes. We did this because we wanted to follow past conventions for each task, and these tasks each extract image features differently, so we have to pre-train on COCO with several sets of image features. For example, for pre-training on COCO for the VQA task, we use Pythia 1.0's detector and 100 boxes per COCO image. For pre-training on COCO for Flickr30K, we follow BAN (which uses bottom-up top-down, BUTD, features) and keep 10-100 boxes per COCO image (see the config sketch below).
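In case it helps, here is a minimal sketch of the BAN-style adaptive box selection described in point 1, assuming generic detector output (boxes plus per-box confidence scores). The function name and the 0.2 threshold are illustrative assumptions, not the exact values used in this repo:

```python
import numpy as np

def select_boxes(boxes, scores, min_boxes=10, max_boxes=100, conf_thresh=0.2):
    """Keep every box whose confidence exceeds conf_thresh, but never
    fewer than min_boxes or more than max_boxes, ranked by confidence."""
    order = np.argsort(scores)[::-1]            # highest confidence first
    keep = order[scores[order] > conf_thresh]   # all boxes above the threshold
    if len(keep) < min_boxes:
        keep = order[:min_boxes]                # pad with the next-best boxes
    elif len(keep) > max_boxes:
        keep = keep[:max_boxes]                 # truncate to the cap
    return boxes[keep], scores[keep]

# Example: 300 raw proposals reduced to 10-100 kept boxes.
boxes = np.random.rand(300, 4)    # (x1, y1, x2, y2) per proposal
scores = np.random.rand(300)
kept_boxes, kept_scores = select_boxes(boxes, scores)
assert 10 <= len(kept_boxes) <= 100
```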
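And to make point 2 concrete, the per-task feature settings amount to one COCO pre-training run per configuration, roughly like the sketch below. The dict keys and detector labels are illustrative assumptions, not the actual config names in this repo (the NLVR2 box count of 144 comes from the question above):

```python
# One COCO pre-training run per entry, each with its own image features.
FEATURE_CONFIGS = {
    "vqa":       {"detector": "Pythia 1.0",     "boxes": "fixed 100"},
    "vcr":       {"detector": "R2C (provided)", "boxes": "dataset-provided"},
    "nlvr2":     {"detector": "Detectron",      "boxes": "fixed 144"},
    "flickr30k": {"detector": "BUTD (per BAN)", "boxes": "adaptive 10-100"},
}
```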

Hope that answers your question!
