Hi,
I am trying to reproduce your results for the NoCaps challenge.
For the VinVL + VIVO model (NoCaps challenge), which object labels (tags) did you use for VIVO pretraining and COCO finetuning? Do the tags come from an object detector trained on the Open Images dataset, or from the same model that is used for feature extraction (trained on COCO + Objects365 + OpenImagesV5 + Visual Genome)?
Thanks.