pzzhang / VinVL

project page for VinVL

How are the object tags generated? #6

Open runzeer opened 3 years ago

runzeer commented 3 years ago

In your VQA val2014_qla_mrcnn.json file, I found that the number of object tags does not match the number of features in the feats .pt file. Could you tell me how the object tags are generated?

pzzhang commented 3 years ago

They are generated by an object detection model trained on COCO. In fact, you can use the tags generated by the VinVL models without any accuracy drop. The trick is to keep only the tags that are in the COCO 80-class vocabulary.
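A minimal sketch of that filtering step, in case it helps. The function name `keep_coco_tags` and the example tags are made up for illustration, and matching on the exact lowercased class name is an assumption: a detector trained on a different vocabulary may use different names for the same class (for example "television" instead of "tv"), so a name mapping may be needed in practice.

```python
# The 80 COCO object class names.
COCO_80 = {
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
    "truck", "boat", "traffic light", "fire hydrant", "stop sign",
    "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag",
    "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
    "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon",
    "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot",
    "hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant",
    "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote",
    "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
    "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
    "hair drier", "toothbrush",
}


def keep_coco_tags(detected_tags):
    """Return the normalized tags that are in the COCO 80-class vocabulary.

    Assumption: tags are plain class-name strings; matching is done on the
    stripped, lowercased name. Duplicates are kept as they are.
    """
    kept = []
    for tag in detected_tags:
        name = tag.strip().lower()
        if name in COCO_80:
            kept.append(name)
    return kept


# Hypothetical detector output for one image.
detected = ["person", "sky", "Dog", "grass", "frisbee", "person"]
filtered = keep_coco_tags(detected)
```

Here "sky" and "grass" are dropped because they are not COCO classes, and the remaining tags keep their original order.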

CCYChongyanChen commented 2 years ago

Hi Zhang, does "o" represent the object tags? Does the order matter if we replace the object tags with VinVL's classes? And how could we generate "an" and "s"?

Also, if we use VinVL image features with the Oscar model for the VQA task, why does the code still ask for the mrcnn.json file? Thanks a lot in advance!

CCYChongyanChen commented 2 years ago

Wow, I think "an" and "s" are not even used! That means all we need is the "q" and "o" (object tags)!!! That makes things much easier!

pzzhang commented 2 years ago

Yes, @CCYChongyanChen, "an" and "s" are the answers and scores. They are not used by the model, only in the evaluation.

The order does not matter, but it does matter that you keep only the tags that are in the COCO 80-class vocabulary.
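For anyone building their own file, here is a rough sketch of one record using the four keys discussed in this thread. The helper name `make_entry`, the space-separated string format for "o", and the empty lists for "an" and "s" at inference time are all assumptions for illustration; check them against the json file shipped with the code before relying on them.

```python
import json


def make_entry(question, tags, answers=None, scores=None):
    """Build one hypothetical record with the keys "q", "o", "an" and "s".

    Assumptions: "o" is a single space-separated string of tags that were
    already filtered to the COCO vocabulary, and "an"/"s" can be left as
    empty lists when only the model inputs ("q" and "o") are needed.
    """
    return {
        "q": question,
        "o": " ".join(tags),
        "an": answers if answers is not None else [],
        "s": scores if scores is not None else [],
    }


# Hypothetical example; tag order is kept as given.
entry = make_entry("What is the dog catching?", ["dog", "frisbee", "person"])
serialized = json.dumps([entry])
```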