VINVL code on the VizWiz datasets

runzeer commented 3 years ago

Have you ever run the VinVL code on the VizWiz dataset? I extracted the image features and generated the object tags with the same model weights, but the evaluation results stay the same as training continues. Is this due to the object tags? Have you run into this issue on other datasets?

CCYChongyanChen commented 3 years ago

Hi Runze, could I ask how you generated the trainval_ans2label.pkl file for the VizWiz dataset? Thank you!

CCYChongyanChen commented 2 years ago

Hi Runze, could I ask whether your object tags were generated by VinVL or by Mask R-CNN? I ran the VinVL image feature extraction, and the output is predictions.tsv. However, according to @xjli in https://github.com/microsoft/Oscar/issues/13#issuecomment-645809973, they use Mask R-CNN tags for fine-tuning VQA. See:

"There are two kinds of corpus in the work: the pre-training corpus and the downstream task fine-tuning corpus. Both use tag sets. For fine-tuning VQA, we observed that the best tag set is from COCO (80 categories), not the VG tag set, because the tag prediction precision of Faster R-CNN (pretrained on the VG corpus) is not good enough, even though the VG tag set has more categories (1600). Here we use a high-precision Mask R-CNN (trained on the COCO tag set) to generate the tags. For the pre-training corpus, we use the VG tag set from Faster R-CNN. For the downstream task fine-tuning corpus, you can use any off-the-shelf object detector to generate the tags; it is a trade-off between high precision and more categories. One more example is the NoCaps fine-tuning corpus, where we use the OpenImage tag set, which works better on the NoCaps task."

Also, I still don't know how to generate img_frcnn_feats.pt and trainval_label2ans.pkl. Any help is appreciated. Thanks!
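
For anyone with the same question about the answer vocabulary files, here is a minimal sketch of one way to build them. It is not the Oscar authors' script. It assumes that each VizWiz annotation entry carries an "answers" list of dicts with an "answer" key, and that trainval_ans2label.pkl is a pickled dict from answer string to integer label while trainval_label2ans.pkl is the pickled inverse list. The function names, the min_count threshold, and the lowercase normalization are hypothetical choices for illustration, so please check them against what the training script actually loads.

```python
import pickle
from collections import Counter


def build_answer_vocab(annotations, min_count=1):
    """Build (ans2label, label2ans) from VizWiz-style annotation entries.

    Assumption: each entry looks like {"answers": [{"answer": "..."}, ...]}.
    """
    counts = Counter()
    for entry in annotations:
        # Count each distinct (normalized) answer once per question.
        answers = {a["answer"].strip().lower() for a in entry.get("answers", [])}
        counts.update(answers)

    # Keep answers given for at least `min_count` questions; sort for stable labels.
    label2ans = sorted(ans for ans, n in counts.items() if n >= min_count)
    ans2label = {ans: idx for idx, ans in enumerate(label2ans)}
    return ans2label, label2ans


def save_answer_vocab(ans2label, label2ans, ans2label_path, label2ans_path):
    """Pickle both mappings (the file layout here is an assumption)."""
    with open(ans2label_path, "wb") as f:
        pickle.dump(ans2label, f)
    with open(label2ans_path, "wb") as f:
        pickle.dump(label2ans, f)


if __name__ == "__main__":
    # Tiny hand-written sample in place of the real train/val annotation files.
    sample = [
        {"answers": [{"answer": "Dog"}, {"answer": "dog"}, {"answer": "cat"}]},
        {"answers": [{"answer": "dog"}]},
    ]
    ans2label, label2ans = build_answer_vocab(sample, min_count=1)
    print(ans2label, label2ans)
```

In practice the annotations would come from json.load on the VizWiz train and val files, and a higher min_count would keep the label space small. If the number of labels differs from the classifier's output size in the training config, the loss and evaluation can behave oddly, so that is worth checking too.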

microsoft / Oscar, issue #84