microsoft / Oscar

Oscar and VinVL
MIT License

About the OD model used to generate coco_caption features. I cannot reproduce your feature results. #181

Open Jason-fan20 opened 2 years ago

Jason-fan20 commented 2 years ago

I tried to reproduce the results for VinVL+SCST on NoCaps, but my result was off by a visible margin.

I generated the COCO features with the OD model using the pre-trained checkpoint vinvl_vg_x152c4.pth. When I check the tags in the generated train.label.tsv, it contains 1319 distinct types. After restricting the tags to the 500 Open Images labels, my train.label.tsv contains roughly 400 types, and my result was still off by a visible margin: I can only reach a 12.1 SPICE score after cross-entropy training + CIDEr optimization (SCST). The config file is val_vinvl_x152c4.yaml.
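For reference, this is roughly how I count the tag types and filter them to a fixed vocabulary (a minimal sketch; the exact column layout of the Oscar/VinVL label TSVs may differ from what I assume here):

```python
import json
from collections import Counter

def count_tag_types(label_tsv, keep_vocab=None):
    """Count distinct tag classes in a label TSV.

    Assumes each row is: image_id \t JSON list of detections, where each
    detection has a "class" field (the layout I believe Oscar label files use).
    """
    counts = Counter()
    with open(label_tsv) as f:
        for line in f:
            image_id, labels_json = line.rstrip("\n").split("\t", 1)
            for det in json.loads(labels_json):
                cls = det["class"]
                if keep_vocab is None or cls in keep_vocab:
                    counts[cls] += 1
    return counts

# Hypothetical usage: restrict to the 500 Open Images classes
# oi_500 = {line.strip() for line in open("oi_500_classes.txt")}
# counts = count_tag_types("train.label.tsv", keep_vocab=oi_500)
# print(len(counts), "distinct tag types")
```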

For the provided coco_caption training set, train.label.tsv has a total of 498 types, and with it I can reproduce your result of 12.8 SPICE.

May I ask which model I should use to reproduce your OD feature results? Could you please give me a link to this OD model, so I can reproduce your features and generate them the same way?

If it's not available, should I train an OD model on Visual Genome and Open Images myself?

Many thanks!

(´ー∀ー)(´ー∀ー)(´ー∀ー`)

xiaoweihu commented 2 years ago

Hi, the OD features are generated from the model you linked, vinvl_vg_x152c4.pth. If you need the features for COCO or NoCaps, you can download the pre-extracted features and labels at https://github.com/microsoft/Oscar/blob/master/VinVL_DOWNLOAD.md#datasets
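If it helps, the pre-extracted feature TSVs can be read roughly like this (a sketch assuming each row is image_id plus a JSON payload with "num_boxes" and a base64-encoded float32 array of width 2054; verify against the downloaded files first, since the exact layout can differ between releases):

```python
import base64
import json
import numpy as np

def read_feature_row(line, feat_dim=2054):
    """Decode one row of a pre-extracted feature TSV (assumed layout, see above)."""
    image_id, payload = line.rstrip("\n").split("\t", 1)
    meta = json.loads(payload)
    raw = base64.b64decode(meta["features"])
    feats = np.frombuffer(raw, dtype=np.float32).reshape(meta["num_boxes"], feat_dim)
    return image_id, feats
```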

pzzhang commented 2 years ago

@309018451 The image tags used in both NoCaps training and testing are generated from an OD model pretrained on the OpenImages dataset, not from the model vinvl_vg_x152c4.pth.

We cannot release the OD model pretrained on the OpenImages dataset, but you should be able to train your own OpenImages OD model or use a public one.
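Whatever detector you end up with, you can dump its per-image tags into the same label.tsv style, for example (a rough sketch; the field names and the score threshold are assumptions, not the exact VinVL tooling):

```python
import json

def write_label_tsv(detections_per_image, out_path, score_thresh=0.5):
    """Write per-image tags in a label.tsv-style format.

    `detections_per_image` is assumed to map image_id to a list of
    (class_name, score, [x1, y1, x2, y2]) tuples from your own OpenImages
    detector; only the output format mirrors the Oscar label files.
    """
    with open(out_path, "w") as f:
        for image_id, dets in detections_per_image.items():
            kept = [
                {"class": cls, "conf": float(score), "rect": [float(v) for v in box]}
                for cls, score, box in dets
                if score >= score_thresh
            ]
            f.write(f"{image_id}\t{json.dumps(kept)}\n")
```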

Jason-fan20 commented 2 years ago

@pzzhang @xiaoweihu Thank you so much! I'll try and make it work in my case. Again, thank you a lot 💯