Jason-fan20 opened this issue 2 years ago.
Hi, the OD features are generated by the model you linked, vinvl_vg_x152c4.pth. If you need the features for COCO or NoCaps, you can download the pre-extracted features and labels at https://github.com/microsoft/Oscar/blob/master/VinVL_DOWNLOAD.md#datasets
@309018451 The image tags used in both NoCaps training and testing are generated by an OD model pretrained on the OpenImages dataset, not by vinvl_vg_x152c4.pth.
We cannot release the OD model pretrained on OpenImages, but you should be able to train your own OpenImages OD model or use a publicly available one.
@pzzhang @xiaoweihu Thank you so much! I'll try and make it work in my case. Again, thank you a lot 💯
I tried to reproduce the results for VinVL+SCST on NoCaps, but my result was off by a visible margin.
I generated COCO features with the OD model using the pretrained checkpoint vinvl_vg_x152c4.pth. When I count the distinct tag classes in the generated train.label.tsv, there are 1319. I then restricted the tags to the 500 OpenImages labels; the resulting train.label.tsv has about 400 distinct classes, and my result is still off by a visible margin: I only get 12.1 SPICE after cross-entropy (CE) training followed by CIDEr optimization. The config file is val_vinvl_x152c4.yaml.
With the provided coco_caption training set, train.label.tsv has 498 distinct classes, and I can reproduce your result of 12.8 SPICE.
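One rough way to reproduce the class counts above, restrict tags to a fixed label list, and see where a regenerated tag set diverges from the provided one is sketched below. It assumes each line of train.label.tsv is an image id, a tab, and a JSON list of objects that each carry a "class" key; that layout is an assumption, and every function name here is hypothetical, not part of Oscar/VinVL.

```python
import json
from collections import Counter


def parse_label_line(line):
    """Split one label.tsv line into (image_id, list of detected objects).

    Assumed format (not confirmed in this thread): "<image_id>\t<JSON list>",
    where each JSON object carries at least a "class" key.
    """
    image_id, labels_json = line.rstrip("\n").split("\t", 1)
    return image_id, json.loads(labels_json)


def count_classes(lines):
    """Count occurrences of each tag class; len() of the result is the number of distinct types."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _, objects = parse_label_line(line)
        counts.update(obj["class"] for obj in objects)
    return counts


def restrict_to_vocabulary(lines, allowed):
    """Keep only objects whose class is in `allowed` (exact, case-sensitive match)."""
    out = []
    for line in lines:
        if not line.strip():
            continue
        image_id, objects = parse_label_line(line)
        kept = [obj for obj in objects if obj["class"] in allowed]
        out.append(image_id + "\t" + json.dumps(kept))
    return out


def compare_vocabularies(mine, reference):
    """Classes present in only one of two label files, e.g. regenerated vs. provided."""
    a, b = set(count_classes(mine)), set(count_classes(reference))
    return {"only_mine": sorted(a - b), "only_reference": sorted(b - a)}


# Toy data for illustration only (not real VinVL output).
mine = [
    '1\t[{"class": "man", "conf": 0.9}, {"class": "shirt", "conf": 0.8}]',
    '2\t[{"class": "man", "conf": 0.7}, {"class": "sky", "conf": 0.6}]',
]
reference = ['1\t[{"class": "man", "conf": 0.9}, {"class": "frisbee", "conf": 0.8}]']

print(len(count_classes(mine)))  # 3
filtered = restrict_to_vocabulary(mine, {"man", "frisbee"})
print(compare_vocabularies(filtered, reference))  # {'only_mine': [], 'only_reference': ['frisbee']}
```

Because the filtering is an exact, case-sensitive match, class names from two different vocabularies may need normalizing (for example, lowercasing) before filtering, otherwise valid tags can be silently dropped.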
Which model should I use to reproduce your OD features? Could you please give me a link to this OD model, so that I can generate the features the same way?
If it is not available, should I train an OD model on Visual Genome and OpenImages myself?
Many thanks!
(´ー∀ー`)(´ー∀ー`)(´ー∀ー`)