microsoft / Oscar

Oscar and VinVL
MIT License

Unable to Reproduce the results for VinVL+VIVO on NoCaps #131

Open joeyy5588 opened 2 years ago

joeyy5588 commented 2 years ago

I tried to reproduce the results for VinVL+VIVO+SCST on NoCaps, but my result was off by a visible margin.

Reported Results on NoCaps validation set

{"CIDEr": {"in-domain": 103.7, "near-domain": 95.6, "out-domain": 83.8, "entire": 94.3}, "SPICE": {"in-domain": 13.7, "near-domain": 13.4, "out-domain": 11.9, "entire": 13.1}}

Reproduced results:

{"CIDEr": {"in-domain": 95.0, "near-domain": 91.1, "out-domain": 79.2, "entire": 89.3}, "SPICE": {"in-domain": 13.1, "near-domain": 12.9, "out-domain": 11.2, "entire": 12.6}}
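To make the size of the gap explicit, the two result lines above can be subtracted split by split. This is only a small sketch: the numbers are copied from the post, and the helper name `score_gaps` is made up for illustration.

```python
# Scores copied from the two result lines above (NoCaps validation set).
reported = {
    "CIDEr": {"in-domain": 103.7, "near-domain": 95.6, "out-domain": 83.8, "entire": 94.3},
    "SPICE": {"in-domain": 13.7, "near-domain": 13.4, "out-domain": 11.9, "entire": 13.1},
}
reproduced = {
    "CIDEr": {"in-domain": 95.0, "near-domain": 91.1, "out-domain": 79.2, "entire": 89.3},
    "SPICE": {"in-domain": 13.1, "near-domain": 12.9, "out-domain": 11.2, "entire": 12.6},
}


def score_gaps(reported, reproduced):
    """Return reported minus reproduced for every metric and split, rounded to 0.1."""
    return {
        metric: {
            split: round(value - reproduced[metric][split], 1)
            for split, value in splits.items()
        }
        for metric, splits in reported.items()
    }


gaps = score_gaps(reported, reproduced)
for metric, splits in gaps.items():
    print(metric, splits)
```

The overall CIDEr gap is 5.0 points, and the in-domain split shows the largest drop (8.7 points).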

Hardware Specifications

V100 32G * 8

I'll specify the data, scripts, and code I used for pretraining, cross-entropy optimization, CIDEr optimization, and inference.

Data

Pretraining

For OpenImages, I downloaded the feature files and label files released by this repo from the following links: features, labels.

Cross_entropy, CIDEr Optimization

I downloaded the coco_caption dataset following the instructions in VinVL_DOWNLOAD.md.

Inference

I downloaded the nocaps dataset following the instructions in VinVL_DOWNLOAD.md, which includes the feature files and label files for the nocaps validation set.

Codes

Pretraining

I slightly modified run_oscarplus_pretrain.py to perform MLM on the tag sequence.

Cross_entropy, CIDEr Optimization, Inference

I used run_captioning.py to perform cross-entropy optimization and CIDEr optimization on COCO, and to evaluate on the NoCaps validation set.

Scripts/Training args

Pretraining

batch_size=1024, lr=5e-05, a maximum of 50 image regions and 15 tag tokens per image, trained for 160K iterations, with the same masking strategy. These parameters are set according to the VIVO paper.

Cross_entropy

batch_size=256, lr=5e-05, trained for 30 epochs. These parameters are set according to the VIVO paper; other parameters keep their default values in run_captioning.py.

CIDEr Optimization

batch_size=14*8=112, lr=5e-6, trained for 40 epochs. I found that the training scripts from VinVL_DOWNLOAD.md yield better performance than the settings in the VIVO paper, so I used those scripts for CIDEr optimization.

Inference

max_gen_length=20, num_beams=5
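For easier comparison against the papers, the settings listed above can be collected in one place. The sketch below is only a summary of the numbers in this post. The `StageConfig` class and its field names are hypothetical and do not correspond to arguments of run_oscarplus_pretrain.py or run_captioning.py. The per-GPU figures assume the global batch is split evenly over the 8 GPUs with no gradient accumulation, which the post states only for CIDEr optimization (14*8).

```python
from dataclasses import dataclass

NUM_GPUS = 8  # from the hardware section: V100 32G * 8


@dataclass(frozen=True)
class StageConfig:
    """Hypothetical container for one training stage; not part of the Oscar repo."""

    name: str
    global_batch_size: int
    learning_rate: float
    schedule: str  # training length as stated in the post

    def per_gpu_batch_size(self, num_gpus: int = NUM_GPUS) -> int:
        # Assumption: even split across GPUs, no gradient accumulation.
        if self.global_batch_size % num_gpus != 0:
            raise ValueError("global batch size is not divisible by the GPU count")
        return self.global_batch_size // num_gpus


STAGES = [
    StageConfig("pretraining", 1024, 5e-5, "160K iterations"),
    StageConfig("cross_entropy", 256, 5e-5, "30 epochs"),
    StageConfig("cider_optimization", 14 * 8, 5e-6, "40 epochs"),
]

# Inference settings from the post.
INFERENCE = {"max_gen_length": 20, "num_beams": 5}

for stage in STAGES:
    print(
        f"{stage.name}: global={stage.global_batch_size}, "
        f"per_gpu={stage.per_gpu_batch_size()}, "
        f"lr={stage.learning_rate}, {stage.schedule}"
    )
```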

I'm wondering if I made mistakes in any of these steps. @pzzhang I'd really appreciate it if you could clarify this, thanks!

Jennifer-6 commented 2 years ago

Can I have a contact number?

joeyy5588 commented 2 years ago

You can reach me via email.

Jennifer-6 commented 2 years ago

To run run_captioning.py, train.yaml is needed. The train.yaml file specifies some required data (image features, captions, labels). Where is train.yaml, or how can I get it?


wangying1586 commented 2 years ago

Hello, Jennifer!

I have a question about reproducing this model on the NoCaps task.

I looked at run_captioning.py and then searched for self._decode_step and self._generate_beam_search in the BertForImageCaptioning class in modeling_bert.py, but I cannot find these two functions. Where are they defined? Could you please explain the inference phase in detail? Does inference need a transformer block as the decoder?

xiaoweihu commented 2 years ago

Hi,

This repo is for VinVL pretraining. The VIVO pre-training loss is different from what is used in VinVL, e.g., the Hungarian matching. Hope this could help.

Best, Xiaowei
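For readers unfamiliar with the Hungarian matching mentioned in the reply above, here is a toy sketch of the general idea, assuming it refers to matching predictions at masked positions to an unordered set of ground-truth tags so that the loss does not depend on tag order. This is not the VIVO implementation, which is not part of this repo. The function name and the example probabilities are made up, and a brute-force search over permutations stands in for the actual Hungarian algorithm, which finds the same minimum-cost assignment in polynomial time.

```python
import itertools
import math


def best_assignment(log_probs):
    """Find the position-to-tag assignment with the lowest total negative log-likelihood.

    log_probs[i][j] is the log-probability that masked position i predicts
    ground-truth tag j. Brute force over all permutations, so this is only
    practical for a handful of masked tags.
    """
    n = len(log_probs)
    best_perm, best_loss = None, math.inf
    for perm in itertools.permutations(range(n)):
        loss = -sum(log_probs[i][perm[i]] for i in range(n))
        if loss < best_loss:
            best_perm, best_loss = perm, loss
    return best_perm, best_loss


# Toy example: two masked positions, two ground-truth tags ("dog", "frisbee").
probs = [
    [0.1, 0.9],  # position 0 favours "frisbee" (tag index 1)
    [0.8, 0.2],  # position 1 favours "dog" (tag index 0)
]
log_probs = [[math.log(p) for p in row] for row in probs]
assignment, loss = best_assignment(log_probs)
print(assignment, loss)
```

In this example the matching pairs position 0 with "frisbee" and position 1 with "dog", because that pairing has the lower total loss. A plain per-position MLM loss would instead penalize the model whenever it predicts the right tags in a different order from the label sequence.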

lhlclhl commented 2 years ago

Hi, is the VIVO pre-training code publicly accessible? I cannot find any code or pre-trained models for VIVO.