joeyy5588 opened this issue 3 years ago
Can I have a contact number?
You can reach me via email.
In order to run run_captioning.py, a train.yaml file is needed. It points to the required data (image features, captions, labels). Where is train.yaml, or how do I get it?
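In case it helps others: as far as I can tell, train.yaml is not the data itself but a small config that tells run_captioning.py where the dataset files live, and it ships with the dataset download (see VinVL_DOWNLOAD.md), not with the code. A hypothetical sketch of its shape (the key names here are my guess, so check the files that come with the download):

```python
import yaml

# Illustrative only: actual keys/filenames may differ from the released data.
cfg = yaml.safe_load("""
img: train.img.tsv            # image metadata
feature: train.feature.tsv    # pre-extracted region features
label: train.label.tsv        # object tags per image
caption: train_caption.json   # ground-truth captions
""")
print(cfg["feature"])
```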
Hello Jennifer!
I want to ask one question about reproducing this model on the NoCaps task.
I looked at run_captioning.py and then searched modeling_bert.py for self._decode_step and self._generate_beam_search in the class BertForImageCaptioning, but I cannot find these two functions anywhere. Where are they? Could you give some details about the inference phase? Does inference need a transformer block as the decoder?
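While waiting for an answer, here is a rough sketch of what such a decode step and beam search conceptually do (this is not the repo's code, which I also could not locate; `step_log_probs` is a placeholder for one forward pass of the model, and in BERT-style captioning the same encoder is reused autoregressively with a seq2seq attention mask, so no separate decoder block is required):

```python
import torch

def beam_search(step_log_probs, bos_id, eos_id, num_beams=5, max_len=20):
    """step_log_probs(seq) -> (vocab_size,) log-probs for the next token."""
    beams = [([bos_id], 0.0)]                     # (token sequence, total log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos_id:                 # finished beams carry over unchanged
                candidates.append((seq, score))
                continue
            lp = step_log_probs(torch.tensor(seq))
            top = torch.topk(lp, num_beams)       # expand each beam with its top-k tokens
            for tok, s in zip(top.indices.tolist(), top.values.tolist()):
                candidates.append((seq + [tok], score + s))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
        if all(seq[-1] == eos_id for seq, _ in beams):
            break
    return beams[0][0]                            # highest-scoring hypothesis
```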
Hi,
This repo is for VinVL pretraining. The VIVO pre-training loss is different from what is used in VinVL, e.g., the Hungarian matching. Hope this could help.
Best, Xiaowei
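For anyone re-implementing this: below is a conceptual sketch of the Hungarian-matching tag loss as I understand it from the VIVO paper (not released code; the shapes and names are my assumptions). The ground-truth tags form an unordered set, so each masked position is matched to a tag via a minimum-cost assignment before computing the loss:

```python
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_tag_loss(logits, tag_ids):
    """logits: (num_masked, vocab_size) scores at masked tag positions.
    tag_ids: (num_tags,) vocabulary ids of the ground-truth tag set."""
    log_probs = torch.log_softmax(logits, dim=-1)
    # cost[i, j] = negative log-likelihood of assigning tag j to position i
    cost = -log_probs[:, tag_ids]                 # (num_masked, num_tags)
    row, col = linear_sum_assignment(cost.detach().cpu().numpy())
    return cost[torch.as_tensor(row), torch.as_tensor(col)].mean()
```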
Hi, is the VIVO pre-training code publicly accessible? I cannot find any code or pre-trained models for VIVO.
I tried to reproduce the results for VinVL+VIVO+SCST on NoCaps, but my results fell short by a visible margin.
Reported results on the NoCaps validation set vs. my reproduced results:

| Metric | in-domain | near-domain | out-domain | entire |
| --- | --- | --- | --- | --- |
| CIDEr (reported) | 103.7 | 95.6 | 83.8 | 94.3 |
| CIDEr (reproduced) | 95.0 | 91.1 | 79.2 | 89.3 |
| SPICE (reported) | 13.7 | 13.4 | 11.9 | 13.1 |
| SPICE (reproduced) | 13.1 | 12.9 | 11.2 | 12.6 |
Hardware Specifications
8 × V100 (32 GB)
Below I specify the data, scripts, and code I used for pretraining, cross-entropy optimization, CIDEr optimization, and inference.
Data
Pretraining
For OpenImages, I downloaded the feature files and label files released by this repo from the following links: features, labels
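For completeness, this is how I read the released feature TSVs (a sketch assuming the usual Oscar/VinVL layout of base64-encoded float32 region features, with 2048-d appearance plus 6-d box features per region; the column layout is my assumption, so check the accompanying README):

```python
import base64
import numpy as np

def decode_features(num_boxes, b64_features, feat_dim=2054):
    """Decode one TSV row's feature column into a (num_boxes, feat_dim) array."""
    feats = np.frombuffer(base64.b64decode(b64_features), dtype=np.float32)
    return feats.reshape(int(num_boxes), feat_dim)
```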
Cross-entropy, CIDEr Optimization
I downloaded the coco_caption dataset following the instructions in VinVL_DOWNLOAD.md
Inference
I downloaded the nocaps dataset following the instructions in VinVL_DOWNLOAD.md, which include the feature files and label files for the nocaps validation set.
Code
Pretraining
I slightly modified run_oscarplus_pretrain.py to perform MLM on the tag sequence; the gist of the change is sketched below.
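In essence the change looks like this (illustrative names, not the repo's actual variables; I assume the standard BERT 15% masking rate): sample positions in the tag sequence, replace them with [MASK], and have the model predict the original tokens.

```python
import random

def mask_tag_tokens(tag_ids, mask_id, mask_prob=0.15):
    """Apply BERT-style MLM masking to the tag token sequence (in place)."""
    labels = [-100] * len(tag_ids)        # -100 = position ignored by the MLM loss
    for i, tok in enumerate(tag_ids):
        if random.random() < mask_prob:
            labels[i] = tok               # the model must recover the original tag
            tag_ids[i] = mask_id          # replace the input token with [MASK]
    return tag_ids, labels
```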
Cross-entropy, CIDEr Optimization, Inference
I use run_captioning.py to perform cross-entropy optimization and CIDEr optimization on COCO, and to evaluate on the NoCaps validation set.
Scripts/Training args
Pretraining
batch_size=1024, lr=5e-05, a maximum of 50 image regions and 15 tag tokens per image, trained for 160K iterations, with the same masking strategy.
Parameters are set according to the VIVO paper.
Cross-entropy
batch_size=256, lr=5e-05, trained for 30 epochs
These parameters are set according to the VIVO paper; other parameters keep the default values in run_captioning.py.
CIDEr Optimization
batch_size=14*8=112, lr=5e-6, trained for 40 epochs
I found that the training scripts from VinVL_DOWNLOAD.md yield better performance than the settings in the VIVO paper, so I used those scripts for CIDEr optimization (a sketch of the objective follows below).
Inference
max_gen_length=20, num_beams=5
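To be explicit about what "CIDEr optimization" means here: it is SCST (self-critical sequence training), where the reward is the CIDEr gap between a sampled caption and the greedy caption. A minimal sketch of the loss (both functions below are placeholders, not this repo's API):

```python
import torch

def scst_loss(sample_log_probs, sampled_caption, greedy_caption, cider):
    """sample_log_probs: (seq_len,) log-probs of the sampled tokens.
    cider(caption) -> CIDEr score of a caption against the references."""
    reward = cider(sampled_caption) - cider(greedy_caption)  # greedy = baseline
    return -reward * sample_log_probs.sum()                  # REINFORCE-style gradient
```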
I'm wondering if I made mistakes in any of these steps. @pzzhang I would really appreciate it if you could clarify, thanks!