microsoft / Oscar

Oscar and VinVL

The result of IR/TR from BERT base without pre-training #37

Open Howal opened 4 years ago

Howal commented 4 years ago

Hi there, nice work!

I tried to reproduce the results reported in Table 3 of the paper, i.e., IR and TR on COCO 1K with the model initialized from BERT base without pre-training. My results (default setting with all attentions) are far below what you reported:

TR: R@1 0.6820, R@5 0.9180, R@10 0.9620
IR: R@1 0.5676, R@5 0.8748, R@10 0.9466

I followed the script and only changed --model_name_or_path to 'bert-base-uncased'.
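For concreteness, a minimal sketch of that invocation, assuming the oscar/run_retrieval.py entry point and flag names follow the repo's retrieval instructions; the paths and the 2e-5 learning rate are placeholders, not the exact values used:

```bash
# Sketch only: entry point and flag names assumed from the Oscar retrieval
# instructions; data/output paths and the LR are placeholders.
python oscar/run_retrieval.py \
    --model_name_or_path bert-base-uncased \
    --do_train \
    --do_lower_case \
    --data_dir datasets/coco_ir \
    --learning_rate 2e-5 \
    --output_dir output/coco_ir_bert_base
```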

Did I miss something important, or do I need a different set of hyperparameters to fine-tune without pre-training?

Thank you!

xiyinmsu commented 4 years ago

If you init the model from bert-base-uncased, you will need to use a larger LR.

Howal commented 4 years ago

Do you have a suggested LR?

Howal commented 4 years ago

I have tried some settings with a larger LR:

- 5x LR, no warmup: did not converge.
- 5x LR, 1k-step warmup: ~4% worse than the 1x-LR result.
- 10x LR (no warmup, 1k-step warmup, and 2k-step warmup): none converged.
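As a concrete example, the 5x-LR run with 1k-step warmup corresponds to roughly the following command; the --warmup_steps flag and the 2e-5 base LR (so 5x = 1e-4) are assumptions about the script's defaults, not confirmed values:

```bash
# Sketch of the 5x-LR / 1k-step-warmup run: 5 * 2e-5 = 1e-4 peak LR.
# --warmup_steps and the 2e-5 base LR are assumed, not confirmed defaults.
python oscar/run_retrieval.py \
    --model_name_or_path bert-base-uncased \
    --do_train \
    --learning_rate 1e-4 \
    --warmup_steps 1000 \
    --output_dir output/coco_ir_bert_base_5xlr
```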

Could you please provide a working example for initializing from bert-base-uncased?