Howal opened this issue 4 years ago
If you init the model from bert-base-uncased, you will need to use a larger LR.
Do you have any suggested LR?
I have tried several larger-LR settings: 5x LR without warmup does not converge; 5x LR with 1k-step warmup converges but ends up ~4% worse than the 1x-LR result; and 10x LR does not converge with any of the warmup schedules I tried (none, 1k steps, 2k steps).
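For reference, the optimizer/warmup setup I am using looks roughly like the sketch below (Hugging Face transformers' linear warmup schedule; the 1x base LR of 2e-5 and the total step count are assumptions on my side, please correct them if the intended values are different):

```python
import torch
from transformers import BertModel, get_linear_schedule_with_warmup

# Initialize from the public BERT-base checkpoint (no task pre-training).
model = BertModel.from_pretrained("bert-base-uncased")

base_lr = 2e-5                      # assumed 1x LR
optimizer = torch.optim.AdamW(model.parameters(), lr=5 * base_lr)  # the 5x-LR run

num_training_steps = 20000          # placeholder; depends on dataset size / epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1000,          # the 1k-step warmup variant
    num_training_steps=num_training_steps,
)

# In the training loop: loss.backward(); optimizer.step(); scheduler.step()
```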
Could you please share an example configuration for initializing from bert-base-uncased?
Hi there, nice work!
I tried to reproduce the results you report in Table 3 of the paper, i.e., IR and TR on COCO 1K with the model initialized from BERT base without pre-training. My results (default setting with all attentions) are far below the reported numbers:
TR: 0.6820 @ R1, 0.9180 @ R5, 0.9620 @ R10
IR: 0.5676 @ R1, 0.8748 @ R5, 0.9466 @ R10
I followed the provided script and only changed --model_name_or_path to 'bert-base-uncased'.
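To be explicit about what I expect that change to do, here is a minimal sketch of the initialization I have in mind (using plain Hugging Face transformers classes as stand-ins for the repo's own model class; only the checkpoint name comes from my command):

```python
from transformers import BertConfig, BertModel

# "Init from bert-base-uncased": load the public BERT-base weights directly,
# rather than a vision-and-language pre-trained checkpoint.
config = BertConfig.from_pretrained("bert-base-uncased")
backbone = BertModel.from_pretrained("bert-base-uncased", config=config)
```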
Did I miss something important, or is a different set of hyperparameters needed for fine-tuning without pre-training?
Thank you!