Hi @jackroos and thanks for the great repo!

I was looking at the cfgs file for VQA and noticed hyperparameters that differ from the ones in the appendix of the paper: for instance, 5 epochs instead of 20, 500 warmup steps instead of 2000, and a smaller learning rate. For this and the other tasks, should we follow the values in the repository or the ones in the paper?

Also, are inputs not truncated to a maximum length during fine-tuning?

Thanks!
You can fine-tune with 20 epochs, but we found that 5 epochs are enough for the pre-trained VL-BERT; the 20-epoch setting is for comparison with the model trained without pre-training. As for the learning rate, it is consistent with the paper: you need to multiply by the batch size, since the LR in the config yaml is normalized by batch size.
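As a rough illustration of that normalization (a sketch with made-up numbers, not the actual values shipped in the cfgs file), the rate reported in the paper corresponds to the yaml value multiplied by the total batch size:

```python
# Illustrative only: hypothetical values, not the ones in cfgs/vqa.
lr_in_yaml = 6.25e-7          # per-sample LR as stored in the config yaml
batch_images_per_gpu = 4      # hypothetical per-GPU batch size
num_gpus = 8
grad_accumulate_steps = 4     # only if gradient accumulation is enabled

# The yaml LR is normalized by batch size, so the paper-scale LR is
# recovered by multiplying by the total (global) batch size.
total_batch_size = batch_images_per_gpu * num_gpus * grad_accumulate_steps
effective_lr = lr_in_yaml * total_batch_size
print(f"effective LR = {effective_lr:.2e}")   # 6.25e-7 * 128 = 8.00e-05
```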
Since VQA inputs are usually not very long, we don't truncate them.
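If you want to double-check that truncation really isn't needed for your data, a quick sanity check is to tokenize the questions and look at the length distribution (a sketch assuming the standard VQA v2 questions json and the HuggingFace BERT tokenizer; VL-BERT's own tokenization pipeline may differ slightly):

```python
import json
from transformers import BertTokenizer

# Count BERT word-piece tokens per question to confirm they stay well
# below any plausible maximum sequence length.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
with open("v2_OpenEnded_mscoco_train2014_questions.json") as f:
    questions = json.load(f)["questions"]

lengths = [len(tokenizer.tokenize(q["question"])) for q in questions]
print("max question length (tokens): ", max(lengths))
print("mean question length (tokens):", sum(lengths) / len(lengths))
```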