microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI

Question about beit for pos_emb #431

Closed Vickeyhw closed 3 years ago

Vickeyhw commented 3 years ago

I am using BEiT. I find that with the default settings, BEiT uses abs_pos_emb in the pretraining stage, while in the finetuning stage it seems not to use any kind of pos_embed. Did I understand this wrong? What kind of position embedding does BEiT use in pretraining and in finetuning, respectively, to achieve its best performance?

addf400 commented 3 years ago

@Vickeyhw I checked the code: https://github.com/microsoft/unilm/blob/master/beit/run_beit_pretraining.py#L45 and https://github.com/microsoft/unilm/blob/master/beit/run_class_finetuning.py#L49, and confirmed that the default position embedding setting is the same for pretraining and finetuning.

Vickeyhw commented 3 years ago

@addf400 https://github.com/microsoft/unilm/blob/db2b1964759418fa691ad2de25e8d8838f1dd4a3/beit/run_class_finetuning.py#L295 In this line, the parameter 'use_shared_rel_pos_bias' is not passed into the function, so by default 'use_shared_rel_pos_bias=False' in VisionTransformer.__init__(). Maybe you used a shared relative position bias in pretraining but not in finetuning?
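
A minimal sketch of the default-argument behavior being described here (simplified stand-in classes, not the actual BEiT/timm code):

```python
class VisionTransformer:
    def __init__(self, use_rel_pos_bias=False, use_shared_rel_pos_bias=False):
        self.use_rel_pos_bias = use_rel_pos_bias
        self.use_shared_rel_pos_bias = use_shared_rel_pos_bias


def create_model(**kwargs):
    # Stand-in for the model factory: only the kwargs explicitly passed here
    # ever reach VisionTransformer.__init__().
    return VisionTransformer(**kwargs)


# Mirrors the finetuning call being discussed: use_rel_pos_bias is forwarded,
# use_shared_rel_pos_bias is not, so it silently stays at its default (False).
model = create_model(use_rel_pos_bias=True)
print(model.use_rel_pos_bias)          # True
print(model.use_shared_rel_pos_bias)   # False
```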

addf400 commented 3 years ago

> @addf400 https://github.com/microsoft/unilm/blob/db2b1964759418fa691ad2de25e8d8838f1dd4a3/beit/run_class_finetuning.py#L295 In this line, the parameter 'use_shared_rel_pos_bias' is not passed into the function, so by default 'use_shared_rel_pos_bias=False' in VisionTransformer.__init__(). Maybe you used a shared relative position bias in pretraining but not in finetuning?

We make a copy of the shared relative position bias for each transformer block when loading the pretrained checkpoint, and then do finetuning. More details can be found at: https://github.com/microsoft/unilm/blob/master/beit/run_class_finetuning.py#L337
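
For reference, a minimal sketch of that checkpoint remapping step, assuming the pretrained state dict stores the shared table under a key like 'rel_pos_bias.relative_position_bias_table' and each block expects 'blocks.{i}.attn.relative_position_bias_table' (the key names here are assumptions; the linked line in run_class_finetuning.py has the actual implementation):

```python
import torch


def expand_shared_rel_pos_bias(checkpoint_model, num_layers):
    """Give each transformer block its own copy of the shared relative
    position bias table so a model built with per-block bias can load it."""
    shared_key = "rel_pos_bias.relative_position_bias_table"  # assumed key name
    if shared_key in checkpoint_model:
        shared_table = checkpoint_model.pop(shared_key)
        for i in range(num_layers):
            # Clone the shared table into every block's expected key.
            checkpoint_model[f"blocks.{i}.attn.relative_position_bias_table"] = shared_table.clone()
    return checkpoint_model


# Example: a dummy checkpoint with one shared bias table, expanded for a 2-block model.
ckpt = {"rel_pos_bias.relative_position_bias_table": torch.zeros(10, 12)}
ckpt = expand_shared_rel_pos_bias(ckpt, num_layers=2)
print(sorted(ckpt.keys()))
# ['blocks.0.attn.relative_position_bias_table', 'blocks.1.attn.relative_position_bias_table']
```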