@Vickeyhw I checked the code: https://github.com/microsoft/unilm/blob/master/beit/run_beit_pretraining.py#L45 https://github.com/microsoft/unilm/blob/master/beit/run_class_finetuning.py#L49 I confirmed that the default position embedding settings are the same for pretraining and finetuning.
@addf400 https://github.com/microsoft/unilm/blob/db2b1964759418fa691ad2de25e8d8838f1dd4a3/beit/run_class_finetuning.py#L295 In this line, the parameter 'use_shared_rel_pos_bias' is not passed to the function, so by default 'use_shared_rel_pos_bias=False' in VisionTransformer.__init__(). Maybe you used a shared relative position bias in pretraining but not in finetuning?
We make a copy of the shared relative position bias for each transformer block and then do finetuning. More details can be found at: https://github.com/microsoft/unilm/blob/master/beit/run_class_finetuning.py#L337
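A minimal sketch of that copy step, assuming a PyTorch checkpoint state dict whose shared table lives under `rel_pos_bias.relative_position_bias_table` and whose per-block tables live under `blocks.{i}.attn.relative_position_bias_table` (key names follow the linked BEiT code; the helper name and toy tensor shape below are just for illustration):

```python
import torch

def expand_shared_rel_pos_bias(checkpoint_model: dict, num_layers: int) -> dict:
    """Copy the single shared relative position bias table from pretraining
    into every transformer block, so a per-block-bias model can load it."""
    shared_key = "rel_pos_bias.relative_position_bias_table"
    if shared_key in checkpoint_model:
        table = checkpoint_model.pop(shared_key)
        for i in range(num_layers):
            # Each block gets its own copy; the copies diverge during finetuning.
            checkpoint_model[f"blocks.{i}.attn.relative_position_bias_table"] = table.clone()
    return checkpoint_model

# Toy usage: 12 blocks, a (num_relative_positions, num_heads) bias table.
ckpt = {"rel_pos_bias.relative_position_bias_table": torch.zeros(732, 12)}
ckpt = expand_shared_rel_pos_bias(ckpt, num_layers=12)
print(sorted(ckpt.keys())[:2])
```

After this rewrite the checkpoint can be loaded into a finetuning model that uses a relative position bias per block instead of one shared bias.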
I am using BEiT, and I find that with the default settings it uses abs_pos_emb in the pretraining stage, while in the finetuning stage it seems not to use any kind of pos_embed. Did I misunderstand? What kind of position embedding does BEiT use in pretraining and finetuning, respectively, to achieve its best performance?