woctezuma closed this issue 3 years ago
Hi,
The `microsoft/beit-base-patch16-224-pt22k` model is the one that was pre-trained only using a masked image modeling objective. It should be loaded into a `BeitForMaskedImageModeling` model, which adds a `layernorm` + `lm_head` on top of `BeitModel`, as can be seen here. It also doesn't make use of the pooler of `BeitModel`, which is why those weights are not initialized.
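The name-matching behind those warnings can be sketched with plain set arithmetic (a minimal mock: the key names below are taken from the warnings quoted in this issue, and the exact backbone keys are illustrative placeholders, not the real checkpoint contents):

```python
# Minimal mock of how from_pretrained classifies checkpoint keys.
# Key names are illustrative; the layernorm/pooler names come from this
# issue's log, the backbone names are placeholders.
backbone = {"beit.embeddings.cls_token", "beit.encoder.layer.0.attention"}
mim_head = {"layernorm.weight", "layernorm.bias"}  # MIM head's layernorm
pooler = {"beit.pooler.layernorm.weight", "beit.pooler.layernorm.bias"}

checkpoint_keys = backbone | mim_head  # what the pt22k checkpoint stores
beitmodel_keys = backbone | pooler     # what plain BeitModel defines

# Checkpoint keys with no matching parameter -> "not used when initializing"
unused = checkpoint_keys - beitmodel_keys
# Model parameters with no matching checkpoint key -> "newly initialized"
newly_initialized = beitmodel_keys - checkpoint_keys

print(sorted(unused))             # the MIM-head layernorm keys
print(sorted(newly_initialized))  # the pooler keys
```

So neither list signals a bug: the MIM head's `layernorm` has no counterpart in `BeitModel`, and the pooler has no counterpart in the checkpoint.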
Thank you for the answer! I did not know that the `layernorm` was considered part of the classifier head for this objective. So I thought it was an oversight and that the pre-trained weights would be copied to `self.layernorm`:
Environment info
`transformers` version: 4.11.1

Information
Model I am using (Bert, XLNet ...): BEiT
The problem arises when using:
The tasks I am working on is:
To reproduce
Steps to reproduce the behavior:
Run the example code with various values for `model_name`:

- `model_name = 'microsoft/beit-base-patch16-224-pt22k'`
- `model_name = 'microsoft/beit-base-patch16-224-pt22k-ft22k'`
- `model_name = 'microsoft/beit-base-patch16-224'`
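A minimal reproduction sketch (the issue's actual example code is not shown in this excerpt, so `BeitModel.from_pretrained` is an assumption standing in for it; loading each checkpoint prints the weight-initialization warnings discussed below):

```python
from transformers import BeitModel  # transformers 4.11.x

# Hypothetical stand-in for the issue's example code: loading each
# checkpoint into plain BeitModel triggers the initialization warnings.
model_name = "microsoft/beit-base-patch16-224-pt22k"  # case 1; swap in the other names for cases 2 and 3
model = BeitModel.from_pretrained(model_name)
print(type(model).__name__)
```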
Case 1:
Case 2:
Case 3:
Expected behavior
Cases 2 and 3 are as expected: the classifier is not used when initializing.
However, in case 1, `['layernorm.weight', 'layernorm.bias']` are reported as not used when initializing, and `['beit.pooler.layernorm.bias', 'beit.pooler.layernorm.weight']` are reported as newly initialized. I think it might be an oversight.
Quotes of the relevant parts of the log for case 1: