huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Some weights of BeitModel were not initialized from the model checkpoint #13808

Closed: woctezuma closed this issue 3 years ago

woctezuma commented 3 years ago

Environment info

Information

Model I am using (Bert, XLNet ...): BEiT

The problem arises when using:

The task I am working on is:

To reproduce

Steps to reproduce the behavior:

Run the example code below with each of the following values for model_name:

  1. model_name = 'microsoft/beit-base-patch16-224-pt22k'
  2. model_name = 'microsoft/beit-base-patch16-224-pt22k-ft22k'
  3. model_name = 'microsoft/beit-base-patch16-224'

from transformers import BeitFeatureExtractor, BeitModel
from PIL import Image
import requests

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

model_name = 'microsoft/beit-base-patch16-224-pt22k'
# model_name = 'microsoft/beit-base-patch16-224-pt22k-ft22k'
# model_name = 'microsoft/beit-base-patch16-224'

feature_extractor = BeitFeatureExtractor.from_pretrained(model_name)
model = BeitModel.from_pretrained(model_name)

inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state

Case 1:

Some weights of the model checkpoint at microsoft/beit-base-patch16-224-pt22k were not used when initializing BeitModel: ['layernorm.weight', 'lm_head.bias', 'layernorm.bias', 'lm_head.weight']
- This IS expected if you are initializing BeitModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BeitModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BeitModel were not initialized from the model checkpoint at microsoft/beit-base-patch16-224-pt22k and are newly initialized: ['beit.pooler.layernorm.bias', 'beit.pooler.layernorm.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Case 2:

Some weights of the model checkpoint at microsoft/beit-base-patch16-224-pt22k-ft22k were not used when initializing BeitModel: ['classifier.weight', 'classifier.bias']
- This IS expected if you are initializing BeitModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BeitModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Case 3:

Some weights of the model checkpoint at microsoft/beit-base-patch16-224 were not used when initializing BeitModel: ['classifier.weight', 'classifier.bias']
- This IS expected if you are initializing BeitModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BeitModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Expected behavior

Cases 2 and 3 are as expected: the classifier weights are simply not used when initializing BeitModel.

However, case 1 does not look right to me: I think it might be an oversight.

The relevant parts of the log for case 1:

Some weights of the model checkpoint at microsoft/beit-base-patch16-224-pt22k
were not used when initializing BeitModel:
['layernorm.weight', 'lm_head.bias', 'layernorm.bias', 'lm_head.weight']
Some weights of BeitModel were not initialized from the model checkpoint at microsoft/beit-base-patch16-224-pt22k
and are newly initialized:
['beit.pooler.layernorm.bias', 'beit.pooler.layernorm.weight']
NielsRogge commented 3 years ago

Hi,

The 'microsoft/beit-base-patch16-224-pt22k' model is the one that was pre-trained only with a masked image modeling objective. It should be loaded with BeitForMaskedImageModeling, which adds a layernorm + lm_head on top of BeitModel, as can be seen in the modeling code. It also doesn't make use of the pooler of BeitModel, which is why those pooler weights are newly initialized.
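
For reference, a minimal sketch of loading this checkpoint with the matching head, following the explanation above (treat the comment about the output as an assumption on my part):

from transformers import BeitFeatureExtractor, BeitForMaskedImageModeling
from PIL import Image
import requests

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# Use the head that matches the pre-training objective of this checkpoint,
# so the layernorm + lm_head weights are loaded instead of being discarded.
model_name = 'microsoft/beit-base-patch16-224-pt22k'
feature_extractor = BeitFeatureExtractor.from_pretrained(model_name)
model = BeitForMaskedImageModeling.from_pretrained(model_name)

inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # presumably one score over the visual vocabulary for each patch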

woctezuma commented 3 years ago

Thank you for the answer!

I did not know that the layernorm was considered part of the prediction head for this objective.

https://github.com/huggingface/transformers/blob/7db2a79b387fd862ffb0af72f7148e6371339c7f/src/transformers/models/beit/modeling_beit.py#L679-L688

So I thought it was an oversight and that the pre-trained weights would be copied to self.layernorm:

https://github.com/huggingface/transformers/blob/7db2a79b387fd862ffb0af72f7148e6371339c7f/src/transformers/models/beit/modeling_beit.py#L560-L571
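
For what it's worth, a rough way to confirm where those weights actually live (a sketch; filtering on the 'beit.' prefix is an assumption based on the class definition linked above):

from transformers import BeitForMaskedImageModeling

model_name = 'microsoft/beit-base-patch16-224-pt22k'
mim_model = BeitForMaskedImageModeling.from_pretrained(model_name)

# Parameters that are not prefixed with 'beit.' sit on top of the backbone,
# i.e. they belong to the masked image modeling head (layernorm + lm_head).
head_params = [name for name, _ in mim_model.named_parameters() if not name.startswith('beit.')]
print(head_params)

# BeitForMaskedImageModeling builds its backbone without the pooler, so there is
# no 'beit.pooler.layernorm.*' here, which is why BeitModel reports those weights
# as newly initialized.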