huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

RuntimeError: Error(s) in loading state_dict for BeitForImageClassification: size mismatch for classifier.weight #13127

Closed dnnxl closed 3 years ago

dnnxl commented 3 years ago

Environment info

Hi, I'm trying to run BeitForImageClassification on a custom dataset for a binary classification problem in Google Colab, and I get "RuntimeError: Error(s) in loading state_dict for BeitForImageClassification: size mismatch for classifier.weight and classifier.bias". It seems the last layer doesn't match the binary output; instead it maps to the 1000 ImageNet classes the model was pretrained on. Any suggestion on how to fix it?


To reproduce

Steps to reproduce the behavior:

Based on https://huggingface.co/nielsr/beit-base-patch16-224.

  1. Run the following code:

```python
feature_extractor = BeitFeatureExtractor.from_pretrained('nielsr/beit-base-patch16-224')
model = BeitForImageClassification.from_pretrained('nielsr/beit-base-patch16-224', num_labels=2, label2id=label2id, id2label=id2label)
```

Expected behavior

```
RuntimeError: Error(s) in loading state_dict for BeitForImageClassification:
size mismatch for classifier.weight: copying a param with shape torch.Size([1000, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
size mismatch for classifier.bias: copying a param with shape torch.Size([1000]) from checkpoint, the shape in current model is torch.Size([2]).
```
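The error above can be reproduced without downloading any weights: it is plain PyTorch refusing to copy a 1000-class classifier head into a 2-class head. A minimal stand-in (the two `nn.Linear` layers here are hypothetical substitutes for the checkpoint's and the model's classifier):

```python
import torch.nn as nn

# Stand-in for the 1000-class ImageNet classifier stored in the checkpoint.
imagenet_head = nn.Linear(768, 1000)
# Stand-in for the 2-class head the model builds when num_labels=2.
binary_head = nn.Linear(768, 2)

message = ""
try:
    # Copying 1000-class weights into a 2-class layer fails, just like
    # from_pretrained does when the head shapes disagree.
    binary_head.load_state_dict(imagenet_head.state_dict())
except RuntimeError as err:
    message = str(err)

print("size mismatch" in message)  # → True
```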

NielsRogge commented 3 years ago

Hi,

Thanks to #12664, it's now possible to load a fine-tuned checkpoint and replace the head with one that has a different number of classes, by setting `ignore_mismatched_sizes=True` when calling the `from_pretrained` method, like so:

```python
from transformers import BeitForImageClassification

model = BeitForImageClassification.from_pretrained('microsoft/beit-base-patch16-224', num_labels=2, ignore_mismatched_sizes=True)

This prints the warning:

```
Some weights of BeitForImageClassification were not initialized from the model checkpoint at microsoft/beit-base-patch16-224 and are newly initialized because the shapes did not match:
- classifier.weight: found shape torch.Size([1000, 768]) in the checkpoint and torch.Size([2, 768]) in the model instantiated
- classifier.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([2]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```

From that PR, I see that only in modeling_flax_utils.py do users get an error message saying "use ignore_mismatched_sizes if you really want to load this checkpoint inside this model." when not all keys match. Not sure why this suggestion isn't printed for PyTorch models. cc @sgugger
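For readers on transformers versions that predate #12664, the same effect can be achieved by loading the checkpoint with its original head and swapping the classifier manually. A sketch using a hypothetical `BeitStub` class in place of `BeitForImageClassification`, so it runs without downloading weights:

```python
import torch.nn as nn

class BeitStub(nn.Module):
    """Hypothetical stand-in for BeitForImageClassification: the real model
    also exposes its head as a `classifier` nn.Linear attribute."""
    def __init__(self, hidden_size=768, num_labels=1000):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_labels)

model = BeitStub()  # loads fine with the original 1000-class head
# Replace the head with a freshly initialized binary classifier;
# it must then be fine-tuned before use, exactly as the warning says.
model.classifier = nn.Linear(model.classifier.in_features, 2)
print(model.classifier.out_features)  # → 2
```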

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

moahaimen commented 2 years ago

```
size mismatch for model.classifier.weight: copying a param with shape torch.Size([555, 2208]) from checkpoint, the shape in current model is torch.Size([563, 2208]).
size mismatch for model.classifier.bias: copying a param with shape torch.Size([555]) from checkpoint, the shape in current model is torch.Size([563]).
```
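If `ignore_mismatched_sizes=True` is not available for the model in that traceback (the class isn't shown in the comment), a generic PyTorch workaround is to drop the shape-mismatched entries from the checkpoint's state_dict and load the rest with `strict=False`, leaving the new head randomly initialized. A sketch with a hypothetical helper, using the 555-vs-563 shapes from the error:

```python
import torch.nn as nn

def load_matching(model, checkpoint_state):
    """Load only the checkpoint entries whose shapes match; return the
    keys that were skipped and thus keep their random initialization."""
    model_state = model.state_dict()
    filtered = {k: v for k, v in checkpoint_state.items()
                if k in model_state and v.shape == model_state[k].shape}
    model.load_state_dict(filtered, strict=False)
    return [k for k in model_state if k not in filtered]

old_head = nn.Linear(2208, 555)  # stands in for the 555-class checkpoint head
new_head = nn.Linear(2208, 563)  # the 563-class head from the traceback

skipped = load_matching(new_head, old_head.state_dict())
print(sorted(skipped))  # → ['bias', 'weight']
```

The skipped parameters must be trained on the downstream task before the model is usable, just as in the warning above.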