Hi,

Thanks to #12664, it's now possible to load a fine-tuned checkpoint and replace its head with one that has a different number of classes, by setting `ignore_mismatched_sizes` to `True` when calling the `from_pretrained` method, like so:
```python
from transformers import BeitForImageClassification

model = BeitForImageClassification.from_pretrained('microsoft/beit-base-patch16-224', num_labels=2, ignore_mismatched_sizes=True)
```
This prints the warning:

```
Some weights of BeitForImageClassification were not initialized from the model checkpoint at microsoft/beit-base-patch16-224 and are newly initialized because the shapes did not match:
- classifier.weight: found shape torch.Size([1000, 768]) in the checkpoint and torch.Size([2, 768]) in the model instantiated
- classifier.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([2]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```
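As the warning says, the replacement head is randomly initialized, so the model needs fine-tuning before its predictions mean anything. As a quick sanity check that the new 2-class head is in place (a minimal sketch, assuming the 224x224 patch-16 input size of this checkpoint):

```python
import torch

# Dummy batch of one 224x224 RGB image, matching the checkpoint's expected input size.
pixel_values = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    logits = model(pixel_values=pixel_values).logits

print(logits.shape)  # torch.Size([1, 2]): the freshly initialized 2-class head
```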
From that PR, I see that only in `modeling_flax_utils.py` do users get an error message saying "use `ignore_mismatched_sizes` if you really want to load this checkpoint inside this model." in case not all keys match. I'm not sure why this suggestion is not printed for PyTorch models. cc @sgugger
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
```
size mismatch for model.classifier.weight: copying a param with shape torch.Size([555, 2208]) from checkpoint, the shape in current model is torch.Size([563, 2208]).
size mismatch for model.classifier.bias: copying a param with shape torch.Size([555]) from checkpoint, the shape in current model is torch.Size([563]).
```
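The same `ignore_mismatched_sizes` workaround described above should apply here as well. A sketch, where the checkpoint path is a placeholder and `AutoModelForImageClassification` stands in for whatever model class produced this error, since the comment doesn't name it:

```python
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    'path/to/555-class-checkpoint',  # placeholder: the actual fine-tuned checkpoint isn't named
    num_labels=563,
    ignore_mismatched_sizes=True,  # drop the 555-class head weights and reinitialize at 563 classes
)
```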
Hi, I'm trying to run `BeitForImageClassification` on a custom dataset for a binary classification problem in Google Colab and got the following error: `RuntimeError: Error(s) in loading state_dict for BeitForImageClassification: size mismatch for classifier.weight and classifier.bias`. It seems the last layer doesn't match the binary output; instead it maps to the 1000 ImageNet classes the checkpoint was trained on. Any suggestion on how to fix it?

Environment info

- `transformers` version: 4.10.0
To reproduce

Steps to reproduce the behavior, based on https://huggingface.co/nielsr/beit-base-patch16-224:

```python
from transformers import BeitFeatureExtractor, BeitForImageClassification

feature_extractor = BeitFeatureExtractor.from_pretrained('nielsr/beit-base-patch16-224')
# label2id/id2label are the dicts mapping the two custom class names to ids and back
model = BeitForImageClassification.from_pretrained('nielsr/beit-base-patch16-224', num_labels=2, label2id=label2id, id2label=id2label)
```
Expected behavior

The model should load with a freshly initialized 2-class head. Instead, loading fails with:

```
RuntimeError: Error(s) in loading state_dict for BeitForImageClassification:
size mismatch for classifier.weight: copying a param with shape torch.Size([1000, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
size mismatch for classifier.bias: copying a param with shape torch.Size([1000]) from checkpoint, the shape in current model is torch.Size([2]).
```
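Per the suggestion above, passing `ignore_mismatched_sizes=True` resolves this by discarding the 1000-class ImageNet head and reinitializing a 2-class one (a sketch reusing the checkpoint and label mappings from the reproduction step):

```python
from transformers import BeitForImageClassification

model = BeitForImageClassification.from_pretrained(
    'nielsr/beit-base-patch16-224',
    num_labels=2,
    label2id=label2id,  # the same custom label mappings as above
    id2label=id2label,
    ignore_mismatched_sizes=True,  # reinitialize classifier.weight/bias at the new size
)
```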