huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Can't load mt5 model after resizing token embedding #9055

Closed: alecoutre1 closed this issue 3 years ago

alecoutre1 commented 3 years ago

Environment info

Description

I am having trouble reloading a saved mt5 model after its token embeddings have been resized. The error does not occur with the t5 model. I get the following error:

Error(s) in loading state_dict for MT5ForConditionalGeneration: size mismatch for lm_head.weight: copying a param with shape torch.Size([250112, 768]) from checkpoint, the shape in current model is torch.Size([250102, 768]).

Is there something different between the two models that I am missing?
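
One difference that may matter here (an assumption on my part, not something confirmed in this thread): google/mt5-base is configured with untied input and output embeddings, whereas t5-base ties lm_head to the shared embedding matrix, so MT5 has a separate lm_head weight that also needs to be resized. The two configs can be compared like this:

from transformers import AutoConfig

print(AutoConfig.from_pretrained("t5-base").tie_word_embeddings)          # True  -> lm_head shares the embedding weights
print(AutoConfig.from_pretrained("google/mt5-base").tie_word_embeddings)  # False -> lm_head is a separate weight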

To reproduce:

from transformers import MT5ForConditionalGeneration, AutoTokenizer, T5ForConditionalGeneration

model_class = MT5ForConditionalGeneration #T5ForConditionalGeneration
model_path = "google/mt5-base" # "t5-base"

model = model_class.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

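# add two new tokens and resize the model's embeddings to the new vocabulary size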
tokenizer.add_tokens(['<tok1>', '<tok2>'])
model.resize_token_embeddings(len(tokenizer))

SAVING_PATH = "/tmp/test_model"

model.save_pretrained(SAVING_PATH)
tokenizer.save_pretrained(SAVING_PATH)

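# reloading the saved model is where the size mismatch on lm_head.weight is raised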
new_model = model_class.from_pretrained(SAVING_PATH)
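
A minimal diagnostic sketch (run right after the resize_token_embeddings call above; the expected shapes are taken from the error message) showing that on affected versions the input embeddings pick up the new vocabulary size while the separate lm_head does not:

print(model.get_input_embeddings().weight.shape)  # torch.Size([250102, 768]) after the resize
print(model.lm_head.weight.shape)                 # torch.Size([250112, 768]) on affected versions: lm_head not resized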
patrickvonplaten commented 3 years ago

Hey @alecoutre1, I think this was fixed very recently.

I cannot reproduce your error on master -> could you try to pip install the master version and see if the error persists?

pip install git+https://github.com/huggingface/transformers
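
After installing from source, you can confirm which version is active; a build from the master branch typically carries a .dev0 suffix:

import transformers
print(transformers.__version__)  # e.g. a version ending in .dev0 when installed from master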
github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.