huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Model embedding size and tokenizer size mismatch; resizing embedding will cause CUDA assert error #8643

Closed: huu4ontocord closed this issue 3 years ago

huu4ontocord commented 3 years ago

Environment info

Google Colab

Who can help

T5: @patrickvonplaten

Information

I'm noticing something strange with T5. The model embedding size and the tokenizer size do not match. When I try to resize the model down to the tokenizer's smaller vocabulary, CUDA crashes. These are probably two bugs: one for the size mismatch, and one for the crash when the embedding is shortened.

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("t5-base")
print(len(tokenizer))                          # prints 32100
model = AutoModel.from_pretrained("t5-base")
print(model.shared)                            # prints Embedding(32128, 768)
model.resize_token_embeddings(len(tokenizer))  # shrink embeddings to 32100 rows
model.to('cuda')                               # crashes with a device-side assert
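For context, the size difference itself is intentional: the T5 tokenizer exposes 32100 ids (32000 SentencePiece pieces plus 100 sentinel tokens), while the t5-base checkpoint allocates 32128 embedding rows, rounded up to a multiple of 128 for hardware efficiency. A minimal sketch of why a shrunken embedding can trigger this class of assert (the tensor below is hypothetical, not produced by the tokenizer):

import torch

# Hypothetical repro of the failure mode: after shrinking to 32100 rows,
# any id in the padded range [32100, 32128) indexes past the end of the
# embedding matrix, which surfaces on GPU as a device-side assert.
emb = torch.nn.Embedding(32100, 768).to('cuda')
bad_ids = torch.tensor([[32127]], device='cuda')  # valid only for the 32128-row matrix
emb(bad_ids)  # RuntimeError: CUDA error: device-side assert triggered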

Expected behavior

Expected behaviour is that the model loads onto CUDA normally after the embedding is resized.

What I got instead was:

32100
Some weights of T5Model were not initialized from the model checkpoint at t5-base and are newly initialized: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Embedding(32128, 768)

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
      5 print (model.shared)
      6 model.resize_token_embeddings(len(tokenizer))
----> 7 model.to('cuda')

3 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in convert(t)
    608         if convert_to_format is not None and t.dim() == 4:
    609             return t.to(device, dtype if t.is_floating_point() else None, non_blocking, memory_format=convert_to_format)
--> 610         return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
    611
    612     return self._apply(convert)

RuntimeError: CUDA error: device-side assert triggered
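As an aside, device-side asserts are reported asynchronously, so the line blamed in the traceback (model.to('cuda') here) is often not the operation that actually failed. Assuming a fresh process, forcing synchronous kernel launches makes the real source appear in the traceback:

import os

# Must be set before the first CUDA operation in the process.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'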
patrickvonplaten commented 3 years ago

Hey @ontocord, I cannot reproduce your error on master...

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("t5-base")
print(len(tokenizer))
model = AutoModel.from_pretrained("t5-base")
print(model.shared)
model.resize_token_embeddings(len(tokenizer))
model.to('cuda')

works fine for me.

patrickvonplaten commented 3 years ago

I am able to correctly shorten the embedding matrix.
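A quick way to confirm the resize took effect, assuming a current install, is to print the shared embedding again; after shrinking it should report the tokenizer's size:

model.resize_token_embeddings(len(tokenizer))
print(model.shared)  # Embedding(32100, 768) once the resize has been applied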

huu4ontocord commented 3 years ago

@patrickvonplaten Thank you. It's working in my code now as well, with the latest version of transformers. Thanks for looking into this!
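For anyone hitting this later: the fix is version-dependent, so it's worth confirming which release is installed before retrying (a minimal check):

import transformers
print(transformers.__version__)  # per this thread, recent releases resize and move to CUDA cleanly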