huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

[Urgent] Word embedding initialization documentation and code might mismatch #6562

Closed guoxuxu closed 3 years ago

guoxuxu commented 4 years ago

Environment info

The documentation for resize_token_embeddings (https://huggingface.co/transformers/main_classes/model.html) says it returns a torch.nn.Embedding, and the source code does use nn.Embedding (https://huggingface.co/transformers/_modules/transformers/modeling_utils.html#PreTrainedModel.resize_token_embeddings). However, when I inspect the resized embedding.weight, the newly added rows have a std() of roughly 0.01–0.02 and a mean around 0, whereas PyTorch's nn.Embedding is initialized from N(0, 1) (https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html). Is there a gap between the documentation and the implementation? Does resize_token_embeddings initialize the new weights from uniform(-0.05, 0.05) or some other distribution that is not N(0, 1), even though the source code really does use nn.Embedding?

To reproduce

Steps to reproduce the behavior:

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
tokenizer.add_tokens('<MEME>')  # vocabulary grows by one token
bert_model = BertModel.from_pretrained("bert-base-uncased")
bert_model.resize_token_embeddings(len(tokenizer))
# std of the newly added embedding row -- prints ~0.02, not ~1.0
print(bert_model.embeddings.word_embeddings.weight[-1].std())
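For contrast, a freshly constructed torch.nn.Embedding really is drawn from N(0, 1); a std near 0.02 only appears after a re-initialization like the one the resize path applies. A minimal sketch using only PyTorch (the 0.02 value mirrors BERT's default initializer_range; no pretrained model is downloaded):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# PyTorch's default init for nn.Embedding: each weight ~ N(0, 1)
fresh = nn.Embedding(num_embeddings=1000, embedding_dim=768)
print(fresh.weight.std())  # close to 1.0

# Re-initialize the way a BERT-style model does (std = initializer_range)
fresh.weight.data.normal_(mean=0.0, std=0.02)
print(fresh.weight.std())  # close to 0.02
```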


usuyama commented 4 years ago

The new embedding rows are initialized from a normal distribution with mean 0 and std 0.02 by default (the model's initializer_range), not PyTorch's default N(0, 1).

The resizing happens in _get_resized_embeddings: https://github.com/huggingface/transformers/blob/9c2b2db2cdf0af968aae58d6075b6654224fb760/src/transformers/modeling_utils.py#L650-L651

which calls _init_weights: https://github.com/huggingface/transformers/blob/9c2b2db2cdf0af968aae58d6075b6654224fb760/src/transformers/modeling_bert.py#L592-L597
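Concretely, the linked _init_weights re-draws weights with the model's configured std. A hedged sketch of that logic in plain PyTorch (the function name and the 0.02 default mirror BertConfig.initializer_range; this is an illustration, not the library code itself):

```python
import torch.nn as nn

def init_weights(module, initializer_range=0.02):
    """BERT-style init: normal with configurable std, zeroed Linear biases."""
    if isinstance(module, (nn.Linear, nn.Embedding)):
        module.weight.data.normal_(mean=0.0, std=initializer_range)
    if isinstance(module, nn.Linear) and module.bias is not None:
        module.bias.data.zero_()

emb = nn.Embedding(10, 768)
init_weights(emb)
print(emb.weight.std())  # roughly 0.02
```

Applying this to a freshly allocated nn.Embedding is why the resized rows show a std near 0.02 even though the object is still a plain nn.Embedding.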

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.