huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

[Urgent] Word embedding initialization documentation and code might mismatch #6562

Closed guoxuxu closed 3 years ago

guoxuxu commented 4 years ago

Environment info

The documentation for resize_token_embeddings (https://huggingface.co/transformers/main_classes/model.html) says it returns a torch.nn.Embedding, and the source code does use nn.Embedding (https://huggingface.co/transformers/_modules/transformers/modeling_utils.html#PreTrainedModel.resize_token_embeddings). However, when I inspect the resized embedding.weight, the newly added rows have a std() of roughly 0.01–0.02 and a mean around 0, whereas PyTorch's nn.Embedding is initialized from N(0, 1) (https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html). Is there a gap between the documentation and the implementation? Does resize_token_embeddings initialize the new weights from uniform(-0.05, 0.05) or some other distribution that is not N(0, 1), even though the source code really does use nn.Embedding?

To reproduce

Steps to reproduce the behavior:

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
tokenizer.add_tokens('<MEME>')  # vocabulary grows by one token
bert_model = BertModel.from_pretrained("bert-base-uncased")
bert_model.resize_token_embeddings(len(tokenizer))
# std of the newly added embedding row -- prints ~0.02, not ~1.0
print(bert_model.embeddings.word_embeddings.weight[-1].std())
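For contrast, a freshly constructed torch.nn.Embedding really is drawn from N(0, 1); a std near 0.02 only appears after a re-initialization like the one the resize path applies. A minimal sketch using only PyTorch (the 0.02 value mirrors BERT's default initializer_range; no pretrained model is downloaded):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# PyTorch's default init for nn.Embedding: each weight ~ N(0, 1)
fresh = nn.Embedding(num_embeddings=1000, embedding_dim=768)
print(fresh.weight.std())  # close to 1.0

# Re-initialize the way a BERT-style model does (std = initializer_range)
fresh.weight.data.normal_(mean=0.0, std=0.02)
print(fresh.weight.std())  # close to 0.02
```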


usuyama commented 4 years ago

The new embedding rows are initialized from a normal distribution with mean 0 and std 0.02 by default (the model's initializer_range), not PyTorch's default N(0, 1).

The resizing happens in _get_resized_embeddings: https://github.com/huggingface/transformers/blob/9c2b2db2cdf0af968aae58d6075b6654224fb760/src/transformers/modeling_utils.py#L650-L651

which calls _init_weights: https://github.com/huggingface/transformers/blob/9c2b2db2cdf0af968aae58d6075b6654224fb760/src/transformers/modeling_bert.py#L592-L597
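Concretely, the linked _init_weights re-draws weights with the model's configured std. A hedged sketch of that logic in plain PyTorch (the function name and the 0.02 default mirror BertConfig.initializer_range; this is an illustration, not the library code itself):

```python
import torch.nn as nn

def init_weights(module, initializer_range=0.02):
    """BERT-style init: normal with configurable std, zeroed Linear biases."""
    if isinstance(module, (nn.Linear, nn.Embedding)):
        module.weight.data.normal_(mean=0.0, std=initializer_range)
    if isinstance(module, nn.Linear) and module.bias is not None:
        module.bias.data.zero_()

emb = nn.Embedding(10, 768)
init_weights(emb)
print(emb.weight.std())  # roughly 0.02
```

Applying this to a freshly allocated nn.Embedding is why the resized rows show a std near 0.02 even though the object is still a plain nn.Embedding.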

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.