Closed · guoxuxu closed this 3 years ago
The new embeddings are initialized with N(0, 0.02) by default: `_get_resized_embeddings`

https://github.com/huggingface/transformers/blob/9c2b2db2cdf0af968aae58d6075b6654224fb760/src/transformers/modeling_utils.py#L650-L651

calls `_init_weights`:

https://github.com/huggingface/transformers/blob/9c2b2db2cdf0af968aae58d6075b6654224fb760/src/transformers/modeling_bert.py#L592-L597
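As a quick sanity check, here is a minimal sketch (using `bert-base-uncased` purely as an example checkpoint) showing that the rows appended by `resize_token_embeddings` come out near N(0, 0.02), not N(0, 1):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

old_size = model.get_input_embeddings().weight.shape[0]

# Add a couple of new tokens and resize the embedding matrix.
tokenizer.add_tokens(["[NEW1]", "[NEW2]"])
model.resize_token_embeddings(len(tokenizer))

# The appended rows were initialized by _init_weights, which for BERT
# draws from N(0, config.initializer_range), with initializer_range
# defaulting to 0.02.
new_rows = model.get_input_embeddings().weight[old_size:]
print(new_rows.mean().item(), new_rows.std().item())  # ~0.0, ~0.02
```

So the resized matrix really is an `nn.Embedding`, but the new rows are re-initialized by the model's own `_init_weights` rather than left at PyTorch's default, which is why the documented return type and the observed statistics are both correct.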
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
https://huggingface.co/transformers/main_classes/model.html

The documentation for `resize_token_embeddings` says it returns `torch.nn.Embedding`, and the source code also uses `nn.Embedding` (https://huggingface.co/transformers/_modules/transformers/modeling_utils.html#PreTrainedModel.resize_token_embeddings). But when I checked the resized `embedding.weight`, the added embedding weights have a std() of about 0.01 ~ 0.02 and a mean around 0, while PyTorch's `nn.Embedding` is initialized from N(0, 1) (https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html). Is there a gap between the documentation and the implementation? Does `resize_token_embeddings` initialize the new weights from uniform(-0.05, 0.05), or some other distribution that might not be N(0, 1)? The source code really does use `nn.Embedding`...
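For reference, the PyTorch default is easy to check with a minimal sketch (the embedding dimensions below are arbitrary):

```python
import torch

# torch.nn.Embedding initializes its weight from N(0, 1) by default.
emb = torch.nn.Embedding(1000, 768)
print(emb.weight.mean().item(), emb.weight.std().item())  # ~0.0, ~1.0
```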
Environment info

`transformers` version: 2.5.1