Ah, that makes sense. So there are no "separate" word2vec-style pretrained embedding models for the different types of embeddings which one could load with `nn.Embedding.from_pretrained`. Rather, they are loaded together as part of the pretrained weights. Theoretically, though, one could extract the weights for each embedding, extract the vocab from the tokenizer, and create a simple lookup (`token\tvector`)?
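For concreteness, a minimal sketch of that extraction (assuming the `transformers`/`pytorch_transformers` attribute names; `bert-base-uncased` is just an example):

```python
import torch
from transformers import BertModel, BertTokenizer

model = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# The token-embedding lookup table: (vocab_size, hidden_size), e.g. (30522, 768)
weights = model.embeddings.word_embeddings.weight.detach()

# Dump a word2vec-style token\tvector file
with open("bert_token_vectors.tsv", "w", encoding="utf-8") as f:
    for token, idx in tokenizer.vocab.items():
        vec = " ".join(f"{x:.6f}" for x in weights[idx].tolist())
        f.write(f"{token}\t{vec}\n")

# ...or reload the table directly as a standalone embedding layer
emb = torch.nn.Embedding.from_pretrained(weights)
```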
Thanks for the reply and your work.
Sure you could, but I suspect it wouldn’t work too well.
You could say that a large language model's hidden states are the new way to do word/sentence embeddings (see Sebastian Ruder's "NLP's ImageNet moment").
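As a rough sketch of that idea, using the hidden states as contextual embeddings (API names per `pytorch_transformers`/`transformers`; the sentence and model are just examples):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# Encode one sentence, including the [CLS]/[SEP] special tokens
input_ids = torch.tensor([tokenizer.encode("The bank raised interest rates.",
                                           add_special_tokens=True)])
with torch.no_grad():
    last_hidden = model(input_ids)[0]  # (1, seq_len, hidden_size)

# Each row is a context-dependent token vector; mean-pooling over tokens
# gives a crude sentence embedding
sentence_vec = last_hidden.mean(dim=1)
```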
Apologies if this is taking too much of your time, but I have a follow-up question. Why wouldn't it work too well? I understand that they are not typical word2vec word representations, since they have been trained together with the whole language model, but why would extracting the embeddings and using them in another task not work well? In other words, what makes the token embeddings of BERT fundamentally different from a typical word2vec model?
I think you'll find this repo (and the associated EMNLP 2019 paper) by @nreimers interesting:
https://github.com/UKPLab/sentence-transformers (built on top of `transformers`)
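Minimal usage, following the repo's README (the model name is one of the pretrained checkpoints listed there):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bert-base-nli-mean-tokens")
embeddings = model.encode([
    "This framework generates sentence embeddings.",
    "Sentences are passed as a list of strings.",
])
```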
- "token type embeddings" are the BERT paper's segment embeddings
- embeddings are inside the pretrained weights
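A quick way to see all three tables (attribute names as in the `BertEmbeddings` module; shapes shown for `bert-base-uncased`):

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
emb = model.embeddings
print(emb.word_embeddings.weight.shape)        # torch.Size([30522, 768]) -- token
print(emb.position_embeddings.weight.shape)    # torch.Size([512, 768])   -- position
print(emb.token_type_embeddings.weight.shape)  # torch.Size([2, 768])     -- segment
```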
Hi, could you tell me where the code that loads `BertEmbeddings` with the pretrained weights is?
I am trying to better understand the difference between the different types of embeddings that BERT uses (from the BERT paper: token, segment, and position embeddings). For this purpose, I was hoping to put some print statements in the `pytorch_transformers` source code to see how the IDs are turned into vector representations for each type of embedding.

First of all, I am confused about the embeddings that `pytorch_transformers` uses. Going through the source code for `BertEmbeddings`, I can see the following:
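(abridged from `modeling_bert.py`; the exact lines may differ slightly between versions)

```python
class BertEmbeddings(nn.Module):
    """Construct the embeddings from word, position and token_type embeddings."""
    def __init__(self, config):
        super(BertEmbeddings, self).__init__()
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=0)
        self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size)
        self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
```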
What are these token type embeddings? Are they the same as segment embeddings?
Secondly, during my quest to better understand what's going on, I couldn't figure out where the pretrained embedding models are loaded, or even where they are downloaded. I am curious to see the vocab lists for all types of embeddings, but I couldn't find them anywhere.
Any pointers?