Ah, that makes sense. So there are no "separate" word2vec-style pretrained embedding models for the different types of embeddings which one could load with `nn.Embedding.from_pretrained`. Rather, they are loaded together as part of the pretrained weights. Theoretically, though, one could extract the weights for each embedding, extract the vocab from the tokenizer, and create a simple lookup (`token\tvector`)?
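For concreteness, a minimal sketch of that extraction (assuming the `transformers`/`pytorch_transformers` attribute names; `bert-base-uncased` is just an example):

```python
import torch
from transformers import BertModel, BertTokenizer

model = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# The token-embedding lookup table: (vocab_size, hidden_size), e.g. (30522, 768)
weights = model.embeddings.word_embeddings.weight.detach()

# Dump a word2vec-style token\tvector file
with open("bert_token_vectors.tsv", "w", encoding="utf-8") as f:
    for token, idx in tokenizer.vocab.items():
        vec = " ".join(f"{x:.6f}" for x in weights[idx].tolist())
        f.write(f"{token}\t{vec}\n")

# ...or reload the table directly as a standalone embedding layer
emb = torch.nn.Embedding.from_pretrained(weights)
```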
Thanks for the reply and your work.
Sure you could, but I suspect it wouldn’t work too well.
You could say that a large language model's hidden states are the new way to do word/sentence embeddings (see Sebastian Ruder's "NLP's ImageNet moment").
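As a rough sketch of that idea, using the hidden states as contextual embeddings (API names per `pytorch_transformers`/`transformers`; the sentence and model are just examples):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# Encode one sentence, including the [CLS]/[SEP] special tokens
input_ids = torch.tensor([tokenizer.encode("The bank raised interest rates.",
                                           add_special_tokens=True)])
with torch.no_grad():
    last_hidden = model(input_ids)[0]  # (1, seq_len, hidden_size)

# Each row is a context-dependent token vector; mean-pooling over tokens
# gives a crude sentence embedding
sentence_vec = last_hidden.mean(dim=1)
```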
Apologies if this is taking too much of your time, but I have a follow-up question. Why wouldn't it work too well? I understand that they are not typical word2vec word representations, since they have been trained together with the whole language model, but why would extracting the embeddings and using them in another task not work well? In other words, what makes the token embeddings of BERT fundamentally different from a typical word2vec model?
I think you'll find this repo (and the associated EMNLP 2019 paper) by @nreimers interesting:
https://github.com/UKPLab/sentence-transformers (built on top of `transformers`)
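Minimal usage, following the repo's README (the model name is one of the pretrained checkpoints listed there):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bert-base-nli-mean-tokens")
embeddings = model.encode([
    "This framework generates sentence embeddings.",
    "Sentences are passed as a list of strings.",
])
```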
- "token type embeddings" are the BERT paper's segment embeddings
- embeddings are inside the pretrained weights
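A quick way to see all three tables (attribute names as in the `BertEmbeddings` module; shapes shown for `bert-base-uncased`):

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
emb = model.embeddings
print(emb.word_embeddings.weight.shape)        # torch.Size([30522, 768]) -- token
print(emb.position_embeddings.weight.shape)    # torch.Size([512, 768])   -- position
print(emb.token_type_embeddings.weight.shape)  # torch.Size([2, 768])     -- segment
```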
Hi, could you tell me where the code that loads `BertEmbeddings` with the pretrained weights is?
I am trying to better understand the difference between the different types of embeddings that BERT uses (from the BERT paper: token, segment, and position embeddings). For this purpose, I was hoping to put some print statements in the `pytorch_transformers` source code to see how the IDs are turned into vector representations for each type of embedding.

First of all, I am confused about the embeddings that `pytorch_transformers` uses. Going through the source code for `BertEmbeddings`, I can see the following:
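(abridged from `modeling_bert.py`; the exact lines may differ slightly between versions)

```python
class BertEmbeddings(nn.Module):
    """Construct the embeddings from word, position and token_type embeddings."""
    def __init__(self, config):
        super(BertEmbeddings, self).__init__()
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=0)
        self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size)
        self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
```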
What are these token type embeddings? Are they the same as segment embeddings?
Secondly, during my quest to better understand what's going on, I couldn't figure out where the pretrained embedding models are loaded, or even where they are downloaded. I am curious to see the vocab lists for all types of embeddings, but I couldn't find them anywhere.
Any pointers?