@mttk I don't think we are currently supporting this one, right?
We don't support something like this out of the box, as flair has its own API for tokenization / data loading, and there is no simple way to integrate it with torchtext.
If you really want to (e.g. for using a specific dataset), you can do an ugly workaround like:
1. Don't pass a `vectors` object to your Field;
2. Construct a `Sentence` object from the Field's vocabulary (e.g. `sentence = Sentence(' '.join(TEXT.vocab.itos))`);
3. Embed it with your `StackedEmbeddings` object (`stacked_embeddings.embed(sentence)`);
4. Use the resulting per-token embeddings as the weight matrix of an `nn.Embedding`.

The embeddings will be fetched in-order so this is equivalent to constructing them via torchtext. Note that I'm not super familiar with flair so there might be a more efficient way of doing this.
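Putting the steps together, a minimal sketch (assuming `TEXT` is a torchtext Field with a built vocab, that no vocab token contains whitespace, and using non-contextual `WordEmbeddings`; all names are illustrative):

```python
import torch
from flair.data import Sentence
from flair.embeddings import WordEmbeddings, StackedEmbeddings

stacked_embeddings = StackedEmbeddings([
    WordEmbeddings('glove'),
    WordEmbeddings('crawl'),
])

# one big "sentence" holding every vocab token, in order;
# use_tokenizer=False makes flair split on whitespace only
sentence = Sentence(' '.join(TEXT.vocab.itos), use_tokenizer=False)

# embed in place
stacked_embeddings.embed(sentence)

# collect the per-token vectors into a (vocab_size, dim) tensor
vectors = torch.stack([token.embedding for token in sentence])
```

The resulting `vectors` tensor can then be used to initialize an `nn.Embedding`, as discussed below.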
So, what you're saying is to construct a gigantic sentence with all the words of the vocab, use this sentence as the argument to `stacked_embeddings.embed(sentence)`, iterate through the tokens of the result, concatenating them into a PyTorch tensor (step 3), and then use this tensor for `nn.Embedding`. Am I right?
Correct. It is ugly, but it will work and I don't see a cleaner way to do it (again, I'm not an expert with flair).
Sorry for the silly question: I'm a bit confused about the last step, on how we use this tensor with nn.Embedding. :) Thank you so much for your answer!
There are two options, both equally good:

Option 1: copy data from a tensor

```python
import torch.nn as nn

vectors = ...  # the (num_tokens, embedding_size) tensor built above, or None
embedding = nn.Embedding(num_tokens, embedding_size, padding_idx=0)
if vectors is not None:
    embedding.weight.data.copy_(vectors)
if freeze_vectors:  # True keeps the pre-trained vectors fixed during training
    embedding.weight.requires_grad = False
```
Option 2: use `from_pretrained`

```python
embedding = nn.Embedding.from_pretrained(vectors)
```
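Note that `from_pretrained` freezes the weights by default; pass `freeze=False` if you want to fine-tune them, and a `padding_idx` can be given as well:

```python
# freeze=True is the default; freeze=False makes the vectors trainable
embedding = nn.Embedding.from_pretrained(vectors, freeze=False, padding_idx=0)
```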
Great! Thank you!
No problem -- please do close the issue if everything is resolved.
@mttk do you know if there is currently a more efficient way to do this?
Since flair embeddings are contextual embeddings, wouldn't that be a problem?
When we pass the join of TEXT.vocab.itos (`sentence = Sentence(' '.join(TEXT.vocab.itos))`) to flair, aren't we losing context?
example of join in the IMDB dataset: "
Thank you in advance for the answer :)
@Edresson the example code would work only for non-contextualized embeddings, in case you want to obtain the embedding matrix vectors. This would of course fail when applied to a concrete instance.
If you want to use flair contextual embeddings, I believe you need to load and use the dataset with their pipeline. There might be a workaround, but I'm not familiar with one at the moment as I don't actively use flair.
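To make the context issue concrete, here is a small sketch (the model name `'news-forward'` and the sentences are just examples) showing that flair's contextual embeddings assign different vectors to the same word in different sentences, so no single embedding-matrix row can represent the word:

```python
import torch
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

embedding = FlairEmbeddings('news-forward')

s1 = Sentence('the river bank was muddy')
s2 = Sentence('the bank raised its rates')
embedding.embed(s1)
embedding.embed(s2)

# "bank" gets a different vector in each sentence
print(torch.equal(s1.tokens[2].embedding, s2.tokens[1].embedding))  # False
```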
❓ Questions and Help
Description
Hi, we can use GloVe embeddings when building the vocab, using something like:
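(A sketch of the typical torchtext pattern; `TEXT` and `train_data` are illustrative names.)

```python
from torchtext.vocab import GloVe

# attach pre-trained GloVe vectors to the Field's vocab
TEXT.build_vocab(train_data, vectors=GloVe(name='6B', dim=100))
```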
We can also create embeddings using the flair library, for example:
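(A typical stacked-embeddings setup from the flair documentation; the model names are examples.)

```python
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings

# stack a classic word embedding with forward/backward flair embeddings
stacked_embeddings = StackedEmbeddings([
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
])
```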
Could I use the above embeddings instead of GloVe in the above code? Is anything similar to this supported?