Closed Apoorva99 closed 5 years ago
Hi Apoorva,
Attentive Mimicking can be used to obtain embeddings for words. Typically, the algorithm assumes that you have several sentences (contexts) in which each such word occurs; however, it also works for words for which you have no contexts at all.
You should be able to get embeddings for words from a trained model if you follow these two steps:
Step 1
Write all words for which you want to get embeddings into a single file (newline-separated). For each such word, also provide all contexts that you have available. For example, let's assume that you want to infer embeddings for the words `apples` and `oranges`, and you have two contexts for `oranges` (let's say, `i like oranges` and `i bought two oranges`) and no context for `apples`. Then your input file (let's call it `input.txt`) should look like this:

```
apples
oranges<TAB>i like oranges<TAB>i bought two oranges
```

Note that `<TAB>` should be replaced by an actual tab character.
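If you have many words, you can also generate this file programmatically. A minimal sketch (the word-to-contexts mapping here is just the example data from above; the variable and file names are illustrative):

```python
# Build the newline-separated, tab-delimited input file described above.
# Each line contains a word followed by zero or more contexts, joined by tabs.
word_contexts = {
    "apples": [],  # no contexts available for this word
    "oranges": ["i like oranges", "i bought two oranges"],
}

with open("input.txt", "w", encoding="utf-8") as f:
    for word, contexts in word_contexts.items():
        f.write("\t".join([word] + contexts) + "\n")
```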
Step 2
The actual inference can then be done using the `fcm/infer_vectors.py` script:

```
python3 fcm/infer_vectors.py -m MODEL_PATH -i input.txt -o output.txt
```

Afterwards, the file `output.txt` contains embeddings for `apples` and `oranges`; its content should look like this:

```
apples 0.12345 0.23456 -0.12345 ...
oranges 0.23554 -0.12345 0.34343 ...
```
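To use the resulting vectors in your own code, you can read them back with a few lines of Python. A minimal sketch, assuming the space-separated text format shown above (`load_embeddings` is just an illustrative helper name, not part of the repo):

```python
# Parse a word2vec-style text file (word followed by float components)
# into a dictionary mapping each word to its embedding vector.
def load_embeddings(path):
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if not parts or not parts[0]:
                continue  # skip empty lines
            embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings

# Example usage:
# vectors = load_embeddings("output.txt")
# print(vectors["oranges"])
```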
Best regards, Timo
Hello there, I was hoping to utilize your repo for one of my projects where I need embeddings for rare words. I was trying to run your Attentive Mimicking code, but in the output I am getting embeddings for entire sentences instead of words. Is it supposed to be like this, or is there some issue? It would be really great if you could help me out on this. Cheers, Apoorva