timoschick / form-context-model

This repository contains the code for the Form-Context Model and its Attentive Mimicking variant.
Apache License 2.0

Embd File #2

Closed Apoorva99 closed 5 years ago

Apoorva99 commented 5 years ago

Hello there, I was hoping to use your repo for one of my projects, where I need embeddings for rare words. I was trying to run your Attentive Mimicking code, but in the output I am getting embeddings for entire sentences instead of words. Is it supposed to be like this, or is there some issue? It would be really great if you could help me out on this. Cheers, Apoorva

timoschick commented 5 years ago

Hi Apoorva,

Attentive Mimicking can indeed be used to obtain embeddings for individual words. Given such a word, the algorithm assumes that you have several sentences in which it occurs. However, it also works for words for which you have no contexts at all.

You should be able to get embeddings for words from a trained model if you follow these two steps:

Step 1 Write all words for which you want to get embeddings into a single file (newline-separated). For each such word, also provide all contexts that you have available. For example, let's assume that you want to infer embeddings for the words apples and oranges, and you have two contexts for oranges (let's say, i like oranges and i bought two oranges) and no context for apples. Then your input file (let's call it input.txt) should look like this:

apples
oranges<TAB>i like oranges<TAB>i bought two oranges

Note that <TAB> should be replaced by an actual tab character.
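In case it helps, the format described above could be generated with a small script like this (the word-to-context mapping here is just made-up example data, not part of the repo):

```python
# Sketch: write an Attentive Mimicking input file from a dict mapping
# each target word to its (possibly empty) list of contexts.
contexts = {
    "apples": [],  # no contexts available for this word
    "oranges": ["i like oranges", "i bought two oranges"],
}

with open("input.txt", "w", encoding="utf-8") as f:
    for word, ctxs in contexts.items():
        # the word and its contexts are separated by real tab characters
        f.write("\t".join([word] + ctxs) + "\n")
```

Each word goes on its own line; a word with no contexts is simply a line containing only that word.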

Step 2 The actual inference can then be done using the fcm/infer_vectors.py script:

python3 fcm/infer_vectors.py -m MODEL_PATH -i input.txt -o output.txt

Afterwards, the file output.txt contains embeddings for apples and oranges; its content should look like this:

apples 0.12345 0.23456 -0.12345 ...
oranges 0.23554 -0.12345 0.34343 ...
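Assuming the output format shown above (one word per line, followed by its space-separated vector components), the embeddings could be read back in with something like this:

```python
# Sketch: load the output file into a dict mapping each word to its
# embedding as a list of floats.
def load_embeddings(path):
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip empty lines
            word, vector = parts[0], [float(x) for x in parts[1:]]
            embeddings[word] = vector
    return embeddings
```

This assumes the word itself contains no spaces, which holds for newline-separated single words as in Step 1.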

Best regards, Timo