jnferfer opened 9 months ago
Hi,

I need to get the embedding of a word or phrase within a sentence, where the sentence provides the context for that word/phrase. For example, I need the different embedding values of `big apple` in these two sentences:

- I'm living in the Big Apple since 2012
- I ate a big apple yesterday

When using `model.encode()` I can set the parameter `output_value` to `token_embeddings` to get token embeddings. However, I don't know how to properly map the output vectors to the tokens corresponding to the `big apple` text. Is there a straightforward approach for this? Thanks!

---

You may first check the tokenization of the sentences, record the indices of the desired words (e.g., `big apple`), and then select the token embeddings at those indices.

---

Thanks! Then, if I want a single embedding for "big apple", how should I proceed? I'm averaging the embeddings of "big" and "apple", but I sometimes get odd results when comparing that average against other embeddings.