piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1

Adding Word-to-Context Prediction in Word2Vec (inverse of `predict_output_word()`) #2152

Open elliottash opened 6 years ago

elliottash commented 6 years ago

Issue #863 suggests predicting a word given its context.

Another useful feature would be the opposite: given a word, output the probability distribution over its contexts (within some window length).
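One way to make "the probability distribution over contexts" concrete is to count it directly. For each center word, take the relative frequency of every word that appears within `window` positions of it. The sketch below does this in plain Python. The `context_distribution` helper is hypothetical and not part of gensim. Note that a trained skip-gram model shares its output vectors across all window positions, so its predictions do not depend on where in the window a context word sits. The window only matters during training.

```python
from collections import Counter, defaultdict


def context_distribution(sentences, window=2):
    """Empirical P(context | center) from co-occurrence counts within `window` tokens.

    Hypothetical helper for illustration; not a gensim API.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, center in enumerate(tokens):
            # Clip the window at the sentence boundaries
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[center][tokens[j]] += 1
    # Normalize each center word's counts into a probability distribution
    dist = {}
    for center, ctr in counts.items():
        total = sum(ctr.values())
        dist[center] = {ctx: n / total for ctx, n in ctr.items()}
    return dist
```

For example, with `window=1` on the single sentence `the cat sat on the mat`, "cat" gets probability 0.5 for each of "the" and "sat".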

menshikh-iv commented 6 years ago

Hello @ellliottt,

#863 was implemented in #1209. You want the "inverse" method, am I right?

ghost commented 5 years ago

I'd also like a predict_output_context function, but I'm not sure how to implement it.

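import numpy as np
from gensim import matutils

def predict_output_context(model, center_word, topn=10):
    # Input vector of the center word (gensim 3.x API; syn1neg requires negative > 0)
    word = model.wv.vocab[center_word]
    vec = model.wv.vectors[word.index]
    # Score every vocabulary word as a context via the negative-sampling output weights
    scores = np.dot(vec, model.trainables.syn1neg.T)
    # Softmax; subtracting the max score avoids overflow in exp()
    prob_values = np.exp(scores - np.max(scores))
    prob_values /= np.sum(prob_values)
    top_indices = matutils.argsort(prob_values, topn=topn, reverse=True)
    return [(model.wv.index2word[index1], prob_values[index1]) for index1 in top_indices]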

Could you tell me whether this implementation is correct? If not, could you write a correct one?
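A minimal sketch of the computation the function above performs, written in plain Python on toy vectors so it runs without a trained model. The helper `predict_context_from_vectors` and its arguments are hypothetical: `center_vec` stands in for the center word's row of `model.wv.vectors`, and `output_vecs` for the rows of `syn1neg`, which are indexed like the vocabulary. Each candidate context c is scored as exp(v_w · u_c) and normalized over the vocabulary. One caveat: with negative sampling, `syn1neg` is trained with a per-pair sigmoid objective rather than a full softmax, so the normalized values are better read as a ranking than as calibrated probabilities. gensim's own `predict_output_word` applies the same exp-and-normalize step to `syn1neg`.

```python
import math


def predict_context_from_vectors(center_vec, output_vecs, vocab, topn=3):
    """Rank vocabulary words as contexts of a center word via softmax(v_w . u_c).

    Hypothetical helper for illustration; not a gensim API.
    """
    # Dot product of the center word's input vector with every output vector
    scores = [sum(a * b for a, b in zip(center_vec, u)) for u in output_vecs]
    # Subtract the max score before exponentiating to avoid overflow
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the topn largest probabilities, highest first
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:topn]
    return [(vocab[i], probs[i]) for i in top]
```

For instance, a center vector `[1.0, 0.0]` scored against output vectors `[2.0, 0.0]`, `[0.0, 2.0]`, `[1.0, 0.0]` for `cat`, `dog`, `mat` ranks the contexts cat, mat, dog.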