nateraw / Lda2vec-Tensorflow

Tensorflow 1.5 implementation of Chris Moody's Lda2vec, adapted from @meereeum

see topics extracted by lda2vec #35

Closed — gveni closed this issue 5 years ago

gveni commented 5 years ago

I know pyLDAvis is one way to visualize lda2vec results. Is there a way, or a location, where all the topics (or the words related to each topic, along with their weights) are stored? Thanks!

nateraw commented 5 years ago

I haven't really written up an explanation of how to do this, but it is included. You have to call the get_k_closest() function, which is a method of the model class. You can find it here

nateraw commented 5 years ago

For example, to get the top 10 words in each topic, you run this:

import numpy as np

# Set the number of topics
num_topics = 20

# Load and train the model
m = model(...)
m.train(...)

# Create an idxs numpy array - these are the idxs we wish to query for the "in_type".
# Since we are choosing in_type "topic", these are the indexes of the topic matrix.
idxs = np.arange(num_topics)

# Run the get_k_closest function. It will return the k closest "vs_type" embedding indexes.
# The "vs_type" parameter is words by default, so we leave it out.
# idx_to_word is the index-to-word vocabulary mapping built during preprocessing.
sim, sim_idxs = m.get_k_closest(idxs, in_type='topic', idx_to_word=idx_to_word, k=10)

If you want to get all of the words, you'd just change that k parameter to the vocabulary size.
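If you also want the weights next to the words (as asked above), a rough sketch like the one below should do it. It assumes sim_idxs comes back shaped (num_topics, k) holding vocabulary indexes and sim holds the matching similarity scores, so double check against get_k_closest if that's not what you see:

# Print each topic's closest words together with their similarity scores.
# Assumes sim_idxs is a (num_topics, k) array of vocabulary indexes and sim
# is the matching (num_topics, k) array of scores returned by get_k_closest.
for topic in range(num_topics):
    pairs = ", ".join("{} ({:.3f})".format(idx_to_word[i], s)
                      for i, s in zip(sim_idxs[topic], sim[topic]))
    print("Topic {}: {}".format(topic, pairs))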

Edit: I haven't run this today, but I'm pretty sure it works.

gveni commented 5 years ago

Thanks much. Let me try that out.