Closed — gveni closed this issue 5 years ago
I haven't really written up an explanation of how to do this, but the functionality is included: you call the `get_k_closest()` function, a method on the model class. You can find it here.
For example, to get the top 10 words in each topic, you run this:
```python
# Set the number of topics
num_topics = 20

# Load and train the model
m = model(....)
m.train(...)

# Create a numpy array of the indexes we wish to query for the "in_type".
# Since we are looking at in_type "topic", these are indexes into the topic matrix.
idxs = np.arange(num_topics)

# get_k_closest returns the k closest "vs_type" embedding indexes.
# The "vs_type" parameter defaults to words, so we leave it out.
sim, sim_idxs = m.get_k_closest(idxs, in_type='topic', idx_to_word=idx_to_word, k=10)
```
If you want to get all of the words, just change the `k` parameter to the vocabulary size.
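For reference, here is a minimal sketch of turning the returned indexes into readable per-topic word lists. It uses mock arrays shaped like `get_k_closest`'s output (running the real call requires a trained model), and the toy `idx_to_word` mapping is hypothetical:

```python
import numpy as np

# Mock output shaped like get_k_closest's return values (assumption):
# sim[t, j]      = similarity of the j-th closest word to topic t
# sim_idxs[t, j] = vocabulary index of that word
num_topics, k = 3, 4
rng = np.random.default_rng(0)
sim = np.sort(rng.random((num_topics, k)), axis=1)[:, ::-1]  # descending per row
sim_idxs = rng.integers(0, 10, size=(num_topics, k))

# Hypothetical toy vocabulary: index -> token
idx_to_word = {i: f"word{i}" for i in range(10)}

# Build a per-topic list of (word, similarity) pairs
topic_words = {
    t: [(idx_to_word[int(i)], float(s)) for i, s in zip(sim_idxs[t], sim[t])]
    for t in range(num_topics)
}

for t, pairs in topic_words.items():
    print(f"topic {t}:", pairs)
```

Swapping in the real `sim`/`sim_idxs` from the snippet above gives you a plain dictionary of topics and their weighted words, which you can then save or inspect however you like.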
Edit: I haven't run this today, but I'm pretty sure it works.
Thanks much. Let me try that out.
I know pyLDAvis is one way to visualize lda2vec results. Is there a way, or a location, where all of the topics (or the words related to each topic, along with their weights) are stored? Thanks!