maximtrp / bitermplus

Biterm Topic Model (BTM): modeling topics in short texts
https://bitermplus.readthedocs.io/en/stable/
MIT License

How do I get the topic words? #10

Closed: aguinaldoabbj closed this issue 3 years ago

aguinaldoabbj commented 3 years ago

Hi,

Firstly, thanks for sharing your code.

Not an issue, just a question. I can see the relevant words for a topic in the tmplot report. How do I get those words programmatically? I need at least the three most relevant terms.

Thanks in advance.

maximtrp commented 3 years ago

Hello!

You can use the tmplot.calc_terms_probs_ratio function for this. Here is an example:

import tmplot as tmp

# Train or import a trained model here
# model = ...

# Get a phi matrix
phi = tmp.get_phi(model)

# Calculate terms probabilities
# Do not forget to pass topic id with `topic` argument
terms_probs = tmp.calc_terms_probs_ratio(phi, topic=0, lambda_=1)
terms_probs
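
If only the top few words per topic are needed, ranking each topic's word probabilities directly may be enough. Below is a minimal, dependency-free sketch of that idea. It assumes a topics-vs-words probability matrix (rows are topics, columns are words); the numbers and vocabulary are invented toy values, and `top_terms` is a hypothetical helper, not part of tmplot or bitermplus. With a trained bitermplus model, the real matrix would presumably come from `model.matrix_topics_words_`.

```python
# Toy stand-in for a topics-vs-words matrix (rows: topics, columns: words).
# All values are invented for illustration; a trained model supplies real ones.
vocabulary = ["cat", "dog", "fish", "stock", "market"]
topics_words = [
    [0.40, 0.30, 0.20, 0.05, 0.05],  # topic 0
    [0.05, 0.05, 0.10, 0.45, 0.35],  # topic 1
]


def top_terms(topic_row, vocabulary, n=3):
    """Return the n words with the highest probability in one topic.

    Hypothetical helper: ranks words by p(word | topic) only.
    """
    ranked = sorted(
        zip(vocabulary, topic_row),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [word for word, _ in ranked[:n]]


print(top_terms(topics_words[0], vocabulary))  # ['cat', 'dog', 'fish']
print(top_terms(topics_words[1], vocabulary))  # ['stock', 'market', 'fish']
```

This ranks purely by the probability of a word within a topic. A relevance score that also weighs how common a word is overall can produce a different order.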

aguinaldoabbj commented 3 years ago

Hi @maximtrp, sorry for digging up this issue; I missed your answer. Thanks, by the way. Another question I'd like to ask is how to get the documents (sentences) that fit each topic. I can see that the phi matrix relates topics to words. Where can I find the relationship between topics and documents (as shown in tmplot)? I want to build a matrix relating each document to the most relevant terms of its topic. Thanks in advance.

maximtrp commented 3 years ago

@aguinaldoabbj You can use tmplot.get_theta() or just the matrix_docs_topics_ and matrix_topics_docs_ attributes of a model instance. Example:

import bitermplus as btm

# X, vocabulary and biterms must be prepared beforehand
model = btm.BTM(X, vocabulary, seed=12321, T=8, M=20, alpha=50/8, beta=0.01)
model.fit(biterms, iterations=20)
model.matrix_docs_topics_
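
To relate each document to the most relevant terms of its topic, one possible approach is to pick the highest-probability topic for every document and then look up that topic's top words. The sketch below is dependency-free and uses invented toy matrices in place of a trained model. It assumes `docs_topics` has documents as rows and topics as columns, and `topics_words` has topics as rows and words as columns; `dominant_topic` and `top_terms` are hypothetical helpers, not bitermplus or tmplot functions. As far as I can tell from the bitermplus README, the documents-vs-topics probabilities are computed by calling `model.transform(docs_vec)` after `fit`, so that step may be needed before the attribute is populated.

```python
# Toy stand-ins with invented values; a trained model supplies the real matrices.
docs = [
    "my cat chased the dog",
    "the market fell on stock news",
    "fish market",
]
vocabulary = ["cat", "dog", "fish", "stock", "market"]
topics_words = [
    [0.40, 0.30, 0.20, 0.05, 0.05],  # topic 0
    [0.05, 0.05, 0.10, 0.45, 0.35],  # topic 1
]
docs_topics = [
    [0.90, 0.10],  # document 0
    [0.20, 0.80],  # document 1
    [0.45, 0.55],  # document 2
]


def dominant_topic(doc_row):
    """Return the index of the topic with the highest probability."""
    return max(range(len(doc_row)), key=lambda topic: doc_row[topic])


def top_terms(topic_row, vocabulary, n=3):
    """Return the n words with the highest probability in one topic."""
    ranked = sorted(
        zip(vocabulary, topic_row),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [word for word, _ in ranked[:n]]


# One record per document: (text, dominant topic id, top terms of that topic).
doc_table = []
for text, row in zip(docs, docs_topics):
    topic = dominant_topic(row)
    doc_table.append((text, topic, top_terms(topics_words[topic], vocabulary)))

for record in doc_table:
    print(record)
```

Assigning a single dominant topic discards the rest of the distribution. For documents with near-ties, such as document 2 here, keeping the full row of probabilities may be more informative.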