dongqing7 closed 7 years ago
There are a couple of different ways you can get the documents for each topic. You could use the `p_y_given_x` or `log_p_y_given_x` attribute to rank which documents are most probable for each topic. You could also get a binary classification of each document in each topic from `labels` (which applies a softmax from `p_y_given_x`).
You can also use `log_z` to rank which documents are "explained" the most by each topic according to pointwise total correlation. If you're looking for something simple, though, `labels` or `p_y_given_x` will probably be enough. Note that CorEx is a discriminative model, which means that it estimates the probability that a document belongs to a topic separately for each topic, so the probabilities don't have to add up to 1.
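To make the ranking idea concrete, here is a minimal sketch in plain Python. It assumes the score matrix is laid out with one row per document and one column per topic, which is how I would expect `p_y_given_x`, `log_p_y_given_x`, and `labels` to be shaped, but check that against your fitted model. The helper names `rank_docs_by_topic` and `docs_labelled_with_topic` and the toy numbers are made up for illustration and are not part of CorEx.

```python
def rank_docs_by_topic(scores, n_docs=3):
    """For each topic, return the indices of the n_docs highest-scoring documents.

    Assumption: `scores` has one row per document and one column per topic.
    """
    rows = [list(row) for row in scores]
    n_topics = len(rows[0]) if rows else 0
    ranking = []
    for topic in range(n_topics):
        # Sort document indices by their score for this topic, best first.
        order = sorted(range(len(rows)), key=lambda doc: rows[doc][topic], reverse=True)
        ranking.append(order[:n_docs])
    return ranking


def docs_labelled_with_topic(labels):
    """For each topic, return the indices of documents whose binary label is set."""
    rows = [list(row) for row in labels]
    n_topics = len(rows[0]) if rows else 0
    return [[doc for doc, row in enumerate(rows) if row[topic]] for topic in range(n_topics)]


# Made-up scores for 4 documents and 2 topics (not real CorEx output).
# Rows do not sum to 1 because each topic is scored separately.
toy_p_y_given_x = [
    [0.9, 0.8],
    [0.1, 0.7],
    [0.6, 0.2],
    [0.3, 0.95],
]
# Made-up binary labels with the same layout.
toy_labels = [
    [True, True],
    [False, True],
    [True, False],
    [False, True],
]

top_docs = rank_docs_by_topic(toy_p_y_given_x, n_docs=2)
members = docs_labelled_with_topic(toy_labels)
print(top_docs)  # [[0, 2], [3, 0]]
print(members)   # [[0, 2], [0, 1, 3]]
```

With a fitted model you would presumably pass its `p_y_given_x` (or `log_z`, to rank by pointwise total correlation) in place of the toy matrix, and then use the returned indices to look up the original documents.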
That's fantastic! Thank you!
Hi, Greg, after successfully fitting the models, how should I retrieve all the documents according to the topic?