mirwaes / sclda

Fast variational Bayes inference for Latent Dirichlet Allocation

GNU General Public License v3.0


sclda: Topic proportions for a given document after training LDA model #1

Open srimalj opened 8 years ago

srimalj commented 8 years ago

Hi Mirwaes

I’m using the Python code on GitHub: scLDA - Fast variational Bayes inference for Latent Dirichlet Allocation

I am fairly new to topic models and am trying to figure out which method or attributes I could use to get the topic proportions for a given document x (say, a new unseen document) once the LDA model is trained.

Basically, I would like to do something similar to the transform() method in the scikit-learn implementation at http://scikit-learn.org/dev/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html#sklearn.decomposition.LatentDirichletAllocation.transform

Any pointers would be much appreciated.

Thanks.

Srimal.

mirwaes commented 8 years ago

Dear Srimal, the topic proportions can be easily computed by running the e-step function once the model is trained:

    # get list of word ids and counts
    unseen_docs = create_doc_count_lists(unseen_data)

    # compute the topic proportions
    gamma, stats = ldainstance.e_step(unseen_docs)
    theta = gamma.T / gamma.sum(1)
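For readers curious about what an LDA e-step computes, here is a minimal pure-Python sketch of the standard mean-field variational E-step for a single document. It is not sclda's implementation: it assumes a fixed topic-word matrix beta (K x V, each row a probability distribution over the vocabulary) and a symmetric Dirichlet prior alpha. The helpers doc_to_ids_counts and e_step_one_doc, the toy vocabulary, and the digamma approximation are all hypothetical stand-ins; sclda's create_doc_count_lists and e_step may use different input formats and conventions.

```python
import math


def digamma(x):
    """Approximate digamma for x > 0: recurrence up to x >= 6, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def doc_to_ids_counts(tokens, vocab):
    """Hypothetical helper: map tokens to sorted word ids and their counts, skipping unknown words."""
    counts = {}
    for tok in tokens:
        if tok in vocab:
            wid = vocab[tok]
            counts[wid] = counts.get(wid, 0) + 1
    ids = sorted(counts)
    return ids, [counts[i] for i in ids]


def e_step_one_doc(ids, counts, beta, alpha, max_iter=100, tol=1e-6):
    """Mean-field variational E-step for one document; returns gamma (length K)."""
    K = len(beta)
    total = sum(counts)
    gamma = [alpha + total / K] * K  # uniform initialization
    for _ in range(max_iter):
        # exp(E[log theta_k]) up to a factor shared by all k, which cancels below
        exp_elog = [math.exp(digamma(g)) for g in gamma]
        new_gamma = [alpha] * K
        for w, c in zip(ids, counts):
            # phi_{w,k} is proportional to beta[k][w] * exp(E[log theta_k])
            phi = [beta[k][w] * exp_elog[k] for k in range(K)]
            norm = sum(phi)
            for k in range(K):
                new_gamma[k] += c * phi[k] / norm
        change = max(abs(a - b) for a, b in zip(new_gamma, gamma))
        gamma = new_gamma
        if change < tol:
            break
    return gamma


# Toy example with K = 2 topics over a 4-word vocabulary
vocab = {"apple": 0, "banana": 1, "car": 2, "engine": 3}
beta = [
    [0.45, 0.45, 0.05, 0.05],  # topic 0: mostly fruit words
    [0.05, 0.05, 0.45, 0.45],  # topic 1: mostly vehicle words
]
ids, counts = doc_to_ids_counts("apple banana apple car".split(), vocab)
gamma = e_step_one_doc(ids, counts, beta, alpha=0.1)
proportions = [g / sum(gamma) for g in gamma]  # normalize gamma into topic proportions
print(proportions)
```

Since three of the four tokens are fruit words, the normalized proportions should favor topic 0. Note that the entries of gamma always sum to K * alpha plus the document length, which is why normalizing gamma gives a proper distribution.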

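The resulting theta = gamma.T/gamma.sum(1) is a K x D matrix (where K is the number of topics and D the number of unseen documents); each column holds one document's topic proportions and sums to 1.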
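The normalization gamma.T/gamma.sum(1) relies on NumPy broadcasting. The small sketch below uses made-up numbers and assumes gamma is D x K (one row per document), which is the layout the snippet implies: gamma.sum(1) then has length D, so each column of gamma.T is divided by its own document's total. Transposing theta gives the D x K, row-per-document layout that scikit-learn's transform() returns.

```python
import numpy as np

# Made-up variational parameters for D = 3 documents and K = 2 topics,
# assumed to be laid out D x K (one row per document).
gamma = np.array([
    [3.0, 1.0],
    [0.5, 0.5],
    [1.0, 4.0],
])

# gamma.T is K x D and gamma.sum(1) has shape (D,), so broadcasting divides
# column d of gamma.T by the total of document d's parameters.
theta = gamma.T / gamma.sum(1)
print(theta.shape)        # (2, 3)
print(theta.sum(axis=0))  # each column sums to 1 (up to rounding)

# Row-per-document (D x K) layout, as returned by scikit-learn's transform()
doc_topic = theta.T
print(doc_topic)
```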

Hope this helps. Mirwaes

gauravkoradiya commented 5 years ago

Thank you.