slycoder / R-lda

Latent Dirichlet allocation package for R

log_likelihood option for slda.em #12

Closed xz6014 closed 6 years ago

xz6014 commented 6 years ago

Hi Jonathan,

Would it be possible for the log_likelihood option to be implemented in the slda.em function?

Many thanks,

Xiaohan

slycoder commented 6 years ago

Hi, I'm pretty busy at the moment, but you can try passing the log_likelihood option down to .slda.collapsed.gibbs.sampler. I think that ought to work, but it won't compute the likelihood conditioned on the observed response values; that part would take more work. As a convergence diagnostic, though, computing it on the words alone might work out OK.
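For reference, the package's public unsupervised sampler already exposes this flag, so the sLDA analogue would behave similarly. A minimal sketch using the existing `lda.collapsed.gibbs.sampler` API (the toy corpus below is an assumption for illustration; see the package docs for the exact row semantics of the likelihood matrix):

```r
library(lda)

# Toy corpus in the lda package's format: each document is a 2 x N integer
# matrix whose columns are (0-based word index, count) pairs.
vocab <- c("apple", "banana", "cherry", "date")
documents <- list(
  matrix(as.integer(c(0, 1, 1, 2)), nrow = 2),
  matrix(as.integer(c(2, 1, 3, 1)), nrow = 2)
)

fit <- lda.collapsed.gibbs.sampler(documents, K = 2, vocab = vocab,
                                   num.iterations = 25,
                                   alpha = 1.0, eta = 0.1,
                                   compute.log.likelihood = TRUE)

# fit$log.likelihoods is a 2 x num.iterations matrix; one row is the full
# log likelihood and the other is conditioned on the topic assignments,
# which is what makes it usable as a convergence diagnostic.
fit$log.likelihoods
```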

xz6014 commented 6 years ago

Hi Jonathan,

Thank you for your reply. I attempted to add the `compute.log.likelihood` option to `slda.em` using R's `fix()` function; however, test-running the modified code gives the following error:

`Error in .slda.collapsed.gibbs.sampler(documents, K, vocab, num.e.iterations[1], : could not find function ".slda.collapsed.gibbs.sampler"`

I'm sorry for the inconvenience this is causing you, but I'd really appreciate your help resolving this problem.

Many thanks,

Xiaohan

slycoder commented 6 years ago

You can't edit that function with fix() because it isn't exported from the package namespace (hence the leading period). The best thing to do would probably be to edit the source code directly (i.e. clone this repo) and then install the package from your local copy.
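A sketch of that workflow from the command line, assuming (as is typical) that the repository root is the R package directory:

```shell
# Clone the repository, edit the source, and install from the local copy.
git clone https://github.com/slycoder/R-lda.git
# ... make your edits to the package source here ...
R CMD INSTALL R-lda
```

Alternatively, `devtools::install("R-lda")` from an R session installs the same local copy.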

xz6014 commented 6 years ago

Hi Jonathan,

Thank you for your guidance. I have now implemented the likelihood feature in slda.em and it works.

I attempted the following iterative scheme: at each iteration, I fit an sLDA model initialized with the previous iteration's topic assignments. However, I notice that the topic assignments remain constant when I set the number of E-iterations (the number of Gibbs sampling sweeps per E step) to one. Could you elaborate on how e_iter and m_iter relate to the number of iterations in the collapsed Gibbs sampler?

```r
topic_num <- 70
alpha <- 1.0
eta <- 0.1
variance <- 0.5
lambda <- 1.0
e_iter <- 1  # Gibbs sweeps per E step
m_iter <- 1  # M-step iterations
fit_slda2 <- slda.em(documents = documents, K = topic_num,
                     vocab = vocab,
                     num.e.iterations = e_iter,
                     num.m.iterations = m_iter,
                     alpha = alpha, eta = eta,
                     annotations = res,
                     params = params[[which(candidate_k == topic_num)]],
                     variance = variance,
                     lambda = lambda,
                     logistic = FALSE, initial = init,
                     method = "sLDA", compute.log.likelihood = TRUE)
# Warm-start the next round from this round's assignments.
init <- list(assignments = fit_slda2$assignments)
```

Many thanks,

Xiaohan

slycoder commented 6 years ago

Awesome! If you submit a PR I'd be happy to look at it and get it merged into the next release.

Anyways, the first iteration is the initialization iteration; in this case each invocation of the E step initializes the assignments to what they were in the previous step. Therefore you're not going to see any change after only one iteration of the Gibbs step.
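Put another way (my reading of the explanation above), the warm-start loop needs at least two E-iterations for the sampler to move past the initialization sweep. A sketch continuing the snippet from the previous comment:

```r
# With num.e.iterations = 1 the E step only performs the initialization
# sweep, so the returned assignments equal the initial ones. Two or more
# sweeps let the chain actually move.
e_iter <- 2

fit_slda3 <- slda.em(documents = documents, K = topic_num,
                     vocab = vocab,
                     num.e.iterations = e_iter,
                     num.m.iterations = m_iter,
                     alpha = alpha, eta = eta,
                     annotations = res,
                     params = params[[which(candidate_k == topic_num)]],
                     variance = variance, lambda = lambda,
                     logistic = FALSE, initial = init,
                     method = "sLDA")

# The assignments should now differ from the warm start.
identical(fit_slda3$assignments, init$assignments)
```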

P.S. I've updated your comment's formatting to look better on github (check out the markdown tutorial to learn more)