Closed: dselivanov closed this issue 9 years ago.
You can run `lda.collapsed.gibbs.sampler` again, except pass `initial=list(topics=your_model$topics, topic_sums=your_model$topic_sums)` and set `freeze.topics` to TRUE so that the topics will be treated as fixed and not updated.

@slycoder thank you very much for such a detailed answer! It seems I tried to reinvent the wheel for the second question =) I believe I can set a small number of iterations for prediction (5-10-20) as suggested here? Is that enough in practice?
That might be enough, but it can't hurt to run for more (to at least get an idea of how close things are).
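For reference, a minimal sketch of the prediction recipe described above, using the `lda` package's documented arguments. The names `train_docs`, `new_docs`, and `vocab` are placeholders, and the `K`, `alpha`, and `eta` values are illustrative, not from this thread:

```r
library(lda)

# Fit on the training corpus as usual (documents in lda's lexicalize format).
fit <- lda.collapsed.gibbs.sampler(train_docs, K = 50, vocab,
                                   num.iterations = 500,
                                   alpha = 0.1, eta = 0.1)

# "Predict" topics for unseen documents: run the sampler again, seeding it
# with the learned topics via `initial` and freezing them in place.
pred <- lda.collapsed.gibbs.sampler(new_docs, K = 50, vocab,
                                    num.iterations = 20,  # a small number, as discussed
                                    alpha = 0.1, eta = 0.1,
                                    initial = list(topics = fit$topics,
                                                   topic_sums = fit$topic_sums),
                                    freeze.topics = TRUE)

# Per-document topic proportions for the new documents: columns of the
# K x D matrix document_sums, smoothed and normalized.
theta <- t(apply(pred$document_sums + 0.1, 2, function(x) x / sum(x)))
```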
Many thanks, closing this.
@slycoder FYI: based on your code, I rewrote vanilla LDA with Rcpp (R's C interface is too verbose...) - https://github.com/dselivanov/text2vec/blob/0.4/src/gibbs.cpp#L18
Surprisingly, it turned out to be about 1.5-2x faster (I removed a lot of if conditions and the sLDA-related stuff)...
Thanks for pointing that out. It's curious, because I would've thought that the CPU would be able to branch-predict away most of the if statements. I'll have to dig further.
Opened #8 to investigate.
Hi, Jonathan! Thank you very much for this package; it is by far the best I have found for R. I realise that the code for this package was written a while ago, but I hope you remember some details =) I have a few questions:

1. What exactly does the warning in the `lda.collapsed.gibbs.sampler()` function mean? Is that comment still relevant? Does it mean that each word is sampled only once during Gibbs sampling, so the algorithm doesn't actually use word counts (all counts are assumed to be 1)?
2. I fitted an `lda` model on a large corpus, so I have the `document_sums` and `topics` matrices. Now I want to predict topics for a new document (one not observed during fitting). Is it possible? I found this topic and ended up with a simplified solution (in R, not considering speed issues, just a proof of concept; a reconstructed sketch follows after this list).
3. Are you interested in pull requests? Do you have time to review and maintain the package?
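The original proof-of-concept snippet is not preserved in this thread. A hypothetical reconstruction of what such a fold-in might look like follows; the function name `predict_topics`, the `eta` smoothing, and all parameter values are assumptions, not code from the thread. It resamples topic assignments for a new document's words while keeping the trained word-topic counts fixed:

```r
# Hypothetical sketch: Gibbs "fold-in" for one new document, keeping the
# trained topics fixed. Assumes the lda package's layout: `topics` is a
# K x V matrix of word-topic counts and `topic_sums` holds its row sums.
predict_topics <- function(word_ids, topics, topic_sums,
                           alpha = 0.1, eta = 0.1, n_iter = 20) {
  K  <- nrow(topics)
  V  <- ncol(topics)
  ts <- as.numeric(topic_sums)
  # word_ids: 1-based vocabulary indices, repeated according to word counts
  z <- sample.int(K, length(word_ids), replace = TRUE)  # random initialization
  doc_counts <- tabulate(z, nbins = K)                  # topic counts in this doc

  for (iter in seq_len(n_iter)) {
    for (i in seq_along(word_ids)) {
      w <- word_ids[i]
      doc_counts[z[i]] <- doc_counts[z[i]] - 1          # remove old assignment
      # p(z_i = k) is proportional to p(w | k) * (n_k + alpha); topics stay frozen
      p <- (topics[, w] + eta) / (ts + eta * V) * (doc_counts + alpha)
      z[i] <- sample.int(K, 1, prob = p)
      doc_counts[z[i]] <- doc_counts[z[i]] + 1          # record new assignment
    }
  }
  (doc_counts + alpha) / sum(doc_counts + alpha)        # topic proportions
}
```

In practice, the `initial=` plus `freeze.topics=TRUE` approach suggested above does the same thing inside the package's C code, which is why the DIY version amounts to reinventing the wheel.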