mimno / Mallet

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
https://mimno.github.io/Mallet/

Mallet commandline determine convergence? #179

Closed by findgit123 4 years ago

findgit123 commented 4 years ago

We are testing the MALLET command line and found a diagnostics file after a run. How can we check sampling convergence? Where is the report, or how do we check this measure? And could you please briefly explain how to interpret it?
Many thanks

mimno commented 4 years ago

This is more of a Stack Overflow question, not a potential problem with the software.

The log likelihood values reported during training should give an indication of whether the sampler has reached a stable state. If you are concerned about finding an optimal solution, the --num-icm-iterations option will choose the best topic assignment for each token until no tokens change. Otherwise, saving multiple states (--output-state-interval) and averaging over them is a good idea.
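
For anyone who wants to eyeball the log likelihood trend programmatically, here is a minimal Python sketch. It assumes the training output was captured to a text file and that the progress lines look like `<50> LL/token: -9.12345`; that line shape is an assumption about the log format, so adjust the pattern if your output differs. The function names, the window size, and the tolerance are all hypothetical choices for illustration, not anything MALLET provides. A flat trace only suggests the sampler has settled; it is a rough heuristic, not a convergence test.

```python
import re
from typing import List, Tuple

# Assumed shape of a progress line, e.g. "<50> LL/token: -9.12345".
# Adjust this pattern if your captured output looks different.
LL_LINE = re.compile(r"<(\d+)>\s+LL/token:\s+(-?\d+(?:\.\d+)?)")


def parse_ll_per_token(log_text: str) -> List[Tuple[int, float]]:
    """Return (iteration, log likelihood per token) pairs found in the log text."""
    return [(int(m.group(1)), float(m.group(2))) for m in LL_LINE.finditer(log_text)]


def looks_stable(values: List[float], window: int = 5, tol: float = 0.01) -> bool:
    """Heuristic only: compare the mean of the last `window` values with the
    mean of the `window` values before them. `window` and `tol` are arbitrary
    illustrative defaults, not recommended settings."""
    if len(values) < 2 * window:
        return False  # not enough reports to compare two windows
    recent = values[-window:]
    previous = values[-2 * window:-window]
    return abs(sum(recent) / window - sum(previous) / window) < tol
```

Usage would be along the lines of reading the captured log (for example a hypothetical `train.log`), calling `parse_ll_per_token` on its contents, and passing the second element of each pair to `looks_stable`. Plotting the values against the iteration number is usually more informative than any single threshold.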

The diagnostics file is only about the quality of a model, and has no information on MCMC convergence. There are no exact answers for MCMC convergence, and many of the available heuristic methods assume that you have a small number of continuous variables. In LDA sampling we have potentially millions of categorical variables.

findgit123 commented 4 years ago

thanks a lot