mimno / Mallet

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
https://mimno.github.io/Mallet/

Computing Perplexity #197

Open waelbenamara opened 3 years ago

waelbenamara commented 3 years ago

Is there any way to return perplexity with respect to the number of iterations? In case we want to optimise the number of iterations and avoid re-running the burn-in period in future executions. A way I found to do that is to add a list attribute that is iteratively filled with the perplexity for each iteration. Perplexity is computed as exp(-(modelLogLikelihood() / totalTokens)). Any chance I can submit a pull request?
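For context, here is a minimal sketch of how such a perplexity trace could be collected from outside the library today, using a standard ParallelTopicModel pipeline. The input file path, topic count, hyperparameters, and chunk size are placeholders, and it assumes that calling estimate() repeatedly continues Gibbs sampling from the current topic assignments rather than restarting; the proposed in-library list attribute would avoid relying on that behaviour.

```java
import cc.mallet.topics.ParallelTopicModel;
import cc.mallet.types.FeatureSequence;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class PerplexityTrace {

    public static void main(String[] args) throws Exception {
        // Placeholder path: a serialized InstanceList produced by MALLET's importer.
        InstanceList instances = InstanceList.load(new File("topic-input.mallet"));

        // Count tokens once so the log likelihood can be normalized per token.
        long totalTokens = 0;
        for (Instance instance : instances) {
            totalTokens += ((FeatureSequence) instance.getData()).getLength();
        }

        // Placeholder model settings: 50 topics, alphaSum = 5.0, beta = 0.01.
        ParallelTopicModel model = new ParallelTopicModel(50, 5.0, 0.01);
        model.addInstances(instances);
        model.setNumThreads(4);

        List<Double> perplexityTrace = new ArrayList<>();

        // Run the sampler in short chunks and record perplexity after each chunk.
        // Assumption: estimate() resumes from the existing sampling state when
        // called again on the same model.
        int chunkSize = 10;
        int totalIterations = 500;
        for (int done = 0; done < totalIterations; done += chunkSize) {
            model.setNumIterations(chunkSize);
            model.estimate();

            // Perplexity = exp(-(modelLogLikelihood() / totalTokens)), as in the issue.
            double perplexity = Math.exp(-model.modelLogLikelihood() / totalTokens);
            perplexityTrace.add(perplexity);
            System.out.println("After " + (done + chunkSize)
                    + " iterations: perplexity = " + perplexity);
        }
    }
}
```

A built-in per-iteration list would be cheaper than this external chunking, since modelLogLikelihood() could be recorded inside the existing sampling loop without restarting threads between chunks.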

mimno commented 3 years ago

Thanks for looking into this! I'm not sure I understand. Is the idea to stop training when the model log likelihood levels off? Burn-in usually refers to the early iterations, while log likelihood is still improving.