mimno / Mallet

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
https://mimno.github.io/Mallet/

Error with assigning initial topics in ParallelTopicModel.java #166

Closed. clause closed this issue 5 years ago

clause commented 5 years ago

I think there's an issue with assigning initial topics. On line 248 the loop bound is the size of the topics array, but it should be the length of the token sequence. Because the topics array has a minimum capacity (currently 2), at least that many topics are always assigned, even if the document has fewer than 2 tokens.
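To make the report concrete, here is a minimal sketch of the pattern. It is not the actual code from ParallelTopicModel.java: the class name, method names, and the use of plain `int[]` buffers are all assumptions made for illustration. The only detail taken from the report is the minimum capacity of 2.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-in for the pattern described in this issue; not MALLET's actual code.
class InitialTopicsSketch {
    // Minimum buffer capacity; the value 2 is the one mentioned in the report.
    static final int MIN_CAPACITY = 2;

    // Allocates a topic buffer of at least MIN_CAPACITY slots, all marked -1 (unassigned).
    static int[] allocateTopics(int numTokens) {
        int[] topics = new int[Math.max(numTokens, MIN_CAPACITY)];
        Arrays.fill(topics, -1);
        return topics;
    }

    // Reported behavior: the loop is bounded by the buffer size, so padding slots get topics too.
    static int[] assignBuggy(int numTokens, int numTopics, Random random) {
        int[] topics = allocateTopics(numTokens);
        for (int position = 0; position < topics.length; position++) {
            topics[position] = random.nextInt(numTopics);
        }
        return topics;
    }

    // Suggested behavior: the loop is bounded by the number of tokens in the document.
    static int[] assignFixed(int numTokens, int numTopics, Random random) {
        int[] topics = allocateTopics(numTokens);
        for (int position = 0; position < numTokens; position++) {
            topics[position] = random.nextInt(numTopics);
        }
        return topics;
    }

    // Counts the slots that received a topic (anything other than -1).
    static int countAssigned(int[] topics) {
        int count = 0;
        for (int topic : topics) {
            if (topic != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Random random = new Random(0);
        for (int numTokens : new int[] {0, 1, 2, 5}) {
            int buggy = countAssigned(assignBuggy(numTokens, 10, random));
            int fixed = countAssigned(assignFixed(numTokens, 10, random));
            System.out.println("tokens=" + numTokens + " buggy=" + buggy + " fixed=" + fixed);
        }
    }
}
```

In this sketch, a document with 0 or 1 tokens ends up with 2 assigned topics under the buffer-size bound and with 0 or 1 under the token-count bound. For documents with 2 or more tokens the two versions assign the same number of topics.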

mimno commented 5 years ago

Thank you for spotting this! I fixed this and a few other instances in the topic model. I'm not sure the extra topics were ever being used for anything except reports, but I'm glad this is fixed.