mimno / Mallet

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
https://mimno.github.io/Mallet/

TopicalNGrams printDocumentTopics(PrintWriter, double, int) throws an IndexOutOfBoundsException #126

Open Jonas-Jaeger opened 6 years ago

Jonas-Jaeger commented 6 years ago

Hello!

Recently, I found a bug in TopicalNGrams.java. Every time I tried to print via printDocumentTopics(PrintWriter, double, int), it crashed with an IndexOutOfBoundsException unless the number of topics equaled the number of provided documents/instances. I traced this to the array double topicDist[] in that method, which is initialized with the wrong size. topics.length returns the number of documents, which is correct for the for-loop below that iterates over all documents. topicDist[], however, needs one entry per topic, so it has to be initialized with numTopics:

double topicDist[] = new double[numTopics];

This is also how the analogous array topicCount is initialized in other classes such as ParallelTopicModel.
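To illustrate the sizing problem, here is a small standalone sketch. It is not MALLET's code: the class TopicDistSketch and the method topicDistribution are hypothetical, and it assumes topics is an int[][] where topics[doc][token] holds the topic assigned to each token, so that topics.length is the number of documents. It reproduces the case where there are more topics than documents.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not MALLET's implementation) of computing
 * per-document topic proportions with a correctly sized array.
 */
public class TopicDistSketch {

    /** Returns topic proportions for one document's token-topic assignments. */
    static double[] topicDistribution(int[] docTopics, int numTopics) {
        // One slot per topic. Sizing by the number of documents
        // (topics.length) is the bug: any topic index >= numDocs overflows.
        double[] topicDist = new double[numTopics];
        for (int topic : docTopics) {
            topicDist[topic]++;
        }
        // Normalize counts to proportions, skipping empty documents.
        if (docTopics.length > 0) {
            for (int t = 0; t < numTopics; t++) {
                topicDist[t] /= docTopics.length;
            }
        }
        return topicDist;
    }

    public static void main(String[] args) {
        int numTopics = 5;
        // Assumed layout: topics[doc][token] = topic index; only 2 documents.
        int[][] topics = { {0, 4, 4}, {1, 3} };

        // Buggy sizing: an array of length topics.length (2) cannot hold topic 4.
        try {
            double[] wrong = new double[topics.length];
            for (int topic : topics[0]) {
                wrong[topic]++;
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("topics.length sizing fails: " + e.getMessage());
        }

        // Fixed: loop over documents with topics.length, size the array by numTopics.
        for (int doc = 0; doc < topics.length; doc++) {
            double[] dist = topicDistribution(topics[doc], numTopics);
            System.out.println(doc + " " + Arrays.toString(dist));
        }
    }
}
```

Note that ArrayIndexOutOfBoundsException is a subclass of IndexOutOfBoundsException, which matches the exception reported above.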

Hopefully, you will be able to fix this bug for the next release.

Best regards,
Jonas