MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
I recently found a bug in TopicalNGrams.java.
Every time I tried to print via printDocumentTopics(PrintWriter, double, int), it crashed with an IndexOutOfBoundsException unless I had used the same number of topics as the number of provided documents/instances.
I traced this to the array double topicDist[] in that method, which is initialized with the wrong size. topics.length returns the number of documents, which is correct where it is used in the for-loop below to iterate over all documents. The topicDist[] array, however, has to be initialized with numTopics so that its length matches the number of topics:
double topicDist[] = new double[numTopics];
This is also how the similar array topicCount is initialized in other classes such as ParallelTopicModel.
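To make the size mismatch easier to see, here is a minimal, self-contained sketch. It is not MALLET's actual code: the class TopicDistSizeDemo, the method topicProportions and the useFix flag are hypothetical names made up for illustration, and only topics, numTopics and topicDist mirror the names from the report. In this sketch the buggy sizing fails as soon as there are more topics than documents.

```java
import java.util.Arrays;

// Hypothetical stand-in for the relevant part of TopicalNGrams; not MALLET's actual code.
class TopicDistSizeDemo {

    // Computes the topic proportions of document di.
    // topics[di][pos] is the topic assigned to token pos of document di,
    // so topics.length is the number of documents.
    static double[] topicProportions(int[][] topics, int numTopics, int di, boolean useFix) {
        // Buggy sizing: one slot per document. Fixed sizing: one slot per topic.
        double[] topicDist = useFix ? new double[numTopics] : new double[topics.length];

        int docLen = topics[di].length;
        int[] counts = new int[numTopics];
        for (int topic : topics[di]) {
            counts[topic]++;
        }
        for (int ti = 0; ti < numTopics; ti++) {
            // ti runs over topics, so topicDist needs numTopics entries.
            topicDist[ti] = (double) counts[ti] / docLen;
        }
        return topicDist;
    }

    public static void main(String[] args) {
        int numTopics = 4;
        int[][] topics = { {0, 3, 3, 1}, {2, 2, 0, 3} }; // 2 documents, 4 topics

        // Fixed sizing works: prints [0.25, 0.25, 0.0, 0.5]
        System.out.println(Arrays.toString(topicProportions(topics, numTopics, 0, true)));

        // Buggy sizing: the array has only 2 slots, so writing index 2 throws.
        try {
            topicProportions(topics, numTopics, 0, false);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("buggy sizing failed: " + e.getMessage());
        }
    }
}
```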
Hopefully, you will be able to fix this bug for the next release.
Best regards, Jonas