senderle / topic-modeling-tool

A point-and-click tool for creating and analyzing topic models produced by MALLET.
https://senderle.github.io/topic-modeling-tool/documentation/2017/01/06/quickstart.html
Apache License 2.0

Empty files throw off document IDs, rendering output meaningless #58

Closed by senderle 7 years ago

senderle commented 7 years ago

If the input directory contains empty files, stupid assumptions made by buildNtd in CsvBuilder cause the document IDs to fall out of sync. This wreaks havoc on much of the output, rendering it meaningless! (The metadata file appears to remain correct, but sadly that's about it.)
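
One way to sidestep the problem, independent of any change to the tool itself, is to check the input directory for empty files before running it. The sketch below is a standalone pre-check, not part of the tool: the name find_empty_files is hypothetical, and it assumes that "empty" means zero bytes or whitespace only, which may not match what the tool actually treats as an empty document.

```python
from pathlib import Path


def find_empty_files(input_dir):
    """Return the sorted paths of files in input_dir that have no content.

    Assumption: a file counts as empty if it is zero bytes long or holds
    only whitespace. The tool's own notion of "empty" may differ.
    """
    empty = []
    for path in sorted(Path(input_dir).iterdir()):
        # Skip subdirectories and anything else that is not a regular file.
        if not path.is_file():
            continue
        # Zero-byte files are empty without needing to be read.
        if path.stat().st_size == 0:
            empty.append(path)
            continue
        # Otherwise read the text and check for whitespace-only content.
        text = path.read_text(encoding="utf-8", errors="replace")
        if not text.strip():
            empty.append(path)
    return empty
```

Calling find_empty_files on the input directory returns the files to move or delete before building the model. It only reports them and leaves removal to you, so nothing in the corpus is changed without your say-so.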