senderle / topic-modeling-tool

A point-and-click tool for creating and analyzing topic models produced by MALLET.
https://senderle.github.io/topic-modeling-tool/documentation/2017/01/06/quickstart.html
Apache License 2.0

Divide Input option is skipping files #67

Status: Closed (senderle closed this issue 3 years ago)

senderle commented 6 years ago

As reported in #65. @shawngraham writes:

I went and tried it again, armed with my new knowledge of how it works. In the results, when I opened the metadata.csv, a number of my documents were no longer present; that is to say, no results recorded for them. I had n set for 1000, so I thought perhaps the missing ones were smaller and somehow got folded into the previous 1000-chunk, but no, the missing ones should have been split into three or four chunks at least. So I'm not sure what's going on there... I can't seem to see the commonality between the documents that get dropped.
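For anyone trying to reproduce the report above, one way to pin down exactly which documents were dropped is to compare the input file names against the rows of the output CSV. The following is only a minimal sketch: the function name `missing_documents`, the sample file names, and the `-0`/`-1` chunk suffixes are hypothetical, and the tool's actual CSV layout is not shown in this thread. The sketch therefore assumes only that each surviving document's base name appears somewhere in at least one cell.

```python
import csv
import io
from pathlib import Path


def missing_documents(input_names, metadata_rows):
    """Return the input file names whose base name appears in no CSV cell.

    Assumption: a document that made it into the results has its base name
    (file name without extension) somewhere in at least one cell.
    """
    rows = [list(row) for row in metadata_rows]
    missing = []
    for name in input_names:
        stem = Path(name).stem
        # Substring match, so chunk suffixes on the name do not matter.
        if not any(stem in cell for row in rows for cell in row):
            missing.append(name)
    return missing


# Hypothetical sample data; real output columns and chunk names may differ.
sample_csv = "filename,topic\nletter_a-0.txt,3\nletter_a-1.txt,1\nletter_c-0.txt,2\n"
rows = csv.reader(io.StringIO(sample_csv))
print(missing_documents(["letter_a.txt", "letter_b.txt", "letter_c.txt"], rows))
# prints ['letter_b.txt']
```

Because the match is by substring, a name that is a prefix of another (for example `doc1` and `doc10`) can hide a missing file, so the result is best treated as a starting point for a manual check.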

senderle commented 6 years ago

@shawngraham, it also occurs to me that odd characters in filenames can sometimes cause problematic behavior. If possible, could you upload just the topic-metadata.csv file here?
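A quick way to test the filename hypothesis is to list the input files whose names contain anything beyond plain ASCII letters, digits, dots, underscores, and hyphens. This is a rough sketch rather than anything the tool itself does: the helper name `unusual_filenames` and the choice of "safe" characters are assumptions made for illustration.

```python
import re

# Assumed "safe" set: ASCII letters, digits, dot, underscore, hyphen.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def unusual_filenames(names):
    """Return the names containing at least one character outside the safe set."""
    return [name for name in names if not SAFE_NAME.fullmatch(name)]


# Hypothetical file names for illustration.
print(unusual_filenames(["report_01.txt", "café notes.txt", "draft#2.txt"]))
# prints ['café notes.txt', 'draft#2.txt']
```

If the flagged names line up with the documents missing from the results, that would support the odd-characters explanation; if they do not, the cause likely lies elsewhere.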

senderle commented 6 years ago

@shawngraham, are you still seeing this problem? It would be great to be able to reproduce it so I can understand how to fix it. No worries if not, but if you have any thoughts, let me know.

shawngraham commented 6 years ago

Hi, I'm sorry - this fell off the radar because I got swamped with other things. I'll try to return to it later this month once the smoke settles!

senderle commented 6 years ago

No problem! And thanks!

senderle commented 3 years ago

Unfortunately I think I have to close this as unable-to-reproduce. Will reopen if it comes up again.