Closed: oguzhanalasehir closed this issue 6 years ago
Hi,
The underlying reason is the random process involved in Gibbs sampling, which generates the word-topic assignments at each iteration.
In order to ensure the same results from each run, you will also need to fix the seed. This can be done with the --seed
flag, which takes an integer as a parameter. If you use the same seed, each run should produce the same results. For example, here is the full pipeline, with the -q
(quiet) flag added for automation (a short sketch of why the seed controls this follows the commands):
topicexplorer init corpus -q
topicexplorer prep corpus --high-percent 30 --low-percent 20 -q
topicexplorer train corpus --seed 894721 --iter 200 -k 20 40 60 -q
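To make the mechanism concrete, here is a minimal Python sketch (not topicexplorer's actual implementation; the function name and parameters are illustrative) of why a fixed seed makes a Gibbs-sampling run reproducible: every topic reassignment is a draw from a pseudo-random stream, so fixing the stream fixes the whole run.

# Toy illustration only: uniform reassignments stand in for the real
# count-based conditional probabilities used by LDA's Gibbs sampler.
import numpy as np

def toy_gibbs_assignments(n_words=10, n_topics=3, n_iter=5, seed=None):
    """Randomly reassign each word to a topic for a few iterations."""
    rng = np.random.default_rng(seed)
    assignments = rng.integers(n_topics, size=n_words)
    for _ in range(n_iter):
        for w in range(n_words):
            # In real LDA the probabilities come from the current topic
            # counts; here they are uniform just to show where randomness
            # enters at every iteration.
            assignments[w] = rng.choice(n_topics)
    return assignments

print(toy_gibbs_assignments(seed=894721))  # same output on every run
print(toy_gibbs_assignments(seed=894721))  # identical to the line above
print(toy_gibbs_assignments())             # unseeded: differs run to run

The same logic applies to the trained model: with --seed fixed, the sequence of word-topic assignments, and therefore the final keyword distributions, should be identical across runs.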
The variance in topic models remains an open area of research, and I can dig up a few relevant papers on request. First, though, I wanted to make sure the inconsistency you saw in topicexplorer was addressed!
Hello,
What is the reason for producing different keyword distributions over topics even though I use the same data source and apply exactly the same steps? In other words, each time I run topicexplorer under the same circumstances, the keywords assigned to each topic change.
Best regards,