inpho / topic-explorer

System for building, visualizing, and working with LDA topic models
https://www.hypershelf.org/

Changing Topic Distribution #321

Closed oguzhanalasehir closed 6 years ago

oguzhanalasehir commented 6 years ago

Hello,

What is the reason for producing different keyword distributions over topics even though I use the same data source and apply exactly the same steps? In other words, each time I run topicexplorer under the same conditions, the keywords assigned to each topic change.

Best regards,

JaimieMurdock commented 6 years ago

Hi,

The underlying reason is the random process involved in Gibbs sampling, which generates the word-topic assignments at each iteration.

To ensure the same results from each run, you will also need to freeze the random seed. This can be done with the --seed flag, which takes an integer as a parameter. If you use the same seed, each run should produce the same results. For example, all with the -q quiet flag for automating:

topicexplorer init corpus -q
topicexplorer prep corpus --high-percent 30 --low-percent 20 -q
topicexplorer train corpus --seed 894721 --iter 200 -k 20 40 60 -q
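As a toy illustration of why freezing the seed gives identical results (this is a minimal sketch of seeded random draws, not topicexplorer's actual Gibbs sampler):

```python
import random

def sample_topic_assignments(n_words, n_topics, seed):
    # Each Gibbs sweep draws a topic for every word token at random;
    # seeding the generator makes the whole draw sequence reproducible.
    rng = random.Random(seed)
    return [rng.randrange(n_topics) for _ in range(n_words)]

# Same seed -> identical word-topic assignments across runs
run_a = sample_topic_assignments(n_words=100, n_topics=20, seed=894721)
run_b = sample_topic_assignments(n_words=100, n_topics=20, seed=894721)
assert run_a == run_b

# A different seed generally yields different assignments,
# which is why unseeded runs produce different topic keywords.
run_c = sample_topic_assignments(n_words=100, n_topics=20, seed=12345)
```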

The variance in topic models remains an open area of research, and I can dig up a few relevant papers on request. First, though, I wanted to make sure the inconsistency you saw in topicexplorer was addressed!