vi3k6i5 / GuidedLDA

semi supervised guided topic model with custom guidedLDA
Mozilla Public License 2.0

Are seeds only used at initialization? Then most likely this does not work. #29

Open ychen93 opened 5 years ago

ychen93 commented 5 years ago

Thank you for this repo. I looked at the source code, and it seems to me that the seeds are only used at the initialization step. After that, the program runs the Monte Carlo sampling as if it were normal LDA. Correct me if this is not the case.

My question is: if the seeds are only used at initialization, how do they make a difference to the final model? Although it is not proven explicitly, I believe Griffiths and Steyvers (2004) (which you also refer to in the code) describe the sampler as a Markov chain Monte Carlo method, and such a chain should converge to the same stationary distribution regardless of where it starts.

I also tested on a small dataset, and guidedLDA seems to give the same results as vanilla LDA.
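To make the concern concrete, here is a minimal sketch of the setup as I understand it: topic assignments are biased towards the seeds once, at initialization, and every later sweep is an ordinary collapsed Gibbs update that never looks at the seeds again. This is not the repository's actual code. The names `seeded_init` and `gibbs_sweeps`, the `seed_topics` mapping (word id to topic id), and the `seed_confidence` probability are assumptions made for illustration.

```python
import random


def seeded_init(docs, n_topics, seed_topics, seed_confidence, rng):
    """Draw one initial topic per token.

    A token whose word id appears in seed_topics is assigned its seed topic
    with probability seed_confidence; every other draw is uniform.
    """
    z = []
    for doc in docs:
        z_doc = []
        for w in doc:
            if w in seed_topics and rng.random() < seed_confidence:
                z_doc.append(seed_topics[w])
            else:
                z_doc.append(rng.randrange(n_topics))
        z.append(z_doc)
    return z


def gibbs_sweeps(docs, z, n_topics, n_vocab, n_iter, alpha, eta, rng):
    """Run plain collapsed Gibbs sampling for LDA, starting from z.

    Note that seed_topics is not an argument here: after initialization
    the sampler has no access to the seeds. Returns topic-word counts.
    """
    ndz = [[0] * n_topics for _ in docs]            # document-topic counts
    nzw = [[0] * n_vocab for _ in range(n_topics)]  # topic-word counts
    nz = [0] * n_topics                             # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndz[d][k] += 1
            nzw[k][w] += 1
            nz[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndz[d][k] -= 1
                nzw[k][w] -= 1
                nz[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (ndz[d][t] + alpha) * (nzw[t][w] + eta) / (nz[t] + n_vocab * eta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                ndz[d][k] += 1
                nzw[k][w] += 1
                nz[k] += 1
    return nzw


# Toy corpus: documents are lists of word ids over a vocabulary of size 5.
toy_docs = [[0, 1, 2, 0], [3, 4, 3, 1]]
toy_rng = random.Random(0)
toy_z = seeded_init(toy_docs, 2, {0: 0, 3: 1}, 0.9, toy_rng)
toy_counts = gibbs_sweeps(toy_docs, toy_z, 2, 5, 50, 0.1, 0.01, toy_rng)
print(toy_counts)
```

Running this with and without `seed_topics` over several random seeds is one way to check whether the initialization bias survives a long run on a given dataset. The sketch itself makes no claim either way.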

There are actually many versions of seeded LDA code on GitHub. Unfortunately, they are all research code that has no comments, has confusing parameters, and does not even compile. One of them is here: https://github.com/artir/ramesh-acl15 The author is Arti Ramesh (http://www.cs.binghamton.edu/~artir/). It would be great if someone could turn it into ready-to-use software.

awalt1 commented 5 years ago

Hi, thanks for this comment. I get the same impression from looking at the code. I am not sure whether the implementation really matches the Jagarlamudi et al. paper. However, it does seem to bias topics towards the seeds in a small example dataset that I generated...

I would appreciate any comment on this.