vi3k6i5 / GuidedLDA

semi supervised guided topic model with custom guidedLDA
Mozilla Public License 2.0

About seed words information #49

Open bohyunshin opened 4 years ago

bohyunshin commented 4 years ago

If I understand the algorithm correctly, the seed word information is used for initialization only. Is that right?

In the `_fit` function, before the sampling iterations begin, there is an initialization step that assigns topics to words according to whether they appear in the seed words. This is the line I am referring to:

https://github.com/vi3k6i5/GuidedLDA/blob/6ddfbe47b5f3f8138852c1987cc55bc0703c8f8b/guidedlda/guidedlda.py#L241

However, as far as I can tell, this is the only step that uses the seed words. After the initialization, the iterations perform classic collapsed Gibbs sampling. Is my understanding correct? If so, why are the seed words not used during the iterations?
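To make the question concrete, here is a minimal sketch of the flow as I understand it. It is not the library's code: the function names, the word-id-to-topic-id layout of `seed_topics`, and the uniform fallback for unseeded words are all assumptions made for illustration. The point is only that the seeds appear in `seeded_init`, while `gibbs_sweep` is the standard collapsed Gibbs update and never looks at them.

```python
import random


def seeded_init(docs, n_topics, seed_topics, seed_confidence, rng):
    """Assign an initial topic to every token (illustrative, not GuidedLDA's code).

    seed_topics: dict mapping word id -> topic id (assumed layout).
    A seeded word takes its seed topic with probability seed_confidence;
    otherwise, and for every unseeded word, the topic is drawn uniformly.
    """
    z = []
    for doc in docs:
        z_doc = []
        for w in doc:
            if w in seed_topics and rng.random() < seed_confidence:
                z_doc.append(seed_topics[w])
            else:
                z_doc.append(rng.randrange(n_topics))
        z.append(z_doc)
    return z


def build_counts(docs, z, n_topics, vocab_size):
    """Build document-topic, topic-word, and topic totals from the assignments."""
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * vocab_size for _ in range(n_topics)]
    nk = [0] * n_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    return ndk, nkw, nk


def gibbs_sweep(docs, z, ndk, nkw, nk, alpha, eta, rng):
    """One pass of standard collapsed Gibbs sampling; no seed information is used."""
    n_topics = len(nk)
    vocab_size = len(nkw[0])
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            # Remove the token's current assignment from the counts.
            k = z[d][i]
            ndk[d][k] -= 1
            nkw[k][w] -= 1
            nk[k] -= 1
            # Unnormalized conditional p(z = t | everything else).
            weights = [
                (ndk[d][t] + alpha) * (nkw[t][w] + eta) / (nk[t] + vocab_size * eta)
                for t in range(n_topics)
            ]
            k = rng.choices(range(n_topics), weights=weights)[0]
            # Record the new assignment.
            z[d][i] = k
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
```

In this sketch the seeds only bias the starting counts, so their influence can fade as the sampler mixes, which is exactly what I am asking about.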

Thank you in advance