cassiehkx opened 6 years ago
@cassiehkx can you share a link to some other LDA implementation that supports incremental learning, or to any other library that does?
Basically I want to understand how that works so I can think of an approach for doing it.
Thanks :)
I don't have a concrete suggestion for enabling incremental learning in this case. But the problem lies in the input being an np.ndarray, which prevents the program from scaling to large amounts of data. Incremental learning is just one solution I can think of at the moment. Maybe we could change the input into a sparse matrix? But in that case the matrix multiplication in the log-likelihood computation would be a problem. What would you recommend?
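For what it's worth, the log-likelihood term may be less of a blocker than it seems: a sum of the form X * log(P) only needs the nonzero entries of the count matrix, which scipy.sparse exposes directly. A minimal sketch (the matrices and variable names here are made up for illustration, not the library's internals):

```python
import numpy as np
from scipy import sparse

# Hypothetical small document-term count matrix (dense).
X_dense = np.array([
    [2, 0, 1, 0],
    [0, 3, 0, 1],
    [1, 0, 0, 4],
])
X = sparse.csr_matrix(X_dense)  # sparse version: stores nonzeros only

# Hypothetical per-cell word probabilities from some fitted model;
# rows sum to 1 (a stand-in, not the library's actual structure).
rng = np.random.default_rng(0)
P = rng.random(X.shape)
P /= P.sum(axis=1, keepdims=True)

# Dense log-likelihood term: sum over all cells of X * log(P).
ll_dense = float((X_dense * np.log(P)).sum())

# Sparse version: visit only the nonzero entries of X.
rows, cols = X.nonzero()
ll_sparse = float((X.data * np.log(P[rows, cols])).sum())

assert np.isclose(ll_dense, ll_sparse)
```

Since the zeros of X contribute nothing to this sum, the sparse path does the same work in O(nnz) instead of O(D * V), which is what makes large corpora feasible.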
Fair point: switching to a sparse matrix should be easier than implementing incremental learning.
Another question is the difference between your code and the original scikit-learn LDA code, where the eta parameter can control the initialization weights. The paper you referred to describes a more sophisticated method, while your code seems only to give the seed words a higher weight at initialization and does nothing extra during the log-likelihood calculation. So what would be the difference compared to just initializing an eta matrix with the seed words?
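To make the comparison concrete, here is a minimal sketch of the "just set an initialized eta matrix" baseline (the vocabulary, topic count, and prior values are made up; Gensim's LdaModel, for instance, accepts such a (num_topics, num_terms) array as its eta prior):

```python
import numpy as np

# Hypothetical vocabulary and seed assignment, made up for illustration.
vocab = ["economy", "market", "goal", "match", "election", "league"]
word_id = {w: i for i, w in enumerate(vocab)}
n_topics = 2
seed_words = {
    0: ["economy", "market", "election"],  # topic 0: politics/economy
    1: ["goal", "match", "league"],        # topic 1: sports
}

base_eta, boost = 0.01, 1.0  # assumed prior values, not from the library
eta = np.full((n_topics, len(vocab)), base_eta)
for topic, words in seed_words.items():
    for w in words:
        eta[topic, word_id[w]] += boost  # raise the prior for seed words only

# The boosted entries bias the starting point, but nothing constrains the
# later updates; that is the contrast with a method that keeps guiding the
# sampler or E-step throughout training.
```

This baseline only nudges the initial topic-word distribution; whether that differs meaningfully from the seeded-initialization approach in the code is exactly the question above.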
In the Gensim implementation of LDA, I think you can set the chunk size to learn incrementally?
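Right, Gensim's LdaModel consumes the corpus in chunksize batches (online variational Bayes), so the corpus only needs to be an iterable of bag-of-words documents rather than an in-memory array. A minimal sketch of such a streaming corpus (the documents are made up; the LdaModel call is left in a comment so the sketch stays dependency-free):

```python
# A corpus that streams bag-of-words documents one at a time instead of
# materializing an ndarray; each document is a list of (word_id, count) pairs.
class StreamingCorpus:
    def __init__(self, docs):
        self._docs = docs  # in practice: a file or database cursor, not a list

    def __iter__(self):
        for doc in self._docs:
            yield doc  # documents are produced lazily, one at a time

# Made-up toy documents in (word_id, count) form.
corpus = StreamingCorpus([
    [(0, 2), (3, 1)],
    [(1, 1), (2, 4)],
    [(0, 1), (4, 2)],
])

# With gensim installed, this could be fed in directly (parameters assumed):
# from gensim.models import LdaModel
# lda = LdaModel(corpus=corpus, num_topics=2, chunksize=2000, update_every=1)
# lda.update(more_corpus)  # later batches can be folded in incrementally

n_docs = sum(1 for _ in corpus)
```

Because __iter__ (not a one-shot generator) is defined, the corpus can be traversed repeatedly across passes, which is what lets training scale past available memory.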
It worked very well on a small dataset. Can it be improved to enable incremental learning for a huge dataset?