stepthom / lucene-lda

Using latent Dirichlet allocation (LDA) in Apache Lucene

Integrate with MALLET for on-the-fly LDA computation #3

Open doofuslarge opened 11 years ago

doofuslarge commented 11 years ago

One much-needed feature in lucene-lda is the ability to compute LDA on the fly, for cases where LDA has not been precomputed on the corpus.

One easy way to do this is to integrate with MALLET:

http://mallet.cs.umass.edu/

MALLET has API calls to run LDA and collect the output. This could all be done in the IndexDirectoryRunLDA.java class.
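For reference, here is a rough sketch of what those MALLET calls could look like. It assumes MALLET 2.x (the `cc.mallet` packages) is on the classpath. The class name `OnTheFlyLda`, the `run` method, the simple letter-only tokenizer, and the hyperparameter values are all placeholders I made up for illustration, not anything that exists in lucene-lda today.

```java
import cc.mallet.pipe.CharSequence2TokenSequence;
import cc.mallet.pipe.CharSequenceLowercase;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.TokenSequence2FeatureSequence;
import cc.mallet.topics.ParallelTopicModel;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: runs MALLET LDA over in-memory document texts. */
public class OnTheFlyLda {

    /** Returns theta[d][k], the estimated proportion of topic k in document d. */
    public static double[][] run(List<String> docs, int numTopics, int iterations) {
        // Preprocessing: lowercase, split into runs of letters, map tokens to feature ids.
        ArrayList<Pipe> pipes = new ArrayList<>();
        pipes.add(new CharSequenceLowercase());
        pipes.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}+")));
        pipes.add(new TokenSequence2FeatureSequence());

        InstanceList instances = new InstanceList(new SerialPipes(pipes));
        for (int d = 0; d < docs.size(); d++) {
            // Instance(data, target, name, source); only data and name are used here.
            instances.addThruPipe(new Instance(docs.get(d), null, "doc" + d, null));
        }

        // Arguments: number of topics, alphaSum (summed over topics), beta.
        // The values 1.0 and 0.01 are illustrative assumptions only.
        ParallelTopicModel model = new ParallelTopicModel(numTopics, 1.0, 0.01);
        model.addInstances(instances);
        model.setNumThreads(1);
        model.setNumIterations(iterations);
        try {
            model.estimate();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // Collect one topic distribution per document, in insertion order.
        double[][] theta = new double[docs.size()][];
        for (int d = 0; d < docs.size(); d++) {
            theta[d] = model.getTopicProbabilities(d);
        }
        return theta;
    }
}
```

In IndexDirectoryRunLDA.java the document texts would presumably come from the files being indexed, and the number of topics and iterations would need to be exposed as options.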

This may require some changes to the internals of LDAHelper, such as the representation of the matrices (if MALLET returns something different), but it should be worth it in the end.
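If the representations do differ, one option might be a small adapter between the two. The sketch below assumes, purely for illustration, that LDAHelper wants a dense document-by-topic `double[][]` whose rows sum to 1, and that the backend hands back one weight array per document. `TopicMatrixAdapter` and `toDense` are hypothetical names, and the real LDAHelper layout may well be different.

```java
import java.util.List;

/** Hypothetical adapter from per-document topic weights to a dense matrix. */
public final class TopicMatrixAdapter {

    private TopicMatrixAdapter() {}

    /**
     * Copies per-document topic weights into a dense [numDocs][numTopics]
     * matrix and renormalizes each row so that it sums to 1.
     */
    public static double[][] toDense(List<double[]> perDocWeights, int numTopics) {
        double[][] theta = new double[perDocWeights.size()][numTopics];
        for (int d = 0; d < perDocWeights.size(); d++) {
            double[] row = perDocWeights.get(d);
            if (row.length != numTopics) {
                throw new IllegalArgumentException(
                        "document " + d + " has " + row.length + " topics, expected " + numTopics);
            }
            double sum = 0.0;
            for (double w : row) {
                sum += w;
            }
            for (int k = 0; k < numTopics; k++) {
                // An all-zero row (e.g. an empty document) falls back to a uniform distribution.
                theta[d][k] = sum > 0.0 ? row[k] / sum : 1.0 / numTopics;
            }
        }
        return theta;
    }
}
```

Keeping the conversion in one place like this would mean the rest of LDAHelper does not need to know which backend produced the matrix.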