woorulez / study

LDA (Latent Dirichlet Allocation) #4

Open woorulez opened 4 years ago

woorulez commented 4 years ago

Using Spark LDA

Create a bag-of-words feature vector with CountVectorizer (https://spark.apache.org/docs/latest/ml-features.html#countvectorizer). Parameters to set: vocabSize, minDF, maxDF, minTF.
Create the LDA stage (http://spark.apache.org/docs/latest/ml-clustering.html#latent-dirichlet-allocation-lda). Example data: https://github.com/apache/spark/blob/master/data/mllib/sample_lda_libsvm_data.txt. Main parameter: k, the number of topics, which is also the length of each document's topic vector.
Define a pipeline and save the fitted model.