woorulez / study


LDA (Latent Dirichlet Allocation) #4

Open woorulez opened 4 years ago

woorulez commented 4 years ago

Using Spark LDA:

  1. Create a bag-of-words feature vector with CountVectorizer: https://spark.apache.org/docs/latest/ml-features.html#countvectorizer. Parameters to tune: vocabSize, minDF, maxDF, minTF.
  2. Create the LDA model: http://spark.apache.org/docs/latest/ml-clustering.html#latent-dirichlet-allocation-lda. Example data: https://github.com/apache/spark/blob/master/data/mllib/sample_lda_libsvm_data.txt. Key parameter: k (the number of topics, i.e. the length of each document's topic vector).
  3. Define a pipeline that chains these stages, then fit it and save the model.
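To see what the CountVectorizer parameters from step 1 actually filter, here is a pure-Python sketch (no Spark needed) of the rules the Spark documentation describes. `minDF` and `maxDF` bound how many documents a term may appear in (a value >= 1.0 is a count, a value below 1.0 is a fraction of the corpus). `vocabSize` keeps the most frequent remaining terms. `minTF` drops rare terms per document at transform time only, so it never changes the vocabulary. The function names are hypothetical, and the alphabetical tie-breaking is an assumption of this sketch, not Spark behavior.

```python
from collections import Counter


def threshold(value, n):
    """Spark-style threshold: >= 1.0 is an absolute count, < 1.0 a fraction of n."""
    return value if value >= 1.0 else value * n


def fit_vocabulary(docs, vocab_size=1 << 18, min_df=1.0, max_df=float(2 ** 63 - 1)):
    """Mimic the documented minDF / maxDF / vocabSize rules on tokenized docs."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Term frequency: total occurrences across the whole corpus.
    term_freq = Counter(term for doc in docs for term in doc)
    lo, hi = threshold(min_df, n_docs), threshold(max_df, n_docs)
    kept = [term for term, df in doc_freq.items() if lo <= df <= hi]
    # Most frequent terms first; ties broken alphabetically (a choice of this
    # sketch, not documented Spark behavior).
    kept.sort(key=lambda term: (-term_freq[term], term))
    return kept[:vocab_size]


def to_counts(doc, vocabulary, min_tf=1.0):
    """Sparse {index: count} for one document; minTF is applied only here."""
    index = {term: i for i, term in enumerate(vocabulary)}
    # A fractional minTF is relative to the document's total token count.
    limit = threshold(min_tf, len(doc))
    counts = Counter(term for term in doc if term in index)
    return {index[term]: c for term, c in counts.items() if c >= limit}
```

For example, `fit_vocabulary(docs, max_df=0.9)` drops every term that occurs in more than 90% of the documents, which removes corpus-wide filler words before they can dominate the LDA topics.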