woorulez/study
LDA(Latent Dirichlet Allocation)
#4, Open, opened by woorulez 4 years ago
woorulez commented 4 years ago
Intuitive Guide to Latent Dirichlet Allocation
Spark LDA: A Complete Example of Clustering Algorithm for Topic Discovery
woorulez commented 4 years ago
using Spark LDA
create a bag-of-words feature vector with CountVectorizer
https://spark.apache.org/docs/latest/ml-features.html#countvectorizer
parameters to set: vocabSize, minDF, maxDF, minTF
create LDA
http://spark.apache.org/docs/latest/ml-clustering.html#latent-dirichlet-allocation-lda
example data:
https://github.com/apache/spark/blob/master/data/mllib/sample_lda_libsvm_data.txt
k (number of topics, i.e. the length of each document's topic vector)
define a pipeline and save the fitted model
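
To make the four bag-of-words parameters noted above concrete, here is a minimal plain-Python sketch of what they filter. It is not Spark's implementation: the function names `fit_vocabulary` and `transform` are hypothetical, thresholds are treated as absolute counts only (Spark also accepts fractions), and the alphabetical tie-break is an assumption added so the output is stable.

```python
from collections import Counter


def fit_vocabulary(docs, vocab_size, min_df=1, max_df=None):
    """Pick vocabulary terms from tokenized documents.

    Assumption: min_df and max_df are absolute document counts.
    Terms are ranked by total frequency across the corpus.
    """
    term_freq = Counter()  # total occurrences of each term in the corpus
    doc_freq = Counter()   # number of documents that contain each term
    for tokens in docs:
        term_freq.update(tokens)
        doc_freq.update(set(tokens))
    if max_df is None:
        max_df = len(docs)
    # Drop terms that appear in too few or too many documents.
    kept = [t for t in term_freq if min_df <= doc_freq[t] <= max_df]
    # Most frequent first; ties broken alphabetically (an assumption).
    kept.sort(key=lambda t: (-term_freq[t], t))
    return kept[:vocab_size]


def transform(tokens, vocabulary, min_tf=1):
    """Build one document's count vector, zeroing counts below min_tf."""
    counts = Counter(tokens)
    return [counts[t] if counts[t] >= min_tf else 0 for t in vocabulary]


# Toy corpus, made up for illustration.
docs = [
    "spark lda topic model spark".split(),
    "lda topic discovery".split(),
    "spark pipeline model".split(),
]

vocabulary = fit_vocabulary(docs, vocab_size=3, min_df=2)
vectors = [transform(d, vocabulary) for d in docs]

print(vocabulary)  # ['spark', 'lda', 'model']
print(vectors)     # [[2, 1, 1], [0, 1, 0], [1, 0, 1]]
```

Here minDF removes "discovery" and "pipeline" (one document each), and the vocabulary size cap then cuts "topic". Raising minTF to 2 would leave only the repeated "spark" in the first document's vector. These count vectors are the kind of input the LDA step consumes, with k setting how many topics it fits.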