pmisteliac / mmsr_repo_sim

WIP: Topic model #3

Closed jan-gerling closed 5 years ago

jan-gerling commented 5 years ago

Preprocessor:

  1. Identifier splitting (CamelCase and underscores)
  2. Stopword removal (English and Java)
  3. Stemming (Porter Stemmer)
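
A minimal sketch of how these three steps could fit together. This is an illustration, not the code from this PR: `split_identifier` and `preprocess` are hypothetical names, the stopword sets are short placeholders, and the Porter stemmer is taken from NLTK when it is installed (otherwise tokens are left unstemmed).

```python
import re

# Placeholder stopword lists (assumed); the real lists are not shown in this PR.
ENGLISH_STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for"}
JAVA_STOPWORDS = {"public", "private", "static", "void", "class", "return", "new", "int"}
STOPWORDS = ENGLISH_STOPWORDS | JAVA_STOPWORDS

try:
    # Optional dependency: NLTK ships a Porter stemmer implementation.
    from nltk.stem import PorterStemmer
    _stem = PorterStemmer().stem
except ImportError:
    def _stem(token):
        # Fallback when NLTK is missing: no stemming at all.
        return token


def split_identifier(identifier):
    """Split on underscores and CamelCase boundaries; return lowercase parts."""
    parts = []
    for chunk in re.split(r"[_\W]+", identifier):
        # Acronym runs (HTTP), capitalized or lowercase words, digit runs.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [part.lower() for part in parts]


def preprocess(identifiers, stem=_stem):
    """Split, drop stopwords, then stem every remaining token."""
    tokens = []
    for identifier in identifiers:
        for token in split_identifier(identifier):
            if token not in STOPWORDS:
                tokens.append(stem(token))
    return tokens


print(split_identifier("parseHTTPRequest_v2"))
print(preprocess(["public_static_void", "getUserName", "the_HTTPServer"]))
```

Stopwords are removed before stemming here so that the lists can contain plain words; if stemming ran first, the lists would have to hold stems instead.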

Topic Modeling:

  1. gensim LDA
  2. MALLET LDA

Evaluation:

  1. Perplexity
  2. Coherence
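
To make the two metrics concrete, here is a library-free sketch. It assumes perplexity is computed from per-token natural-log likelihoods and uses the UMass definition of coherence (summed form); the PR does not say which coherence measure it uses, and `perplexity` and `umass_coherence` are hypothetical helper names.

```python
import math
from itertools import combinations


def perplexity(token_log_probs):
    """exp of the negative mean log-likelihood per token (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def umass_coherence(top_words, documents):
    """Sum of log((D(wi, wj) + 1) / D(wj)) over pairs where wj ranks above wi.

    D(...) counts the documents containing all given words. Every word in
    top_words is assumed to occur in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for j, i in combinations(range(len(top_words)), 2):  # j < i
        wj, wi = top_words[j], top_words[i]
        score += math.log((doc_freq(wi, wj) + 1) / doc_freq(wj))
    return score


# Hypothetical toy data.
docs = [["user", "name", "login"], ["user", "login"], ["http", "server"]]
print(perplexity([math.log(0.25)] * 3))          # uniform choice among 4 words
print(umass_coherence(["user", "login"], docs))  # words that co-occur
print(umass_coherence(["user", "http"], docs))   # words that never co-occur
```

Lower perplexity and higher coherence are better. With gensim, the corresponding entry points are `LdaModel.log_perplexity`, which returns a per-word likelihood bound, and `gensim.models.CoherenceModel`, which offers `u_mass` among its coherence options.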