koheiw / seededlda

LDA for semisupervised topic modeling
https://koheiw.github.io/seededlda/

Make the posterior() stats available #79

Open · aourednik opened this issue 2 months ago


The initially wrapped package, topicmodels, allowed a more refined exploration of the topics in every document via topicmodels::posterior(my_lda)$topics. Could this be made available for the result of seededlda::textmodel_lda()?

Given the probabilistic nature of topic-document associations, it would be useful to make students and the public aware that the topic assigned to a document is only the most prominent one in that text, not the only one.
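To make the point concrete, here is a minimal base-R sketch, assuming a documents × topics probability matrix like the one topicmodels::posterior(...)$topics returns. The matrix values and the helper summarise_topic_mix() are hypothetical, for illustration only. For each document it reports the dominant topic together with the probability mass still carried by all other topics.

```r
# Hypothetical documents x topics matrix; each row is a probability
# distribution over topics, like posterior(lda)$topics.
doc_topics <- matrix(
  c(0.55, 0.30, 0.15,
    0.40, 0.35, 0.25,
    0.10, 0.10, 0.80),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), paste0("topic", 1:3))
)

# For each document: the most probable topic, its share, and the share
# left to all remaining topics (hypothetical helper).
summarise_topic_mix <- function(theta) {
  # Rows must be probability distributions (sum to 1 up to rounding)
  stopifnot(all(abs(rowSums(theta) - 1) < 1e-8))
  top <- max.col(theta, ties.method = "first")
  top_share <- theta[cbind(seq_len(nrow(theta)), top)]
  data.frame(
    doc_id = rownames(theta),
    top_topic = colnames(theta)[top],
    top_share = top_share,
    other_share = 1 - top_share,
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

summarise_topic_mix(doc_topics)
```

In this toy example, doc2 is labelled topic1, yet that topic accounts for only 0.40 of the document; the other topics together hold 0.60.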

Example:

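library(quanteda)     # convert()
library(topicmodels)  # LDA(), posterior()
library(stringr)      # str_remove(), fixed()
library(tidyr)        # pivot_longer()
library(ggplot2)
# Fit a 6-topic LDA on an existing quanteda dfm (my_dfm)
lda_model2 <- topicmodels::LDA(convert(my_dfm, to = "topicmodels"), k = 6)
# Documents x topics matrix of posterior probabilities (each row sums to 1)
doc_topics <- topicmodels::posterior(lda_model2)$topics
df <- data.frame(doc_id = str_remove(rownames(doc_topics), fixed(".txt")), doc_topics)
# data.frame() prefixes the numeric topic columns with "X"; strip it for the legend
df_long <- pivot_longer(df, cols = starts_with("X"), names_to = "topic",
                        names_prefix = "X", values_to = "importance")
ggplot(df_long, aes(x = importance, y = doc_id, fill = factor(topic))) +
    geom_col() +
    labs(x = "Topic Importance", y = "Document ID", fill = "Topic") +
    theme_minimal() +
    theme(axis.text.y = element_text(angle = 0, hjust = 1))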
