juba / rainette

R implementation of the Reinert text clustering method
https://juba.github.io/rainette/

Add keyness statistics export function #4

Closed: juba closed this issue 4 years ago

juba commented 4 years ago

Add a simpler and friendlier version of keyness_stats, as requested in #3

manubonnet commented 4 years ago

I wanted to extract all the terms from a class, and your answer works well:

groups <- cutree_rainette(res, k = 3)
# keyness_stats is an internal (unexported) function, hence the ':::'
terms <- rainette:::keyness_stats(groups, dtm, "chi2", rlang::sym("chi2"), show_negative = FALSE, n_terms = 100)
# The first 100 terms from class 2
terms[[2]][, 1]

juba commented 4 years ago

I just added a rainette_stats function, which is documented and a bit simpler:

groups <- cutree_rainette(res, k = 3)
rainette_stats(groups, dtm, n_terms = 100, show_negative = FALSE)

gabrielparriaux commented 2 years ago

I have a quick question about the rainette_stats function: what are n_target and n_reference values about?

juba commented 2 years ago

The output of rainette_stats is in fact the output of the textstat_keyness function from quanteda. IIRC, n_target is the number of occurrences of the term in the current group, and n_reference is the number of occurrences in the rest of the corpus.
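
To make the meaning of these two columns concrete, here is a minimal sketch that calls textstat_keyness directly on a toy corpus and compares its n_target and n_reference columns with counts computed by hand. The texts, the `groups` vector and the variable names are made up for illustration, and it assumes quanteda version 3 or later, where textstat_keyness lives in the quanteda.textstats package. It is not how rainette calls the function internally.

```r
library(quanteda)
library(quanteda.textstats)

# Toy corpus (hypothetical data, for illustration only)
txt <- c(d1 = "apple apple banana", d2 = "apple cherry",
         d3 = "banana cherry cherry", d4 = "cherry banana")
dtm <- dfm(tokens(txt))

# Hypothetical cluster assignment: d1 and d2 in group 1, d3 and d4 in group 2
groups <- c(1, 1, 2, 2)
in_target <- groups == 1

# Keyness of each term for group 1 against all other documents
ks <- textstat_keyness(dtm, target = in_target, measure = "chi2")

# Term counts computed by hand, inside and outside the target group
manual_target <- colSums(dtm[in_target, ])
manual_reference <- colSums(dtm[!in_target, ])

# Compare the hand counts with the columns returned by textstat_keyness
same_target <- all(ks$n_target == manual_target[ks$feature])
same_reference <- all(ks$n_reference == manual_reference[ks$feature])

print(ks[, c("feature", "n_target", "n_reference")])
```

If both `same_target` and `same_reference` are TRUE, then n_target is the term's frequency within the group and n_reference is its frequency in the remaining documents.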