juba / rainette

R implementation of the Reinert text clustering method
https://juba.github.io/rainette/

Cluster 1 of plot does not show; all the terms (7) do not show on the other Clusters #12

Closed ghost closed 2 years ago

ghost commented 2 years ago

I wrote to you directly before and you corrected that problem, but this one remains. The same file I sent you earlier reproduces it.

[Image: Screenshot 2022-03-17 110354]

juba commented 2 years ago

I think the plot for the first cluster is empty because that cluster is very small (only two documents), so no feature has a statistically significant keyness statistic. The same applies to the fifth cluster, which may have only 3 features with statistically significant keyness.

You can check this by computing keyness directly, with something like this:

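# Assign each document to one of 6 clusters
groups <- cutree(rainClus, k = 6)
# Keyness of every feature in cluster 1 compared with all other documents
quanteda.textstats::textstat_keyness(rainDFM, target = groups == 1, measure = "chi2")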

This should not show any feature with a p-value below 0.05.
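
To run the same check for every cluster at once, the snippet above can be wrapped in a small helper. This is only a sketch: `count_significant` is a hypothetical name, not part of rainette or quanteda, and counting only positive chi2 values assumes the plot shows only over-represented features. The 0.05 threshold is the one mentioned above.

```r
# Hypothetical helper (not part of rainette or quanteda): for each cluster,
# count the features whose chi2 keyness is positive and has p < alpha.
count_significant <- function(dtm, groups, alpha = 0.05) {
  clusters <- sort(unique(groups))  # sort() drops NA (unassigned documents)
  counts <- sapply(clusters, function(g) {
    # %in% treats NA group labels as "not in the target cluster"
    key <- quanteda.textstats::textstat_keyness(
      dtm, target = groups %in% g, measure = "chi2"
    )
    sum(key$p < alpha & key$chi2 > 0)
  })
  names(counts) <- clusters
  counts
}

# Usage with the objects from this thread:
# count_significant(rainDFM, groups)
```

A cluster with a count of 0 would be expected to produce an empty panel in the plot, and a cluster with a small count would show only that many terms.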