Open gabrielparriaux opened 10 months ago
This is not directly related to rainette. You have to compute the size of the clusters and filter out the smaller ones. Something like:
tab <- table(clusters)
names(tab)[tab > min_size]
Thanks a lot for your help and sorry to ask a question not directly related to rainette… 😰
Just a question: is there an object “clusters” that I can use to compute the size of each cluster? I can’t find it in the docs.
I had an idea of computing the size of each cluster with something like this:
clusters <- clusters_by_doc_table(dtm_for_analysis, clust_var = "Cluster")
sum(clusters$clust_1)
…
But, then I should have a loop to do it for each cluster in the clustering… and I think maybe there is something simpler?
Sorry if it’s an obvious question…
If you're looking for the size of each cluster in terms of number of segments, then doing the following should be enough:
clusters <- cutree(res, k = 5)
table(clusters)
Or in your example:
table(dtm_for_analysis$Cluster)
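To connect this back to the original question (a vector with the names of the small clusters), here is a minimal base R sketch. The `clusters` vector and the `min_size` threshold are made up for illustration; in practice `clusters` would come from `cutree(res, k = 5)` or from `dtm_for_analysis$Cluster`.

```r
# Hypothetical cluster assignments for 12 segments (NA = unclassified segment)
clusters <- c(1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4, NA)
# Hypothetical threshold: clusters with fewer segments count as small
min_size <- 3

# Number of segments per cluster; table() ignores NA by default
tab <- table(clusters)

# Names of the clusters below the threshold, and of those at or above it
small_clusters <- names(tab)[tab < min_size]
large_clusters <- names(tab)[tab >= min_size]
```

Note that `names(tab)` returns character strings, so the result may need `as.numeric()` before being compared with a numeric cluster vector.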
Much easier like this 😬. Thanks a lot for helping, this is exactly what I needed!
Hello, thanks @juba for the clarifications in this thread and other ones.
I just tried rainette for a few hours on 2 corpora (one from social networks, on health problems; the other from an RPS survey in a large company), and it's a pleasure for users, both because it secures the ability to use the Reinert method (Iramuteq updates were long awaited, excuse me Pierre R.!) and because rainette outputs are normal, reusable R objects.
Yet @gabrielparriaux's and others' points make sense to me. On both corpora, around 2/3 of the classes were very tiny. I could, of course, filter out these outlier classes afterwards, but as far as I understand, a) I cannot use rainette_explor on the filtered list (so 2/3 of the explor screen contents are garbage), and b) it would require renumbering the classes, so the correspondence with the original numbering is lost.
Imho, it would be great:
Just my two cents; I am (well, I became, with age) an absolute layman in software development, and other users may feel the issue non-existent.
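On the renumbering concern, one possible workaround (a sketch only, not something rainette provides) is to set the segments of small clusters to NA instead of dropping and renumbering them, so the remaining clusters keep their original numbers. The data and the `min_size` threshold below are hypothetical, and this does not make rainette_explor work on the filtered result.

```r
# Hypothetical cluster assignments, one per segment
clusters <- c(1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 5)
# Hypothetical threshold: clusters with fewer segments count as small
min_size <- 3

# Identify the small clusters from their sizes
tab <- table(clusters)
small <- as.numeric(names(tab)[tab < min_size])

# Mark segments of small clusters as NA; other clusters keep their original numbers
filtered <- clusters
filtered[filtered %in% small] <- NA

# Sizes after filtering, with the NA count shown explicitly
table(filtered, useNA = "ifany")
```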
Hi,
Thanks for your observations, and I think they are totally legit. Unfortunately I don't work on textual analysis anymore, and so I'm on a low maintenance mode on rainette currently. But I'll definitely try to implement your suggestions if I find some time in the future...
Hi @juba,
After having done a Rainette clustering, I often execute a Correspondence Analysis with lexicon and clusters.
In that case, very small clusters tend to pull the plot to the extremes, making it difficult to read.
So I’m looking for a way to select and isolate the clusters that contain a very small number of segments.
In some way, I need to build a vector with the names of the clusters that contain fewer than a certain number of segments.
I have looked at the documentation available but have no idea how to do it.
Can you help me and point me in the right direction?
Thanks a lot for your help!
Gabriel
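For the correspondence analysis use case described above, a minimal base R sketch of dropping the small clusters from a term-by-cluster table before running the analysis might look like this. The cluster labels, terms and `min_size` value are invented for illustration, and the correspondence analysis itself is not shown, since it depends on the package being used.

```r
# Hypothetical cluster assignments for 10 segments
clusters <- c("clust_1", "clust_1", "clust_1", "clust_1", "clust_2",
              "clust_2", "clust_2", "clust_3", "clust_2", "clust_1")
# Hypothetical term attached to each segment
terms <- c("health", "work", "health", "stress", "work",
           "stress", "work", "health", "stress", "work")
# Hypothetical threshold: keep clusters with at least this many segments
min_size <- 2

# Names of the clusters that are large enough to keep
tab <- table(clusters)
keep <- names(tab)[tab >= min_size]

# Term-by-cluster contingency table, restricted to the retained clusters
lexical_table <- table(terms, clusters)
lexical_table_filtered <- lexical_table[, keep, drop = FALSE]
```

The filtered table could then be passed to whichever correspondence analysis function is in use, so that the very small clusters no longer stretch the plot.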