juba / rainette

R implementation of the Reinert text clustering method
https://juba.github.io/rainette/

Unclassified segments when doing simple clustering? #34

Open gabrielparriaux opened 5 months ago

gabrielparriaux commented 5 months ago

Hello @juba,

I thought it was only with double clustering that there was an option not to force the classification of every segment, so that some segments could end up with NA as their cluster.

But I was surprised when, after running a simple clustering and retrieving the clusters with cutree_rainette(), some of the segments were NA.

Did I do something wrong? Is it normal for some segments to remain unclassified after a simple rainette clustering?

Thanks a lot for your opinion on this, and all the best!

Gabriel

juba commented 5 months ago

In general, when a document gets NA as its cluster in a simple classification, it is because it has no content left: it is very short and/or consists only of terms that have been filtered out.

gabrielparriaux commented 5 months ago

Oh, OK! So that must be the reason… I will check the content of those segments!
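To illustrate the explanation above, here is a minimal, language-agnostic sketch in Python (not rainette's R code) of how to find segments that end up with no terms after filtering. The filtering rules are hypothetical assumptions chosen for the example: lowercase whitespace tokenization, stopword removal, and dropping terms that appear in fewer than `min_docfreq` segments. The function name `empty_segments` is also hypothetical.

```python
from collections import Counter


def empty_segments(segments, min_docfreq=2, stopwords=frozenset()):
    """Return the indices of segments left with no terms after filtering.

    Hypothetical filtering rules (assumptions for illustration only):
    lowercase whitespace tokenization, stopword removal, and removal of
    terms present in fewer than `min_docfreq` segments.
    """
    # Tokenize each segment and drop stopwords.
    tokenized = [
        [tok for tok in seg.lower().split() if tok not in stopwords]
        for seg in segments
    ]
    # Document frequency: number of segments containing each term.
    docfreq = Counter(term for toks in tokenized for term in set(toks))
    # Keep only terms that are frequent enough across segments.
    kept_vocab = {term for term, df in docfreq.items() if df >= min_docfreq}
    # A segment is "empty" if none of its tokens survived the filtering.
    return [
        i for i, toks in enumerate(tokenized)
        if not any(tok in kept_vocab for tok in toks)
    ]


segments = [
    "the cat sat on the mat",
    "a cat and a dog",
    "ok",
    "the dog sat down",
    "of the",
]
print(empty_segments(segments, min_docfreq=2,
                     stopwords={"the", "a", "of", "and", "on"}))
# prints [2, 4]
```

Segment 2 ("ok") is very short and its only term is too rare, and segment 4 ("of the") contains only stopwords, so neither has anything left to cluster on. In R, with a quanteda document-feature matrix `dtm`, a comparable check would presumably be `which(rowSums(dtm) == 0)`.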