weishao6hao closed this issue 5 years ago
Same thing on my end: one cluster contains roughly 90% of the data points, and each of the remaining ~10% ends up in its own cluster. For reference, I'm running clustering on several different subreddits from reddit. The sample is ~5000 "documents", each about 3-10 sentences long. My initial featurization is tf-idf.
I believe it could be a normalisation issue. Please see #13.
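For anyone who wants to check this on their own data, here is a minimal sketch in plain Python. It is not taken from this repository or from #13. It assumes that "normalisation" means scaling each feature row to unit L2 length before clustering, which may not be what #13 describes. The helper names `l2_normalize_rows` and `cluster_size_summary` and the toy data are hypothetical.

```python
import math
from collections import Counter


def l2_normalize_rows(rows):
    """Scale each feature vector to unit Euclidean length; all-zero rows are left unchanged."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm > 0 else list(row))
    return out


def cluster_size_summary(labels):
    """Return (share of points in the largest cluster, number of singleton clusters)."""
    counts = Counter(labels)
    largest_share = max(counts.values()) / len(labels)
    singletons = sum(1 for c in counts.values() if c == 1)
    return largest_share, singletons


# Hypothetical toy features: rows pointing the same way but with very different magnitudes.
features = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]
normalized = l2_normalize_rows(features)

# Hypothetical labels mimicking the symptom above: one large cluster plus a singleton.
labels = [0] * 9 + [1]
share, singles = cluster_size_summary(labels)
```

Running `cluster_size_summary` on the labels before and after normalizing the inputs is one way to see whether the imbalance actually changes. If it does not, the cause is probably something other than feature scale.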
Was it a normalization issue? Did that solve it?
Hi, thank you for your work. I applied this algorithm to my own data, but most of the data points are assigned to the first cluster. What causes this, and what should I change to improve the result?