Open codata-hg opened 5 years ago
The sum will be the probability that the point is in any cluster. Since HDBSCAN considers some points "noise" you can think of this as one minus the probability that the point is noise. Hopefully that is helpful.
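To make that reading concrete, here is a minimal sketch. It uses made-up membership values rather than output from a real clusterer, and the helper names are hypothetical. Each row sum is read as the probability of being in any cluster, and one minus the sum as the implied noise probability. Dividing a row by its sum gives a distribution conditional on the point not being noise.

```python
import numpy as np

def implied_noise_probability(membership):
    """Return 1 - row sum, read as the probability that each point is noise."""
    membership = np.asarray(membership, dtype=float)
    return 1.0 - membership.sum(axis=1)

def conditional_membership(membership):
    """Rescale each row to sum to 1 (condition on the point not being noise).

    Rows that sum to 0 are left as all zeros to avoid dividing by zero.
    """
    membership = np.asarray(membership, dtype=float)
    totals = membership.sum(axis=1, keepdims=True)
    return np.divide(membership, totals,
                     out=np.zeros_like(membership), where=totals > 0)

# Illustrative values only (assumption). In practice this would be the array
# returned by hdbscan.prediction.membership_vector(clusterer, points_to_predict).
example = np.array([[0.70, 0.10, 0.05],
                    [0.02, 0.03, 0.01],
                    [0.00, 0.00, 0.00]])

noise = implied_noise_probability(example)
cond = conditional_membership(example)
```

If a distribution over clusters that sums to one is what you need, the rescaled rows are one way to get it, at the cost of discarding the noise information.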
On Thu, Oct 25, 2018 at 6:32 PM codata-hg notifications@github.com wrote:
Hi, I have a clusterer trained with many clusters identified. I used
hdbscan.prediction.membership_vector(clusterer, points_to_predict)
to get the probability distribution of the points over all clusters. I was expecting the membership scores in each vector to sum to one, but they don't. Why is that?
Thanks
Thanks for your timely response. It makes sense! But I also made some weird observations. Some samples in the core of a cluster have probabilities_ = 1, but when I pass them to membership_vector() for prediction, I sometimes get zero probability for the cluster they belong to and non-zero probabilities for the rest.
I also ran the same test on noise samples. Some behave as expected, with a low sum of probabilities, which means they are dissimilar from all clusters. But other samples give a sum of probabilities close to 1, like [0.3, 0.3, 0.3]. Any thoughts on this?
Sadly there are some bugs in the soft cluster membership. It works fine for some datasets, but can get messed up badly at times. I have plans for a grand re-write at some point, so I haven't tracked down exactly what is going wrong. Sorry.
Hello @lmcinnes, I just wanted to check whether the soft cluster membership bug that would affect membership_vector has been fixed?
+1, checking in again to see if it's fixed; otherwise adding a #featureRequest for it.
I think I have the same issue. I expected 1 - probabilities_ + all_points_membership_vectors.sum(axis=1) == 1. For some reason this is not always the case; from time to time I get values significantly greater than 1, like 1.07. By the way, I love your work, @lmcinnes.
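For anyone who wants to see how often this happens on their own data, a check along these lines might help. It is only a sketch: the helper name and tolerance are hypothetical, and the values below are made up rather than taken from a real clusterer. It flags rows whose membership sum falls outside [0, 1].

```python
import numpy as np

def rows_outside_unit_interval(membership, tol=1e-9):
    """Return (row indices, row sums) for rows whose sum is not within [0, 1]."""
    membership = np.asarray(membership, dtype=float)
    totals = membership.sum(axis=1)
    bad = np.flatnonzero((totals < -tol) | (totals > 1.0 + tol))
    return bad, totals[bad]

# Illustrative values only (assumption). In practice this would be something like
# hdbscan.all_points_membership_vectors(clusterer) or the output of
# hdbscan.prediction.membership_vector(clusterer, points_to_predict).
example = np.array([[0.50, 0.40, 0.05],
                    [0.60, 0.30, 0.17],   # sums to 1.07
                    [0.30, 0.30, 0.30]])

bad_rows, bad_totals = rows_outside_unit_interval(example)
```

Reporting the flagged row indices along with the dataset would probably make a bug report or PR easier to act on.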
I'm no longer maintaining the soft clustering, as I have too many other things on my plate. PRs are welcome, however.