Dear all,
I first noticed this issue with the cuml implementation of HDBSCAN, but it mirrors the original CPU version.
produces, as expected
Now
returns
All data points are assigned to either cluster 0 or cluster 1; none is assigned to '-1', i.e. noise. Yet the memberships of the first data point sum to 0.14, not 1.0.
Should 1.0 - 0.14070807 (i.e. 1 minus the sum of the first point's cluster memberships) be interpreted as the probability of the first point being a noise point? If so, that probability is higher than the membership in either cluster 0 or cluster 1, and it seems the point should be labelled '-1'.

The documentation states that 'The return value is a two-dimensional numpy array. Each point of the input data is assigned a vector of probabilities of being in a cluster.', which does not seem to be the case here. Am I missing something?
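The arithmetic behind that reading can be sketched with plain numpy; the membership values below are hypothetical numbers chosen so the row sums to roughly 0.1407 as in the post:

```python
import numpy as np

# Hypothetical membership vector for the first point (two clusters),
# chosen so the row sums to ~0.141 as reported in the post.
membership = np.array([0.071, 0.070])

row_sum = membership.sum()    # ~0.141, not 1.0
noise_prob = 1.0 - row_sum    # residual mass, if read as P(noise)

# Under that reading, 'noise' is more likely than either cluster,
# which is why one might expect the label '-1':
print(noise_prob > membership.max())  # True
```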
All the best,
Vincent