scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Sum of stability scores as a measure of cluster quality? #196

Open chaturv3di opened 6 years ago

chaturv3di commented 6 years ago

I am working with a dataset from a domain in which I am not an expert. This means that I have neither labels nor a ground-truth clustering available to me. I am trying to figure out a good way to assess the quality of the clusters I obtain when varying the minimum number of samples and the minimum cluster size.

The clustering metrics I am aware of either assume that the points in each cluster are normally distributed (e.g. Silhouette Score and F-Score) or rely on a reference or ground-truth labeling (e.g. adjusted mutual information and completeness scores).

I wonder if one could define a quality metric using the stability scores of the flattened clusters (which I believe are available from the cluster_persistence_ attribute). If my understanding is correct, these scores are already "normalised" for cluster size, in the sense that each one is computed by summing the relative excess of mass over all points assigned to that cluster. Therefore, a straightforward sum of all stability scores seems like a reasonable definition of such a metric: the higher the sum, the better the clustering, and one can search the hyper-parameter space for a maximal value.
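A minimal sketch of the idea, for illustration only: it sums cluster_persistence_ for each combination of min_cluster_size and min_samples and keeps the combination with the largest sum. The helper names total_stability and search_by_stability, and the make_clusterer parameter, are hypothetical and not part of hdbscan. The only library behaviour assumed is that hdbscan.HDBSCAN accepts min_cluster_size and min_samples and exposes cluster_persistence_ after fitting.

```python
from itertools import product


def total_stability(clusterer):
    """Sum the per-cluster persistence scores of an already fitted clusterer."""
    return float(sum(clusterer.cluster_persistence_))


def search_by_stability(X, min_cluster_sizes, min_samples_values, make_clusterer=None):
    """Fit one clusterer per parameter pair and rank the pairs by summed stability.

    Returns (best, results), where each entry is a tuple
    (score, min_cluster_size, min_samples).
    """
    if make_clusterer is None:
        # Imported lazily so the helper can be tried with any compatible class.
        import hdbscan

        make_clusterer = hdbscan.HDBSCAN

    results = []
    for mcs, ms in product(min_cluster_sizes, min_samples_values):
        clusterer = make_clusterer(min_cluster_size=mcs, min_samples=ms).fit(X)
        results.append((total_stability(clusterer), mcs, ms))

    # Highest summed stability wins; ties keep the first pair encountered.
    best = max(results, key=lambda r: r[0])
    return best, results
```

With real data this could be called as `best, results = search_by_stability(X, [5, 10, 20], [1, 5, 10])`. The score only ranks parameter settings against each other on the same dataset; it is not an absolute measure of clustering quality.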

It would be great to hear your thoughts on this. For HDBSCAN in particular, shouldn't this or something else based on cluster stability be a more suitable (and easier to explain and more efficiently computable) metric than DBCV?

lmcinnes commented 6 years ago

It is a measure of how stable the given clusters are, but that isn't necessarily the same thing as clustering quality -- you could, in principle, have a very stable mis-clustering. All that said, I do believe that this is a not unreasonable metric all things considered. I don't think it carries the same theoretical weight as the other options, but if you just want something to work in practice it may be "good enough".