scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Shared nodes between clusters #465

adminy closed this issue 3 years ago

adminy commented 3 years ago

HDBSCAN seems to be capable of producing clusters that share overlapping nodes. My goal in clustering is to identify the points shared between clusters, so what would I have to change in the algorithm to get those?

lmcinnes commented 3 years ago

HDBSCAN is a hierarchical clustering algorithm, and as such it produces only strict hierarchies of clusters: if two clusters overlap, one is strictly contained within the other. It does not support overlapping clusters in the way I believe you want. Unfortunately, you would need a different algorithm altogether for that, and I do not know of any particularly good clustering algorithms that support this kind of overlap.
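To illustrate the "strict hierarchy" property, here is a minimal sketch in plain Python. It is not HDBSCAN itself and does not use the hdbscan library: it builds a toy single-linkage hierarchy over 1-D points and checks that any two clusters in it are either disjoint or nested. The function names `single_linkage_hierarchy` and `is_strict_hierarchy` and the sample points are made up for this example.

```python
from itertools import combinations


def single_linkage_hierarchy(points):
    """Return every cluster (a frozenset of point indices) formed while
    merging 1-D points by single linkage, from singletons up to the root.
    Toy stand-in for a hierarchical clusterer, not HDBSCAN."""
    n = len(points)
    # All pairwise distances, smallest first.
    edges = sorted(
        (abs(points[i] - points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))  # union-find forest
    members = {i: frozenset([i]) for i in range(n)}
    clusters = set(members.values())

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same cluster
        parent[rj] = ri
        members[ri] = members[ri] | members.pop(rj)
        clusters.add(members[ri])  # record the newly merged cluster
    return clusters


def is_strict_hierarchy(clusters):
    """True if every pair of clusters is disjoint or one contains the other."""
    return all(
        a.isdisjoint(b) or a <= b or b <= a
        for a, b in combinations(clusters, 2)
    )


points = [0.0, 0.1, 0.25, 5.0, 5.2, 9.0]  # hypothetical sample data
clusters = single_linkage_hierarchy(points)
print(is_strict_hierarchy(clusters))  # prints True
```

By contrast, two clusters that share a point without one containing the other, such as `{0, 1}` and `{1, 2}`, fail this check, and that is the kind of overlap a hierarchy cannot express.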

adminy commented 3 years ago

mhm, thanks @lmcinnes