scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

DBSCAN hybrid mode with the cluster_selection_epsilon does not support soft clustering on out of sample data #526

Open Dicksonchin93 opened 2 years ago

Dicksonchin93 commented 2 years ago

The DBSCAN hybrid mode (cluster_selection_epsilon set to a value greater than 0) does not support soft clustering on out-of-sample data.

We don't utilise cluster_selection_epsilon anywhere in the membership_vector method in https://github.com/scikit-learn-contrib/hdbscan/blob/4c432505f4a92884a64a77159664f041a583fbec/hdbscan/prediction.py#L518

To add support for this, I suggest applying the same cluster_selection_epsilon logic that is used during fitting in the select_clusters method called here: https://github.com/scikit-learn-contrib/hdbscan/blob/4c432505f4a92884a64a77159664f041a583fbec/hdbscan/prediction.py#L550

https://github.com/scikit-learn-contrib/hdbscan/blob/4c432505f4a92884a64a77159664f041a583fbec/hdbscan/plots.py#L234
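
To illustrate the kind of epsilon-based merging meant above, here is a minimal pure-Python sketch. It is not hdbscan's actual code: the toy tree, its lambda values, ROOT, and all function names are assumptions made up for this example. The idea it shows is that any selected cluster born at a distance smaller than cluster_selection_epsilon is replaced by its nearest ancestor born at a larger distance.

```python
# Toy cluster tree: child -> (parent, lambda_val). All values are invented.
# lambda = 1 / distance, so a cluster is "born" at distance 1 / lambda_val.
ROOT = 0
TOY_TREE = {
    1: (0, 0.5),  # born at distance 2.0
    2: (0, 0.5),  # born at distance 2.0
    3: (1, 2.0),  # born at distance 0.5
    4: (1, 2.0),  # born at distance 0.5
    5: (2, 4.0),  # born at distance 0.25
    6: (2, 4.0),  # born at distance 0.25
}


def traverse_upwards(tree, epsilon, node, allow_single_cluster=False):
    """Return the nearest ancestor of `node` born at a distance > epsilon."""
    parent, _ = tree[node]
    if parent == ROOT:
        # Only merge into the root if a single cluster is allowed.
        return parent if allow_single_cluster else node
    parent_eps = 1.0 / tree[parent][1]
    if parent_eps > epsilon:
        return parent
    return traverse_upwards(tree, epsilon, parent, allow_single_cluster)


def descendants(tree, node):
    """Collect every cluster below `node` in the toy tree."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for child, (parent, _) in tree.items():
            if parent == current:
                found.add(child)
                stack.append(child)
    return found


def epsilon_select(tree, selected, epsilon, allow_single_cluster=False):
    """Merge selected clusters that split off below the epsilon threshold."""
    result, absorbed = set(), set()
    for node in sorted(selected):
        if node in absorbed:
            continue  # already covered by a merged ancestor
        if 1.0 / tree[node][1] < epsilon:
            merged = traverse_upwards(tree, epsilon, node, allow_single_cluster)
            result.add(merged)
            absorbed |= descendants(tree, merged)
        else:
            result.add(node)
    return result - absorbed


print(sorted(epsilon_select(TOY_TREE, {3, 4, 5, 6}, epsilon=0.4)))  # [2, 3, 4]
```

With epsilon=0.4, clusters 5 and 6 (born at distance 0.25) are merged into their parent 2, while 3 and 4 (born at distance 0.5) are kept. The soft-clustering code would need to see this merged set of clusters, rather than the unmerged one, for its membership vectors to match the fitted labels.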

lmcinnes commented 2 years ago

Yes, I believe this is an interaction of features that is not going to manage to work. Sorry.

Dicksonchin93 commented 2 years ago

I'll be happy to make a PR if you are able to review it once it is done. Should I do that?