scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
2.73k stars, 491 forks

Straightforward way to assign every noise sample to its most likely cluster? #640

Open Asquator opened 1 month ago

Asquator commented 1 month ago

My application requires a total clustering of all data samples, so I would like to assign every outlier to an adjacent cluster. The dataset is very noisy: even after tuning the two parameters, at least a quarter of the samples are marked as outliers.

I want to benefit from the advantages of density-based clustering, but also make a deterministic decision based on every point's (approximate) cluster.

It seems we just need to assign every outlier to the cluster of its closest core point. What is the easiest way to do that?
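One possible reading of that idea, as a minimal sketch rather than anything the library provides: give each noise sample (label `-1`) the label of its nearest clustered sample. It assumes Euclidean distance and uses "nearest non-noise point" as a stand-in for "closest core point". The function name `assign_noise_to_nearest_cluster` is hypothetical.

```python
import numpy as np


def assign_noise_to_nearest_cluster(X, labels):
    """Give every noise sample (-1) the label of its nearest clustered sample.

    Assumption: Euclidean distance, and any non-noise sample counts as a
    valid anchor (a stand-in for "core point").
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    noise = labels == -1

    # Nothing to do if there is no noise, and nothing we can do if
    # every sample is noise (there is no cluster to assign to).
    if not noise.any() or noise.all():
        return labels.copy()

    clustered_X = X[~noise]
    clustered_labels = labels[~noise]

    # Squared Euclidean distances, shape (n_noise, n_clustered).
    diff = X[noise][:, None, :] - clustered_X[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)

    filled = labels.copy()
    filled[noise] = clustered_labels[dist2.argmin(axis=1)]
    return filled
```

The dense distance matrix grows as (noise samples) x (clustered samples), so for a large dataset a tree-based nearest-neighbour search would likely be a better fit than this brute-force version.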

lmcinnes commented 1 month ago

You can try the soft clustering options (https://hdbscan.readthedocs.io/en/latest/soft_clustering.html), but there really isn't a magical, straightforward way to do this.
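For illustration, here is a rough sketch of how the soft clustering route might be used to fill in noise labels: fit with `prediction_data=True`, ask for the membership vectors of all points, and give each noise sample the cluster with the highest membership. The helper names `fill_noise_from_membership` and `cluster_everything` are hypothetical, and the sketch assumes that column `i` of the membership matrix corresponds to cluster label `i`.

```python
import numpy as np


def fill_noise_from_membership(labels, membership):
    """Replace noise labels (-1) with the cluster of highest soft membership.

    labels:     shape (n_samples,), hard labels with -1 marking noise.
    membership: shape (n_samples, n_clusters), one soft score per cluster.
    Assumption: column i of `membership` corresponds to cluster label i.
    """
    labels = np.asarray(labels)
    membership = np.asarray(membership)
    if membership.ndim != 2 or membership.shape[1] == 0:
        raise ValueError("need a 2-D membership matrix with at least one cluster")

    filled = labels.copy()
    noise = labels == -1
    # Only noise samples are changed; clustered samples keep their hard label.
    filled[noise] = membership[noise].argmax(axis=1)
    return filled


def cluster_everything(X, min_cluster_size=15):
    """Fit HDBSCAN, then assign each noise sample to its most likely cluster."""
    import hdbscan  # imported here so the helper above works without it

    # prediction_data=True must be set before requesting membership vectors.
    clusterer = hdbscan.HDBSCAN(
        min_cluster_size=min_cluster_size, prediction_data=True
    ).fit(X)
    membership = hdbscan.all_points_membership_vectors(clusterer)
    return fill_noise_from_membership(clusterer.labels_, membership)
```

A sample with near-zero membership in every cluster still gets a label this way, so it may be worth inspecting the maximum membership value before trusting the assignment.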
