scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Clustering struggles with mix of noise levels #610

Open · AlanDeBarros opened this issue on Aug 29, 2023

AlanDeBarros commented on Aug 29, 2023

Hello

I have images in which some regions change intensity over time. I use this to cluster those regions with HDBSCAN: each pixel is represented by a vector (its intensity over time), and I use the Euclidean distance.
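A minimal sketch of the setup described above, assuming the frames are stacked in a NumPy array of shape (T, H, W). The helper name `pixels_to_vectors` is hypothetical; it only turns the stack into one row per pixel.

```python
import numpy as np

def pixels_to_vectors(frames: np.ndarray) -> np.ndarray:
    """Reshape a (T, H, W) stack of frames into an (H*W, T) array.

    Row i is the intensity time series of pixel i (row-major pixel order),
    so the Euclidean distance between two rows compares how two pixels
    evolve over time.
    """
    t, h, w = frames.shape
    return frames.reshape(t, h * w).T

# Toy stack: 5 frames of a 2x3 image.
frames = np.arange(5 * 2 * 3, dtype=float).reshape(5, 2, 3)
X = pixels_to_vectors(frames)
```

The resulting `X` is what would be passed to something like `hdbscan.HDBSCAN(min_cluster_size=...).fit_predict(X)`; Euclidean is that estimator's default metric.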

I run into a problem when one of the regions stays black the whole time. HDBSCAN returns that region as one cluster, and everything else comes out as lots of tiny clusters surrounded by noise. My guess is that this region forms a very stable cluster by HDBSCAN's standards, which raises the bar for the other clusters.
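One way to see why the black region could dominate: HDBSCAN estimates density through core distances (roughly, the distance from a point to its k-th nearest neighbour) and works with λ = 1/distance. Pixels that stay black are identical vectors, so their core distance is 0 and their density is unbounded, which would be consistent with the guess above (the thread does not confirm it). The brute-force sketch below uses synthetic data and a hypothetical `core_distances` helper; it is a simplification, not the library's code.

```python
import numpy as np

def core_distances(X: np.ndarray, k: int) -> np.ndarray:
    """Distance from each row of X to its k-th nearest *other* row (brute force).

    Simplified stand-in for HDBSCAN's core distance; the library's exact
    convention (e.g. whether a point counts as its own neighbour) may differ.
    """
    diffs = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)           # exclude each point from its own neighbours
    return np.sort(d, axis=1)[:, k - 1]   # k-th smallest remaining distance

rng = np.random.default_rng(0)
black = np.zeros((20, 10))                      # 20 pixels that stay black for 10 frames
active = rng.uniform(0.0, 1.0, size=(20, 10))   # 20 pixels whose intensity varies
X = np.vstack([black, active])

cd = core_distances(X, k=5)
print(cd[:20].max())      # black pixels: exact duplicates, so core distance 0
print(cd[20:].min() > 0)  # varying pixels: strictly positive core distances
```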

Is there a way to get two noise floors, or to make the threshold more dynamic somehow?

lmcinnes commented 1 year ago

Not out of the box, no. I think you may need to do some filtering as preprocessing, or possibly come up with a custom distance metric that does the job.
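A minimal sketch of the preprocessing route, assuming the per-pixel vectors are rows of a NumPy array `X`: mask out pixels that never rise above a small threshold, cluster only the rest, then stitch the labels back together. `black_pixel_mask`, `merge_labels`, the threshold, and the `-2` label are all hypothetical choices, and the HDBSCAN call is replaced by placeholder labels so the sketch runs without hdbscan installed.

```python
import numpy as np

def black_pixel_mask(X: np.ndarray, threshold: float = 1e-6) -> np.ndarray:
    """True for pixels (rows of X) whose intensity never rises above `threshold`.

    `threshold` is a hypothetical knob; tune it to the data's noise level.
    """
    return X.max(axis=1) <= threshold

def merge_labels(mask: np.ndarray, labels_rest: np.ndarray, black_label: int = -2) -> np.ndarray:
    """Rebuild one label per pixel.

    Pixels flagged by `mask` get `black_label` (a hypothetical convention chosen
    so it cannot collide with HDBSCAN's noise label -1); the remaining pixels get
    the labels computed on the filtered data, in their original order.
    """
    full = np.full(mask.shape[0], black_label, dtype=int)
    full[~mask] = labels_rest
    return full

# Toy data: 4 pixels x 3 frames; pixels 0 and 2 stay black.
X = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.7, 0.2],
              [0.0, 0.0, 0.0],
              [0.4, 0.9, 0.1]])
mask = black_pixel_mask(X)
X_rest = X[~mask]
# With hdbscan installed, the labels would come from something like:
#   labels_rest = hdbscan.HDBSCAN(min_cluster_size=...).fit_predict(X_rest)
# Placeholder labels so this sketch runs on its own:
labels_rest = np.zeros(len(X_rest), dtype=int)
labels = merge_labels(mask, labels_rest)
```

Because the black pixels never reach HDBSCAN, they can no longer set the stability bar that the remaining regions are compared against.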
