
scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.

http://hdbscan.readthedocs.io/en/latest/

BSD 3-Clause "New" or "Revised" License


Unexpected Clustering Result with Given Parameters #643

Open yata0 opened 4 months ago

yata0 commented 4 months ago

I've encountered unexpected behavior when clustering my text corpus. The corpus contains the following texts:

text1
text2
text3

With the distance metric used, dist(text1, text2) < dist(text1, text3). However, the clustering grouped text1 with text3 and labeled text2 as an isolated point (noise). This is contrary to what I expected.

Parameters Used:

min_cluster_size = 2
min_samples = 2
metric = 'euclidean' (Euclidean distance)
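One factor that may be worth checking with these parameters: HDBSCAN does not build its hierarchy on raw Euclidean distance. It builds a minimum spanning tree on the mutual reachability distance, max(core(a), core(b), d(a, b)), where core(x) is the distance from x to its min_samples-th nearest neighbor. A point in a sparse region can therefore end up "farther" from everything than its raw distances suggest. The following is a minimal pure-Python sketch on hypothetical 2-D points, not the reporter's embeddings. The helper names `core_distances` and `mutual_reachability` and the extra documents `n1` and `n2` are made up for illustration. It also assumes the point itself is not counted toward min_samples, which is how this sketch reads the scikit-learn-contrib/hdbscan convention. Check the convention of the version you run, because scikit-learn's own HDBSCAN counts the point itself.

```python
import math


def core_distances(points, k):
    """Return the distance from each point to its k-th nearest *other* point.

    Assumption: the point itself is not counted toward k (k = min_samples).
    """
    cores = {}
    for name, p in points.items():
        dists = sorted(
            math.dist(p, q) for other, q in points.items() if other != name
        )
        cores[name] = dists[k - 1]
    return cores


def mutual_reachability(points, k):
    """Return a function computing max(core(a), core(b), d(a, b))."""
    cores = core_distances(points, k)

    def mreach(a, b):
        return max(cores[a], cores[b], math.dist(points[a], points[b]))

    return mreach


# Hypothetical 2-D embeddings, NOT the reporter's data: text2 sits alone in a
# sparse region, while text3 has two close neighbors (n1, n2 are made up).
points = {
    "text1": (0.0, 0.0),
    "text2": (-10.0, 0.0),
    "text3": (15.0, 0.0),
    "n1": (16.0, 0.0),
    "n2": (17.0, 0.0),
}

mreach = mutual_reachability(points, k=2)  # k = min_samples = 2
for other in ("text2", "text3"):
    raw = math.dist(points["text1"], points[other])
    print(f"text1-{other}: euclidean={raw:.1f}, "
          f"mutual reachability={mreach('text1', other):.1f}")
# text1-text2: euclidean=10.0, mutual reachability=25.0
# text1-text3: euclidean=15.0, mutual reachability=15.0
```

In this toy setup text1 is closer to text2 in Euclidean terms (10 vs 15). However, text2's second-nearest neighbor is 25 away, so its mutual reachability distance to text1 is 25. The minimum spanning tree therefore connects text1 to text3 at 15 and only reaches text2 at 25. Whether something similar happened in the reported corpus depends on the other documents surrounding each text.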