Garfiled opened this issue 6 years ago (status: open)
Ultimately it is because the two major clusters split from each other before the singleton point split off (i.e. the singleton point was closer to one of the clusters than the clusters were to each other). Density-based clustering can behave a little oddly with so little data, since so few points give very little clear indication of the actual densities.
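To make the "closer to a cluster than the clusters were to each other" point concrete, here is a rough sketch that measures the smallest gap between each pair of groups in the example data. It is only an illustration, not what hdbscan computes internally (hdbscan works on mutual reachability distances). The group names `left`, `single`, `right` and the helpers `haversine_m` and `gap` are made up for this example, and the Earth radius is an approximation.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (approximation)

def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def gap(group_a, group_b):
    """Smallest distance between any point of group_a and any point of group_b."""
    return min(haversine_m(p, q) for p in group_a for q in group_b)

# The three groups from the question, as (lon, lat) in degrees.
left = [[116.286932, 40.055431], [116.286905, 40.055411], [116.286905, 40.055421]]
single = [[116.289789, 40.055865]]
right = [[116.291487, 40.056122], [116.291549, 40.056191], [116.291567, 40.056177]]

gap_ls = gap(left, single)   # left cluster  <-> singleton
gap_sr = gap(single, right)  # singleton     <-> right cluster
gap_lr = gap(left, right)    # left cluster  <-> right cluster

print("left-single: %.1f m" % gap_ls)
print("single-right: %.1f m" % gap_sr)
print("left-right: %.1f m" % gap_lr)
```

The singleton sits nearer to the right-hand group than to the left-hand one, so when the hierarchy is cut from the top down, the left group separates first and the singleton is still attached to the right group at that moment.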
import hdbscan
points = []
# (close point1 point2 point3) (point4) (close point5 point6 point7)
points.append([116.286932,40.055431])
points.append([116.286905,40.055411])
points.append([116.286905,40.055421])
# points.append([116.289789,40.055859])  # not used: the results below contain only seven points
points.append([116.289789,40.055865])
points.append([116.291487,40.056122])
points.append([116.291549,40.056191])
points.append([116.291567,40.056177])
clusterer = hdbscan.HDBSCAN(min_cluster_size=2,metric='haversine')
cluster_labels = clusterer.fit_predict(points)
# Format each point as "lon,lat" for printing
points = [str(p[0])+","+str(p[1]) for p in points]
print(";".join(points))
print(cluster_labels)
The results:
116.286932,40.055431;116.286905,40.055411;116.286905,40.055421;116.289789,40.055865;116.291487,40.056122;116.291549,40.056191;116.291567,40.056177
[0 0 0 1 1 1 1]
I just want to ask why hdbscan assigns the labels [0 0 0 1 1 1 1] rather than [0 0 0 -1 1 1 1] or [0 0 0 1 2 2 2]?
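A side note on the input format: as far as I know, the haversine metric in scikit-learn (which hdbscan relies on for this metric) expects each row as [latitude, longitude] in radians, whereas the snippet above passes [longitude, latitude] in degrees. A minimal sketch of the conversion follows; `to_haversine_input` is a hypothetical helper name, and whether the conversion changes the labels for this data set has not been checked here.

```python
import math

def to_haversine_input(lon_lat_degrees):
    """Convert [lon, lat] pairs in degrees to [lat, lon] pairs in radians."""
    return [[math.radians(lat), math.radians(lon)] for lon, lat in lon_lat_degrees]

# The seven points from the question, as [lon, lat] in degrees.
points = [
    [116.286932, 40.055431], [116.286905, 40.055411], [116.286905, 40.055421],
    [116.289789, 40.055865],
    [116.291487, 40.056122], [116.291549, 40.056191], [116.291567, 40.056177],
]

converted = to_haversine_input(points)
print(converted[0])

# Assumption: the converted rows would then be passed to the clusterer, e.g.
#   hdbscan.HDBSCAN(min_cluster_size=2, metric='haversine').fit_predict(converted)
```

Distances returned by the haversine metric are then in radians on the unit sphere; multiplying by the Earth's radius gives a length.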