Thank you for this amazing clustering algorithm and such an easy-to-use library. However, I think I've found a minor bug: the min_cluster_size keyword seems to act as the largest size that is not considered a cluster, rather than the smallest size that is. See the example below:
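import numpy as np
import hdbscan

# data is a pandas DataFrame with numeric columns 'x', 'y' and 'z'
data['Cluster2'] = hdbscan.HDBSCAN(min_cluster_size=2).fit_predict(data[['x', 'y', 'z']])
data['Cluster3'] = hdbscan.HDBSCAN(min_cluster_size=3).fit_predict(data[['x', 'y', 'z']])

# Smallest group produced with min_cluster_size=2
gb2 = data.groupby('Cluster2')
l = np.nan
for n, cluster in gb2:
    l = np.nanmin([l, cluster.shape[0]])
print(l)

# Smallest group produced with min_cluster_size=3
gb3 = data.groupby('Cluster3')
l = np.nan
for n, cluster in gb3:
    l = np.nanmin([l, cluster.shape[0]])
print(l)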
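This prints 3.0 and 4.0: the smallest group has 3 points with min_cluster_size=2 and 4 points with min_cluster_size=3.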
(I use the latest version available through pip)
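
One caveat with my check: groupby also counts the noise group, which HDBSCAN labels -1, so the minimum could come from noise rather than from a real cluster. Here is a minimal sketch of a stricter check that ignores noise. The helper name smallest_cluster_size is hypothetical (it is not part of hdbscan), and the labels are toy values rather than output from my data:

```python
from collections import Counter


def smallest_cluster_size(labels, noise_label=-1):
    """Return the size of the smallest non-noise cluster, or None if there is none."""
    # Count points per label, skipping the noise label.
    counts = Counter(int(label) for label in labels if int(label) != noise_label)
    return min(counts.values()) if counts else None


# Toy labels: two noise points, one cluster of 3 and one cluster of 4.
example_labels = [-1, 0, 0, 0, 1, 1, 1, 1, -1]
print(smallest_cluster_size(example_labels))  # 3
```

With the real data this would be called as smallest_cluster_size(data['Cluster2']) and smallest_cluster_size(data['Cluster3']).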