Observed
The HDBSCAN flat module, documented here, is supposed to return a fixed number of clusters, controlled by the n_clusters parameter. I found a dataset for which it returns more clusters than requested.
Expected
HDBSCAN flat must return exactly n_clusters clusters for all inputs.
Code and data
Here is a simple dataset for which HDBSCAN flat returns more than n_clusters clusters: data.csv
Here is the code:
import pandas as pd
from hdbscan import flat

df = pd.read_csv("data.csv")
# Request exactly 3 flat clusters.
clustering = flat.HDBSCAN_flat(df, min_samples=2, min_cluster_size=2, n_clusters=3)
# Distinct labels assigned; -1 marks noise points.
print(set(clustering.labels_))
This prints {0, 1, 2, 3, -1}, i.e. four clusters (0, 1, 2, and 3) plus the noise label -1, even though n_clusters=3 was requested.
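To make the mismatch easy to check without reading the label set by eye, here is a minimal sketch of a helper that counts clusters while ignoring noise. The name count_clusters and its noise_label parameter are my own and not part of hdbscan; the label list is copied from the output above rather than recomputed.

```python
def count_clusters(labels, noise_label=-1):
    """Return the number of distinct cluster labels, ignoring noise."""
    # Convert to plain ints so NumPy integer labels compare cleanly.
    return len({int(label) for label in labels} - {noise_label})


# Labels reported above for data.csv with n_clusters=3.
observed = [0, 1, 2, 3, -1]
requested = 3

found = count_clusters(observed)
print(f"requested={requested}, found={found}")
if found != requested:
    print("Mismatch: flat clustering did not honor n_clusters")
```

In the reproduction script, passing clustering.labels_ instead of the hard-coded list should give the same count.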