Open hammadmazhar1 opened 5 years ago
I think this might have happened due to the cluster selection method. I switched to eom, which works fine (i.e. it gives me a cluster, "0"). But I still receive the "Clusterer does not have any defined clusters, new data will be automatically predicted as noise" warning.
Hi, I'm also getting the warning below when using the approximate_predict() function:
"UserWarning: Clusterer does not have any defined clusters, new data will be automatically predicted as noise."
My fit data has only one cluster (cluster 0) and outliers (-1).
clusterer = hdbscan.HDBSCAN(metric='euclidean', cluster_selection_method='eom', allow_single_cluster=True, prediction_data=True).fit(x)
I'm trying to use HDBSCAN to find outliers and save the model so that I can predict new data points using approximate_predict().
Is there any way to do it?
Thanks.
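One possible stopgap, if approximate_predict() keeps labelling everything as noise for a single-cluster model, is to classify new points yourself from the fitted labels. The sketch below is not hdbscan's prediction logic; it swaps in a plain nearest-neighbour rule: a new point joins cluster 0 if it lies no farther from some cluster member than the largest nearest-neighbour gap observed inside the cluster. The helper names (fit_single_cluster_radius, predict_single_cluster) and the radius rule are assumptions made for illustration, and the hard-coded labels stand in for clusterer.labels_.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def manhattan(a: Point, b: Point) -> float:
    """Manhattan (L1) distance between two points of equal dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))


def fit_single_cluster_radius(
    points: Sequence[Point], labels: Sequence[int], cluster_label: int = 0
) -> Tuple[List[Point], float]:
    """Collect the cluster members and the largest nearest-neighbour
    distance among them (hypothetical helper, not part of hdbscan)."""
    members = [p for p, lab in zip(points, labels) if lab == cluster_label]
    if not members:
        raise ValueError("no points carry the requested cluster label")
    radius = 0.0
    for i, p in enumerate(members):
        gaps = [manhattan(p, q) for j, q in enumerate(members) if j != i]
        if gaps:  # a lone member has no neighbours; radius stays 0
            radius = max(radius, min(gaps))
    return members, radius


def predict_single_cluster(
    new_points: Sequence[Point],
    members: Sequence[Point],
    radius: float,
    cluster_label: int = 0,
) -> List[int]:
    """Label a point with the cluster if its nearest member is within
    `radius`, otherwise as noise (-1)."""
    predicted = []
    for p in new_points:
        nearest = min(manhattan(p, m) for m in members)
        predicted.append(cluster_label if nearest <= radius else -1)
    return predicted


if __name__ == "__main__":
    # Stand-in data: in practice `labels` would be clusterer.labels_.
    train = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    labels = [0, 0, 0, 0, -1]
    members, radius = fit_single_cluster_radius(train, labels)
    print(predict_single_cluster([(0.5, 0.5), (5, 5)], members, radius))
```

Note that with a training set made of one repeated point the radius comes out as 0, so only exact duplicates are accepted; you may want a small tolerance in that case. The pairwise loop is quadratic in the number of cluster members, which is fine for a few hundred points but not for large clusters.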
Clustering on a large amount of network data, I have hit a case where the set of data to cluster consists essentially of the same data point repeated multiple times (600 points or so). This should lead to a single cluster in practice, since the distance between the points is zero (unless I am severely misunderstanding the principle HDBSCAN works on). However, fitting on this data with the
allow_single_cluster=True
option returns the warning: Clusterer does not have any defined clusters, new data will be automatically predicted as noise.
I plan to use this to classify new data, so this is obviously not the right outcome for me. Any suggestions? I'm currently building the clusterer with:
clusterer = hdbscan.HDBSCAN(algorithm='boruvka_balltree', memory=mem_cache, core_dist_n_jobs=5, metric='manhattan', min_cluster_size=min_clust_size, min_samples=min_samp, prediction_data=True, allow_single_cluster=True, cluster_selection_method='leaf')