scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

"AttributeError: No prediction data was generated" for flat.HDBSCAN_flat for pairwise distance #581

Open Calvinnncy opened 1 year ago

Calvinnncy commented 1 year ago

Hi all,

Thank you for the great work on this library. HDBSCAN is my favorite clustering algorithm and I really enjoy using it. I recently ran into an error that I do not know how to solve: I am trying to run HDBSCAN_flat with pairwise cosine distances. I would really appreciate any pointers on how to fix it.

The following code led to the error.

    pos_pw = pairwise_distances([e[0] for e in eud_positive_embeds], metric='cosine').astype('float64')
    pos_clusterer = HDBSCAN(metric='precomputed').fit(pos_pw)
    pos_flat_clusterer = flat.HDBSCAN_flat(pos_pw, clusterer=pos_clusterer, n_clusters=30, inplace=True)

Error message:

    AttributeError                            Traceback (most recent call last)
    /tmp/ipykernel_342107/3077653091.py in <cell line: 2>()
          1 #pos_flat_clusterer = flat.HDBSCAN_flat(pos_pw, clusterer=pos_clusterer, n_clusters=30, inplace=True)
    ----> 2 flat.approximate_predict_flat(pos_clusterer, pos_pw, 30)

    ~/env/lib64/python3.10/site-packages/hdbscan/flat.py in approximate_predict_flat(clusterer, points_to_predict, n_clusters, cluster_selection_epsilon, prediction_data, return_prediction_data)
        290     # then build prediction data from these by modifying clusterer's
        291     if not isinstance(prediction_data, PredictionData):
    --> 292         if clusterer.prediction_data_ is None:
        293             raise ValueError(
        294                 'Clusterer does not have prediction data!'

    ~/env/lib64/python3.10/site-packages/hdbscan/hdbscan_.py in prediction_data_(self)
       1367     def prediction_data_(self):
       1368         if self._prediction_data is None:
    -> 1369             raise AttributeError("No prediction data was generated")
       1370         else:
       1371             return self._prediction_data

    AttributeError: No prediction data was generated

Thank you.

Best regards, Calvinn

vaibhav-k commented 9 months ago

Hello Calvinn,

Thank you for this post. I was struggling with the same issue and wondering whether I was the only one.

I found a way to resolve this issue: pass prediction_data=True when creating the HDBSCAN clustering object, like so: hdbscan.HDBSCAN(prediction_data=True).
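
For the cosine case in the original post, here is a minimal sketch of one possible workaround. It is not taken from this thread. It assumes (as far as the hdbscan source suggests, so please verify against your installed version) that prediction data is built from the raw feature vectors, so prediction_data=True only has an effect when the clusterer is fit on vectors and not on a metric='precomputed' distance matrix. The technique swapped in here is to L2-normalize the embeddings and cluster them with the Euclidean metric: on unit vectors the squared Euclidean distance equals twice the cosine distance, so neighbor ordering is preserved, although the selected clusters may still differ from a true cosine run. The name fit_flat_on_unit_vectors is hypothetical, and forwarding metric and prediction_data through HDBSCAN_flat's keyword arguments is an assumption about its behavior.

```python
import math


def l2_normalize(vectors):
    """Scale each row to unit length; all-zero rows are returned unchanged."""
    out = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        out.append([x / norm for x in v] if norm > 0 else list(v))
    return out


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_flat_on_unit_vectors(embeddings, n_clusters):
    """Hypothetical helper (not part of hdbscan).

    Fits a flat clustering on L2-normalized feature vectors instead of a
    precomputed cosine distance matrix, so prediction data can be built.
    """
    # Imported lazily so the distance helpers above work without these packages.
    import numpy as np
    from hdbscan import flat

    X = np.asarray(l2_normalize(embeddings), dtype=np.float64)
    # Assumption: extra keyword arguments are forwarded to hdbscan.HDBSCAN.
    return flat.HDBSCAN_flat(
        X, n_clusters=n_clusters, metric="euclidean", prediction_data=True
    )


# On unit vectors, squared Euclidean distance equals 2 * cosine distance.
a, b = l2_normalize([[3.0, 4.0], [4.0, 3.0]])
print(cosine_distance(a, b), euclidean_distance(a, b) ** 2)
```

With a clusterer fit this way, new points would need the same normalization before being passed to flat.approximate_predict_flat.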

I was able to resolve this problem with the aforementioned fix and hope that it works for you, too.

Please let me know how it goes.

Cheers! Vaibhav