scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Prediction of new points when feeding in a precomputed Euclidean distance matrix while training #590

Open jayshah1397 opened 1 year ago

jayshah1397 commented 1 year ago

Hi, I'm training an HDBSCAN clustering model on precomputed pairwise Euclidean distances between points (by passing metric='precomputed'). When I want to predict the clusters and probabilities for new points (for example, a test set) using the approximate_predict() function, I cannot pass in the raw feature values as they are.

For prediction, do I need to compute the Euclidean distance from every new point to each of the data points I used to train the model? Would that be the right approach?
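
To make the question concrete, here is a minimal sketch of the computation I have in mind: one row per new point, one column per training point. The function name `cross_distances` and the toy data are hypothetical and only for illustration, and I am not assuming that approximate_predict() accepts a matrix like this.

```python
import math

def cross_distances(new_points, train_points):
    """Return a len(new_points) x len(train_points) matrix of Euclidean distances."""
    return [[math.dist(p, q) for q in train_points] for p in new_points]

# Hypothetical toy data standing in for the training and test sets.
train_points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
new_points = [(0.0, 0.0), (3.0, 0.0)]

# distances[i][j] is the distance from new point i to training point j.
distances = cross_distances(new_points, train_points)
```

With real data the same matrix could be built with scipy.spatial.distance.cdist(new_points, train_points) instead of the pure-Python loop.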