Hi, I’m training an hdbscan clustering model on a precomputed matrix of pairwise Euclidean distances between my points (by passing metric='precomputed'). But when I want to predict clusters and probabilities for new points (for example, a test set) using the approximate_predict() function, I cannot pass the raw feature values as-is.
For prediction, do I need to compute the Euclidean distance from every new point to each of the data points I used to train the model?
Would that be the right approach?
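To make the question concrete, here is a minimal sketch of the distance computation I have in mind. The helper name cross_distances and the toy points are made up for illustration, and I'm not assuming approximate_predict() actually accepts a matrix like this; that is the part I'm unsure about.

```python
import math

def cross_distances(new_points, train_points):
    """Return a len(new_points) x len(train_points) matrix of Euclidean distances."""
    return [[math.dist(p, q) for q in train_points] for p in new_points]

# Hypothetical toy data, for illustration only.
train_points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
new_points = [(0.0, 0.0), (3.0, 0.0)]

# One row per new point, one column per training point.
new_to_train = cross_distances(new_points, train_points)
```

With real data I would probably use scipy.spatial.distance.cdist(new_X, train_X) to get the same matrix more efficiently.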