codata-hg opened this issue 6 years ago (status: Open)
Prediction needs the nearest neighbors of the new data points in order to approximate their core distances, and that generally requires trees or some other nearest-neighbor technique. Precomputed distance matrices don't work well for that. Computing just the exemplars is possible, but for now all of that code is wrapped up together with prediction. You can reproduce the exemplar computation yourself from the condensed tree representation if you wish: you simply want the set of most persistent points for each selected cluster.
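Below is a minimal plain-Python sketch of the exemplar computation described above. It assumes the condensed tree is given as (parent, child, lambda_val, child_size) rows, which matches the field layout of hdbscan's condensed tree (`clusterer.condensed_tree_.to_numpy().tolist()` should yield such tuples). Rows with child_size > 1 are cluster splits. Rows with child_size == 1 record the lambda (1 / distance) at which a point leaves a cluster, so the most persistent points are the ones with the largest lambda. For a selected cluster that later split, the sketch collects the most persistent points of each leaf cluster beneath it. You still need the ids of the selected clusters; the soft-clustering guide appears to get them from the private `condensed_tree_._select_clusters()`, but treat that as an assumption. `leaf_clusters`, `exemplars` and the toy tree are hypothetical.

```python
from collections import defaultdict


def leaf_clusters(tree, cluster_id):
    """Return the leaf clusters at or below cluster_id.

    tree: iterable of (parent, child, lambda_val, child_size) rows (assumed
    layout). Rows with child_size > 1 describe cluster -> sub-cluster splits.
    """
    children = defaultdict(list)
    for parent, child, _lam, size in tree:
        if size > 1:
            children[parent].append(child)
    leaves, stack = [], [cluster_id]
    while stack:  # iterative depth-first walk over the cluster hierarchy
        node = stack.pop()
        if children[node]:
            stack.extend(children[node])
        else:
            leaves.append(node)
    return leaves


def exemplars(tree, cluster_id):
    """Indices of the most persistent points (largest lambda) of each leaf."""
    result = []
    for leaf in leaf_clusters(tree, cluster_id):
        # Single points (child_size == 1) that fall out of this leaf cluster
        pts = [(lam, child) for parent, child, lam, size in tree
               if parent == leaf and size == 1]
        if not pts:
            continue
        max_lambda = max(lam for lam, _ in pts)
        result.extend(child for lam, child in pts if lam == max_lambda)
    return sorted(result)


# Made-up toy tree: points 0..5, root cluster 6 splits into clusters 7 and 8.
toy_tree = [
    (6, 7, 0.5, 3), (6, 8, 0.5, 3),
    (7, 0, 1.0, 1), (7, 1, 2.0, 1), (7, 2, 2.0, 1),
    (8, 3, 1.5, 1), (8, 4, 1.5, 1), (8, 5, 0.8, 1),
]
print(exemplars(toy_tree, 7))  # [1, 2]
print(exemplars(toy_tree, 6))  # [1, 2, 3, 4]
```

Because this only reads the condensed tree and never touches the input data or metric, the same idea should also work when the clusterer was fit on a precomputed distance matrix.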
Thanks for your reply. That makes sense to me. As you suggested, I followed How Soft Clustering for HDBSCAN Works and reproduced the exemplars successfully.
But I still think it would be great to have clusterer.exemplars_ available, perhaps by separating exemplar generation from prediction. It shouldn't be hard.
I would be very happy to receive a pull request. I don't think it is too hard, but I don't have time to work on it right now. If you can make it work, that would be great!
On Fri, Nov 9, 2018 at 6:42 PM codata-hg notifications@github.com wrote:
Thanks for your reply. That makes sense to me. As you suggested, I followed How Soft Clustering for HDBSCAN Works https://hdbscan.readthedocs.io/en/latest/soft_clustering_explanation.html and reproduced the exemplars successfully.
But I still think it might be better to separate exemplar generation from prediction and make clusterer.exemplars_ available. It shouldn't be hard. I find HDBSCAN really awesome, and I'd love to contribute if needed.
View it on GitHub: https://github.com/scikit-learn-contrib/hdbscan/issues/251#issuecomment-437529770
HDBSCAN is a really awesome clustering technique, and I'd love to contribute in any way I can. I can't guarantee a timeline, but I'll give it a try.
Thanks, anything you can manage is greatly appreciated.
When I used a precomputed distance matrix and tried to get the exemplars of the clusterer with
clusterer.exemplars_
I got the following error message: 'AttributeError: Currently exemplars require the use of vector input data with a suitable metric. This will likely change in the future, but for now no exemplars can be provided'. I don't understand why exemplars have to be generated in the prediction step, which relies on a KDTree or BallTree, when the clustering is already done. Any idea? Thanks
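As the maintainer explains in this thread, prediction needs nearest-neighbor queries, and the current exemplar code is tied to prediction. The core-distance step shows why. Under the usual definition (assumed here), the core distance of a new point is its distance to its min_samples-th nearest training point. Assigning the new point to a cluster then compares mutual reachability distances, max(core(a), core(b), d(a, b)). The minimal brute-force sketch below needs the distances from the new point to every training point, which an n_train by n_train precomputed matrix does not contain. A KDTree or BallTree answers the same neighbor query faster. The function names are hypothetical, and this is not hdbscan's actual code path.

```python
def core_distance(dists, min_samples):
    """Distance to the min_samples-th nearest neighbour (1-based).

    Assumption: `dists` holds distances from a NEW point to every training
    point, so the point itself is not among them.
    """
    if min_samples > len(dists):
        raise ValueError("need at least min_samples training points")
    return sorted(dists)[min_samples - 1]


def nearest_by_mutual_reachability(dists, train_core, min_samples):
    """Index of the training point closest to the new point under mutual
    reachability: max(core(new), core(train_i), d(new, train_i))."""
    core_new = core_distance(dists, min_samples)
    mr = [max(core_new, c, d) for d, c in zip(dists, train_core)]
    return min(range(len(mr)), key=mr.__getitem__)


# Made-up example: 4 training points with known core distances.
new_point_dists = [0.5, 2.0, 1.0, 4.0]
train_core = [1.5, 0.2, 0.8, 0.3]
print(core_distance(new_point_dists, 2))  # 1.0
print(nearest_by_mutual_reachability(new_point_dists, train_core, 2))  # 2
```

Note that the raw nearest neighbour (index 0) loses to index 2 once its larger core distance is taken into account. This is why the new point's neighbours, and not just the training clustering, are required.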