scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Support p values other than 2 for minkowski #637

Open smartIU opened 6 months ago

smartIU commented 6 months ago

This is a suggested quick fix that allows p values other than 2 with the minkowski metric. _metric_kwargs is set in init so that generate_prediction_data() and weighted_cluster_medoid() can use it.
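
For context on why p matters, here is a minimal pure-Python sketch of the Minkowski distance. It is illustrative only and not the code path hdbscan uses; the function name `minkowski_distance` is made up for this example.

```python
def minkowski_distance(a, b, p=2.0):
    """Minkowski distance of order p between two equal-length sequences."""
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    # Sum the p-th powers of the coordinate differences, then take the p-th root.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
print(minkowski_distance(a, b, p=1))  # Manhattan distance: 7.0
print(minkowski_distance(a, b, p=2))  # Euclidean distance: 5.0
```

Because the distance between the same pair of points changes with p, a clusterer that silently falls back to p=2 can produce different clusters from the ones requested.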

lmcinnes commented 6 months ago

This seems to break the boruvka KDTrees, which do not appear to support taking a p value. You may need a further workaround in that case, such as using ball trees.
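
A rough sketch of what the ball tree workaround could look like from the caller's side, assuming the `metric`, `p` and `algorithm` parameters of `hdbscan.HDBSCAN` and the `"boruvka_balltree"` algorithm option. The helper names are hypothetical, and whether this actually avoids the KD-tree issue has not been verified here.

```python
def balltree_minkowski_kwargs(p, min_cluster_size=5):
    """Build HDBSCAN keyword arguments that request the Boruvka ball tree."""
    return {
        "metric": "minkowski",
        "p": p,
        # Explicitly avoid the KD-tree variant by naming the ball tree algorithm.
        "algorithm": "boruvka_balltree",
        "min_cluster_size": min_cluster_size,
    }


def cluster_with_balltree(X, p):
    """Fit HDBSCAN on X with a Minkowski metric of order p (hypothetical helper)."""
    import hdbscan  # imported lazily so the kwargs helper works without hdbscan

    clusterer = hdbscan.HDBSCAN(**balltree_minkowski_kwargs(p))
    return clusterer.fit_predict(X)
```

A call such as `cluster_with_balltree(X, p=1)` would then return one label per row of `X`.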

smartIU commented 6 months ago

Thanks for the info. I've been testing with hdbscan 0.8.33, Cython 0.29.37, scikit-learn 1.4.1.post1, scipy 1.12.0 and numpy 1.24.3. Here _hdbscan_boruvka_kdtree() definitely works, outputting different results for different p values.

Will set up a new environment and find out why it fails in newer versions.
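
When comparing the old and new environments, a small standard-library script along these lines could record the installed versions of the packages listed above. The function name `installed_versions` is made up for this sketch.

```python
from importlib import metadata

# Distribution names of the packages mentioned in this thread.
PACKAGES = ("hdbscan", "Cython", "scikit-learn", "scipy", "numpy")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running it in both environments and diffing the output shows which dependency versions changed between the working and failing setups.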