mratsim opened this issue 7 years ago
Thanks for this, it's definitely a bug. The prediction apparatus is still new, so this is exactly the sort of thing I need to hammer out. I'll try to get this corrected for you shortly.
On Sat, Feb 25, 2017 at 8:37 AM, Mamy Ratsimbazafy <notifications@github.com> wrote:
When trying to use both the haversine metric and prediction data, I get the error: `ValueError: metric HaversineDistance is not valid for KDTree`
It seems that scikit-learn's `KDTree` does not support the haversine metric.
Steps to reproduce (`coords` is a DataFrame of latitude/longitude pairs):
```python
from hdbscan import HDBSCAN

db = HDBSCAN(min_samples=1, metric='haversine', core_dist_n_jobs=-1, prediction_data=True)
db.fit(coords)
```
Error:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in ()
----> 1 db.fit(coords)

/usr/lib/python3.6/site-packages/hdbscan/hdbscan_.py in fit(self, X, y)
    855
    856         if self.prediction_data:
--> 857             self.generate_prediction_data()
    858
    859         return self

/usr/lib/python3.6/site-packages/hdbscan/hdbscan_.py in generate_prediction_data(self)
    889                 self._raw_data, self.condensed_tree_, min_samples,
    890                 tree_type='kdtree', metric=self.metric,
--> 891                 **self._metric_kwargs
    892             )
    893         else:

/usr/lib/python3.6/site-packages/hdbscan/prediction.py in __init__(self, data, condensed_tree, min_samples, tree_type, metric, **kwargs)
     99         self.raw_data = data
    100         self.tree = self._tree_type_map[tree_type](self.raw_data,
--> 101                                                    metric=metric, **kwargs)
    102         self.core_distances = self.tree.query(data, k=min_samples)[0][:, -1]
    103         self.dist_metric = DistanceMetric.get_metric(metric, **kwargs)

sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.kd_tree.BinaryTree.__init__ (sklearn/neighbors/kd_tree.c:9328)()

ValueError: metric HaversineDistance is not valid for KDTree
```
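The traceback ends inside scikit-learn: `KDTree` rejects the haversine metric, while `BallTree` accepts it. Below is a minimal check of that, assuming numpy and scikit-learn are installed. The sample coordinates and the `accepts_metric` helper are illustrative and do not come from the issue. Note also that scikit-learn's haversine metric expects (latitude, longitude) in radians, so degree-valued data should be converted first.

```python
import numpy as np
from sklearn.neighbors import BallTree, KDTree

# Illustrative (latitude, longitude) points in degrees; not from the issue.
coords_deg = np.array([
    [48.8566, 2.3522],    # Paris
    [51.5074, -0.1278],   # London
    [40.7128, -74.0060],  # New York
])
# scikit-learn's haversine metric expects (lat, lon) in radians.
coords_rad = np.radians(coords_deg)


def accepts_metric(tree_cls, X, metric):
    """Return True if tree_cls can be built on X with the given metric."""
    try:
        tree_cls(X, metric=metric)
    except ValueError:
        # Raised when the metric is not valid for this tree type.
        return False
    return True


kd_ok = accepts_metric(KDTree, coords_rad, "haversine")
ball_ok = accepts_metric(BallTree, coords_rad, "haversine")
print(f"KDTree accepts haversine: {kd_ok}")      # False
print(f"BallTree accepts haversine: {ball_ok}")  # True
```

This is why changing `algorithm` alone cannot help: the clustering step may use a ball tree, but the prediction-data step builds its own tree.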
I tried changing the algorithm to `prims_balltree` and `boruvka_balltree`, but to no avail.
I traced the issue to line 890 of `hdbscan_.py`, where the tree type is hardcoded to `kdtree`:
```
/usr/lib/python3.6/site-packages/hdbscan/hdbscan_.py in generate_prediction_data(self)
    889                 self._raw_data, self.condensed_tree_, min_samples,
    890                 tree_type='kdtree', metric=self.metric,
```
The `PredictionData` class in `prediction.py` supports `balltree`; I confirmed that it works. I am not sure of the implications of changing the default from `kdtree` to `balltree`, though.
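On the implications, as a rough guide based on general scikit-learn behaviour (not something measured in this thread): `KDTree` is typically the faster choice for low-dimensional data under the few metrics it supports (Euclidean, Manhattan, Chebyshev, Minkowski), while `BallTree` supports a much wider set, including haversine. Rather than switching the default outright, one option is to try `KDTree` first and fall back to `BallTree` when the metric is rejected. The sketch below assumes numpy and scikit-learn; `build_tree` and `core_distances` are hypothetical helpers, not hdbscan's API, although `core_distances` mirrors line 102 of the traceback.

```python
import numpy as np
from sklearn.neighbors import BallTree, KDTree


def build_tree(X, metric, **metric_kwargs):
    """Hypothetical helper: prefer KDTree, fall back to BallTree.

    Returns (tree_type, tree), where tree_type is 'kdtree' or 'balltree'.
    Any ValueError from KDTree triggers the fallback; if the metric is
    invalid for BallTree too, BallTree raises its own ValueError.
    """
    try:
        return "kdtree", KDTree(X, metric=metric, **metric_kwargs)
    except ValueError:
        return "balltree", BallTree(X, metric=metric, **metric_kwargs)


def core_distances(tree, X, min_samples):
    """Distance from each point to its min_samples-th nearest neighbour.

    When X is the training data, each point is its own first neighbour.
    """
    dist, _ = tree.query(X, k=min_samples)
    return dist[:, -1]


rng = np.random.default_rng(0)
# Random (lat, lon) pairs in radians, kept away from the poles.
X = np.column_stack([
    rng.uniform(-1.0, 1.0, size=50),      # latitude
    rng.uniform(-np.pi, np.pi, size=50),  # longitude
])

euc_type, _ = build_tree(X, "euclidean")
hav_type, hav_tree = build_tree(X, "haversine")
print(euc_type, hav_type)  # kdtree balltree
cd = core_distances(hav_tree, X, min_samples=5)
print(cd.shape)
```

The fallback keeps `KDTree` for metrics it handles and only pays the `BallTree` cost when it has to.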
View it on GitHub: https://github.com/scikit-learn-contrib/hdbscan/issues/88
Should be fixed now. Sorry about that!
I tried changing the default algorithm from `best` to `prims_balltree` and `boruvka_balltree`, but to no avail. I found the issue at line 890 of `hdbscan_.py`, with the tree type hardcoded to `kdtree`.

The `PredictionData` class in `prediction.py` supports `balltree`; I confirmed it works, and I can now use the (undocumented) `approximate_predict` function. I am not sure of the implications of changing the default from `kdtree` to `balltree`.
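Relating new points to the training data under this metric comes down to nearest-neighbour queries on a haversine `BallTree`. The sketch below shows only that lookup step, not the logic of hdbscan's `approximate_predict`, and cross-checks the tree's distance against the haversine formula. It assumes numpy and scikit-learn. The sample points, the `haversine` helper, and the 6371 km mean Earth radius are illustrative assumptions.

```python
import math

import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance on the unit sphere; inputs in radians."""
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(a))


# Illustrative training points (lat, lon in degrees), converted to radians.
train = np.radians([[48.8566, 2.3522],     # Paris
                    [51.5074, -0.1278],    # London
                    [40.7128, -74.0060]])  # New York
tree = BallTree(train, metric="haversine")

# A new point to look up (Brussels), also as (lat, lon) in radians.
new = np.radians([[50.8503, 4.3517]])
dist, idx = tree.query(new, k=1)
nearest = idx[0, 0]
km = dist[0, 0] * EARTH_RADIUS_KM  # tree distances are on the unit sphere

# Same distance computed directly from the formula, for comparison.
manual = haversine(*new[0], *train[nearest]) * EARTH_RADIUS_KM
print(nearest, round(km), round(manual))
```

Because the tree returns unit-sphere distances, any threshold in kilometres has to be divided by the Earth radius before comparing.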