Open IlyaOrson opened 7 years ago
I believe it is a relatively fundamental obstruction at the present time. There may be some cases where it could be made to work, but I would have to think carefully about how best to build an API that would allow for that without being confusing for all the other cases. Sorry that I can't provide any better answers at this time.
No rush at all, I will stay tuned. Thanks for this again!
Hi! Thank you for all your work!
Is it the same with callables? Because I tried to execute the following code:
def userdist(x, y):
    distance = vincenty((x[0], x[1]), (y[0], y[1]), miles=True)
    return distance

clusterer = hdbscan.HDBSCAN(min_cluster_size=6,
                            min_samples=3,
                            metric=userdist,
                            prediction_data=True).fit(data[['latitude', 'longitude']])
I don't get any warning, but when I call all_points_membership_vectors(clusterer) on it, I notice that clusterer.prediction_data_ is None.
The error I have is the following:
/Users/nlassaux/hdbscan-clustering/env/lib/python2.7/site-packages/hdbscan/prediction.pyc in all_points_membership_vectors(clusterer)
514 clusters = np.array(list(clusterer.condensed_tree_._select_clusters()
515 )).astype(np.intp)
--> 516 all_points = clusterer.prediction_data_.raw_data
517
518 distance_vecs = all_points_dist_membership_vector(all_points,
AttributeError: 'NoneType' object has no attribute 'raw_data'
Can you explain why a custom metric is a special case for getting a soft clustering?
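The traceback above comes from reading an attribute on a prediction_data_ that is None. A minimal guard, assuming only the attribute name shown in the traceback, could surface the problem earlier with a clearer message. The helper name require_prediction_data and the stand-in object are hypothetical and are not part of hdbscan:

```python
from types import SimpleNamespace  # only used for the stand-in objects below


def require_prediction_data(clusterer):
    """Return clusterer.prediction_data_ or raise a descriptive error."""
    # getattr with a default also covers objects where the attribute
    # is missing or raises AttributeError when accessed.
    data = getattr(clusterer, "prediction_data_", None)
    if data is None:
        raise ValueError(
            "No prediction data was generated for this clusterer; "
            "soft clustering cannot proceed without it."
        )
    return data


# Hypothetical stand-in for a fitted clusterer whose prediction data
# was never built (not a real hdbscan.HDBSCAN instance).
fake_clusterer = SimpleNamespace(prediction_data_=None)

try:
    require_prediction_data(fake_clusterer)
except ValueError as err:
    message = str(err)
```

Calling such a guard before all_points_membership_vectors(clusterer) would turn the AttributeError into an explicit statement that prediction data is missing.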
The soft clustering is still fairly new, and I haven't pushed everything through properly. For now I'm making heavy use of sklearn's KDTree and BallTree, and while those trees support custom metrics, callables aren't explicitly listed among their allowed metrics, which is the easiest way to check whether a metric can reasonably be used. That means the algorithm falls back to other approaches, which don't support soft clustering at this time.
If you could add an issue with a feature request to ensure that callable metrics are supported for soft clustering I would appreciate it -- it will help stop this falling through the cracks later.
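The check described above (is the metric listed among a tree's allowed metrics?) can be sketched roughly as follows. This is an illustration of the idea, not hdbscan's actual code: the function names and the two fake tree classes are hypothetical, and the sketch assumes only that each tree class exposes valid_metrics either as a list or as a callable returning one.

```python
def advertised_metrics(tree_cls):
    """Collect the metric names a tree class lists as valid."""
    valid = tree_cls.valid_metrics
    # Accept both a plain list attribute and a method returning a list.
    return set(valid() if callable(valid) else valid)


def metric_is_tree_supported(metric, tree_classes):
    """True if any tree class explicitly lists this metric."""
    return any(metric in advertised_metrics(cls) for cls in tree_classes)


# Hypothetical stand-ins so the sketch runs without scikit-learn.
class FakeKDTree:
    valid_metrics = ["euclidean", "manhattan", "chebyshev"]


class FakeBallTree:
    @classmethod
    def valid_metrics(cls):
        return ["euclidean", "haversine"]


def userdist(x, y):
    # Placeholder user-defined metric; the value is irrelevant here.
    return 0.0


trees = [FakeKDTree, FakeBallTree]
named_ok = metric_is_tree_supported("haversine", trees)
callable_ok = metric_is_tree_supported(userdist, trees)
precomputed_ok = metric_is_tree_supported("precomputed", trees)
```

A callable such as userdist never appears in a list of metric names, so a membership test of this kind rejects it even though the trees themselves could evaluate it. With scikit-learn installed, sklearn.neighbors.KDTree and sklearn.neighbors.BallTree could be passed in place of the fake classes.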
Hello,
Thank you for developing such a great clustering library.
It would be really useful to have this feature available in the next release of hdbscan.
It would be great to be able to use approximate_predict or membership_vector with a custom distance measure, or to use the same methods with a pairwise_distance input.
Could I ask if there are any plans for this functionality to be added?
Thank you, Elena
My current priorities are in developing a follow-on clustering library that benefits from some newer theory and a lot of lessons learned from this library. This is particularly true when it comes to soft clustering. In practice, that means I do not have any near-term plans to add such functionality myself. I would be more than happy to accept pull requests that add such functionality.
Hi @lmcinnes ,
Has there been any progress on getting prediction working for precomputed distances?
Hello,
Any progress on getting prediction working with precomputed distance? I calculated cosine distance since it is not supported, but when I try predicting it does not work.
It is unlikely to be available for precomputed distances any time soon. Sorry.
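For the cosine case specifically, one workaround that is sometimes used (an assumption about the use case, not something proposed by the maintainer in this thread) is to L2-normalize the vectors and cluster with the Euclidean metric, so the source data stays available. For unit vectors, squared Euclidean distance equals twice the cosine distance, so neighbours are ranked the same way, although the resulting clustering need not be identical. The helper names below are illustrative:

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Ordinary Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


u = [3.0, 4.0]
v = [1.0, 0.0]

# Squared Euclidean distance between the normalized vectors ...
squared_euclidean = euclidean_distance(l2_normalize(u), l2_normalize(v)) ** 2
# ... compared with twice the cosine distance of the original vectors.
twice_cosine = 2.0 * cosine_distance(u, v)
```

Whether this substitution is acceptable depends on the application, since distances are rescaled non-linearly and density-based results can shift.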
I get this same error
UserWarning: Cannot generate prediction data for non-vectorspace inputs -- access to the source data ratherthan mere distances is required!
when using sqeuclidean as my distance metric. Is that to be expected, @lmcinnes? I'm guessing that under the hood any of the scipy distances are just doing the same thing and calculating a precomputed distance matrix?
Checking again on the status (hopefully progress) of this thread, namely using fuzzy/soft clustering with a precomputed distance matrix...
Hello! First of all, thanks a lot for this clustering method and the implementation, both are super cool!
I am trying to use Soft Clustering with the precomputed distance matrix since I am using an unconventional distance. There appears to be no method implemented for this right now. I understand this is a new experimental feature and wondered if this limitation is just temporary. Is it possible to add this functionality?
Just for reference, the following code, built from the manual, produces this warning: