rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[BUG] HDBSCAN precomputed metric AttributeError #4460

Open RyanWalden20 opened 2 years ago

RyanWalden20 commented 2 years ago

Describe the bug: cuml.cluster.HDBSCAN does not appear to support a precomputed distance matrix, although the documentation suggests it does.

Steps/Code to reproduce bug

import numpy as np  # required for np.ones below
from cuml.cluster import HDBSCAN
model = HDBSCAN(metric='precomputed')
model.fit(np.ones((3,3)))  # raises the AttributeError shown in the traceback
AttributeError                            Traceback (most recent call last)
<command-3589287876064841> in <module>
      1 from cuml.cluster import HDBSCAN
      2 model = HDBSCAN(metric='precomputed')
----> 3 model.fit(np.ones((3,3)))

/databricks/python/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_with_setters(*args, **kwargs)
    407                                 target_val=target_val)
    408 
--> 409                 return func(*args, **kwargs)
    410 
    411         @wraps(func)

cuml/cluster/hdbscan.pyx in cuml.cluster.hdbscan.HDBSCAN.fit()

cuml/common/base.pyx in cuml.common.base.Base.__getattr__()

Expected behavior: The model should fit using the precomputed distance matrix.

RyanWalden20 commented 2 years ago

Verified that this error is reproducible in the environment of the RAPIDS Getting Started Colab notebook at https://rapids.ai/start.html

RichieHakim commented 2 years ago

I'm getting this error too. I accidentally opened a duplicate issue (though slightly different): https://github.com/rapidsai/cuml/issues/4475 In addition, using metric='manhattan' crashes my kernel.

cjnolet commented 2 years ago

As mentioned in #4475, the current documentation is indeed misleading: it was copied directly from the hdbscan library, even though the only metric currently supported is euclidean. We need to update the documentation for now, but there is a pending item in #3879 to expose more metrics through HDBSCAN (and single-linkage hierarchical clustering).

CholoTook commented 5 months ago

Can I somehow make a Euclidean space from a cosine space and feed it to HDBSCAN?
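
One possible approach to the question above, offered as a sketch rather than a confirmed recommendation from the maintainers: scale every row to unit L2 norm. For unit vectors u and v, the squared Euclidean distance equals 2 * (1 - cos(u, v)), so Euclidean distance becomes a monotonic function of cosine distance. The example uses NumPy on random data only to check that identity; the helper name `l2_normalize` and the sample array are illustrative assumptions, not part of cuML.

```python
import numpy as np


def l2_normalize(X, eps=1e-12):
    """Scale each row of X to unit Euclidean length (hypothetical helper)."""
    X = np.asarray(X, dtype=np.float64)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return X / np.maximum(norms, eps)


# Illustrative random data; replace with your own feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
X_unit = l2_normalize(X)

# Cosine distance between all pairs of rows.
cos_dist = 1.0 - X_unit @ X_unit.T

# Squared Euclidean distance between all pairs of normalized rows.
diff = X_unit[:, None, :] - X_unit[None, :, :]
sq_euclid = np.sum(diff ** 2, axis=-1)

# Check the identity ||u - v||^2 = 2 * (1 - cos(u, v)) for unit vectors.
identity_holds = np.allclose(sq_euclid, 2.0 * cos_dist)
```

Under this assumption, the normalized array could then be passed to HDBSCAN with its default euclidean metric. Because the relation between the two distances is monotonic but not linear, the resulting clusters need not be identical to those from a true cosine metric.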