Open Mortom123 opened 2 years ago
Hi,
Are there any updates on making precomputed matrices available for HDBSCAN?
I just tried to run HDBSCAN on a cupyx.scipy.sparse._csr.csr_matrix and it immediately rejected the sparse input:
    hdb.fit(D)
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
    return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
  File "hdbscan.pyx", line 762, in cuml.cluster.hdbscan.hdbscan.HDBSCAN.fit
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/nvtx/nvtx.py", line 116, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/input_utils.py", line 380, in input_to_cuml_array
    arr = CumlArray.from_input(
          ^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/memory_utils.py", line 87, in cupy_rmm_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/nvtx/nvtx.py", line 116, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/array.py", line 1114, in from_input
    arr = cls(X, index=index, order=requested_order, validate=False)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/memory_utils.py", line 87, in cupy_rmm_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/nvtx/nvtx.py", line 116, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/array.py", line 292, in __init__
    new_data = cur_xpy.asarray(data, dtype=dtype)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cupy/_creation/from_data.py", line 88, in asarray
    return _core.array(a, dtype, False, order, blocking=blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "cupy/_core/core.pyx", line 2379, in cupy._core.core.array
  File "cupy/_core/core.pyx", line 2406, in cupy._core.core.array
  File "cupy/_core/core.pyx", line 2541, in cupy._core.core._array_default
ValueError: setting an array element with a sequence.
python-BaseException
I noticed there is a SparseCumlArray class, but surprisingly HDBSCAN does not accept it either:
    hdb.fit(D)
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
    return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
  File "hdbscan.pyx", line 762, in cuml.cluster.hdbscan.hdbscan.HDBSCAN.fit
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/nvtx/nvtx.py", line 116, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/input_utils.py", line 380, in input_to_cuml_array
    arr = CumlArray.from_input(
          ^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/memory_utils.py", line 87, in cupy_rmm_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/nvtx/nvtx.py", line 116, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/array.py", line 1114, in from_input
    arr = cls(X, index=index, order=requested_order, validate=False)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/memory_utils.py", line 87, in cupy_rmm_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/nvtx/nvtx.py", line 116, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/array.py", line 292, in __init__
    new_data = cur_xpy.asarray(data, dtype=dtype)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cupy/_creation/from_data.py", line 88, in asarray
    return _core.array(a, dtype, False, order, blocking=blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "cupy/_core/core.pyx", line 2379, in cupy._core.core.array
  File "cupy/_core/core.pyx", line 2406, in cupy._core.core.array
  File "cupy/_core/core.pyx", line 2541, in cupy._core.core._array_default
TypeError: float() argument must be a string or a real number, not 'SparseCumlArray'
python-BaseException
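For anyone reading the traceback: both failures end in an asarray call, which suggests the input is being forced through a dense-array conversion. Below is a minimal sketch of that failure mode, using NumPy as a stand-in for CuPy (an assumption; I have not checked that CuPy behaves identically in every case). SparseWrapper and try_dense_conversion are hypothetical names for illustration only, not cuML classes or functions.

```python
import numpy as np


class SparseWrapper:
    """Hypothetical stand-in for a sparse container such as SparseCumlArray."""

    def __init__(self, data, indices, indptr, shape):
        self.data = data
        self.indices = indices
        self.indptr = indptr
        self.shape = shape


def try_dense_conversion(obj):
    """Return the name of the exception raised by a dense conversion, or None."""
    try:
        # The wrapper does not expose the array interface, so asarray treats
        # it as a single scalar and tries to convert it to a float.
        np.asarray(obj, dtype=np.float32)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


wrapped = SparseWrapper(
    data=np.array([0.5, 0.5], dtype=np.float32),
    indices=np.array([1, 0]),
    indptr=np.array([0, 1, 2]),
    shape=(2, 2),
)

print(try_dense_conversion(wrapped))         # an exception name
print(try_dense_conversion(np.zeros((2, 2))))  # None: dense input converts fine
```

If that reading is right, supporting sparse input would need a separate code path rather than a different wrapper type on the caller's side.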
I would appreciate any suggestions or a timeline for supporting this, since precomputed sparse distance matrices are a common input for clustering algorithms.
@cjnolet, what would it take to get this implemented and merged? I'm happy to take a shot at it, with no promises as to how far I get. I'm working with some data right now that would benefit greatly from 'cosine', and failing that, 'precomputed' would be a good way to get many different metrics working.
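To illustrate why 'precomputed' would cover 'cosine' and many other metrics, here is a minimal NumPy sketch that builds a dense cosine distance matrix by hand. cosine_distance_matrix is a hypothetical helper name, not part of cuML or any other library, and the dense n-by-n result only makes sense for small inputs.

```python
import numpy as np


def cosine_distance_matrix(X):
    """Return the dense pairwise cosine distance matrix (1 - cosine similarity)."""
    X = np.asarray(X, dtype=np.float64)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("cosine distance is undefined for all-zero rows")
    unit = X / norms
    # Clip to guard against tiny floating-point overshoots outside [-1, 1].
    similarity = np.clip(unit @ unit.T, -1.0, 1.0)
    distances = 1.0 - similarity
    # A point is at distance zero from itself; remove rounding noise.
    np.fill_diagonal(distances, 0.0)
    return distances


X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
D = cosine_distance_matrix(X)
print(D)
```

A matrix like D could then be handed to any estimator that accepts a precomputed metric, which is the part that is missing for HDBSCAN here.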
I would just need some guidance on where to start.
Sometimes we do not have point representations in space, only the distances between those points. It would therefore be great if some algorithms (I'm especially interested in HDBSCAN and Agglomerative Clustering) could work on precomputed (sparse) distance matrices, similar to the "precomputed" metric that many sklearn algorithms accept.
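For reference, this is roughly what the "precomputed" convention looks like on the sklearn side. It is a minimal sketch using sklearn's DBSCAN with a small dense matrix; the distance values are made up for illustration, and this does not show any cuML API.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up pairwise distances for five items that have no coordinates:
# items 0-2 are mutually close, items 3-4 are mutually close,
# and the two groups are far apart.
D = np.array([
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.1, 0.9, 0.9],
    [0.2, 0.1, 0.0, 0.8, 0.9],
    [0.9, 0.9, 0.8, 0.0, 0.1],
    [0.8, 0.9, 0.9, 0.1, 0.0],
])

# metric="precomputed" tells the estimator that D holds distances, not features.
model = DBSCAN(eps=0.3, min_samples=2, metric="precomputed")
labels = model.fit_predict(D)
print(labels)
```

The request is for the same kind of input to be accepted by the GPU implementations, ideally in sparse form so that only the relevant pairs need to be stored.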
Personally, I'm working with biological structural data, so I only have pairwise differences in structure, not points in space.
Several other issues also relate to this feature request: #4475, #4460 (#1192, #4409). The equivalent support for DBSCAN, for example, was already implemented via issue #3302.