rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[BUG] Dask KNeighborsClassifier fit fails on cupy backed dask array #3663

Open beckernick opened 3 years ago

beckernick commented 3 years ago

In the 2021-03-26 nightly (and possibly earlier), cuml.dask.neighbors.KNeighborsClassifier fails during fit on Dask array input when the arrays are backed by CuPy.

from distributed import Client, wait
from dask_cuda import LocalCUDACluster
import dask.array as da

from sklearn.datasets import make_classification
import cupy as cp
import numpy as np

from cuml.dask.neighbors import KNeighborsClassifier

cluster = LocalCUDACluster(
    CUDA_VISIBLE_DEVICES="0"
)
client = Client(cluster)

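X, y = make_classification(n_samples=10000)
# Wrap the NumPy data in Dask arrays, then move each chunk to the GPU with CuPy
dX = da.from_array(X).map_blocks(cp.asarray)
dy = da.from_array(y).map_blocks(cp.asarray)

clf = KNeighborsClassifier()
clf.fit(dX, dy)  # raises TypeError (traceback below)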
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-b844b5a8db48> in <module>
      4 
      5 clf = KNeighborsClassifier()
----> 6 clf.fit(dX, dy)
      7 clf.predict(dX)

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/cuml/dask/neighbors/kneighbors_classifier.py in fit(self, X, y)
    106                     uniq_labels.append(y.iloc[:, i].unique())
    107 
--> 108         uniq_labels = da.compute(uniq_labels)[0]
    109         if not isinstance(uniq_labels[0], np.ndarray):  # for cuDF Series
    110             uniq_labels = list(map(lambda x: x.values_host, uniq_labels))

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/dask/base.py in compute(*args, **kwargs)
    563         postcomputes.append(x.__dask_postcompute__())
    564 
--> 565     results = schedule(dsk, keys, **kwargs)
    566     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    567 

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/distributed/client.py in get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   2652                     should_rejoin = False
   2653             try:
-> 2654                 results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   2655             finally:
   2656                 for f in futures.values():

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/distributed/client.py in gather(self, futures, errors, direct, asynchronous)
   1967                 direct=direct,
   1968                 local_worker=local_worker,
-> 1969                 asynchronous=asynchronous,
   1970             )
   1971 

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    836         else:
    837             return sync(
--> 838                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    839             )
    840 

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    349     if error[0]:
    350         typ, exc, tb = error[0]
--> 351         raise exc.with_traceback(tb)
    352     else:
    353         return result[0]

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/distributed/utils.py in f()
    332             if callback_timeout is not None:
    333                 future = asyncio.wait_for(future, callback_timeout)
--> 334             result[0] = yield future
    335         except Exception as exc:
    336             error[0] = sys.exc_info()

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/tornado/gen.py in run(self)
    760 
    761                     try:
--> 762                         value = future.result()
    763                     except Exception:
    764                         exc_info = sys.exc_info()

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1826                             exc = CancelledError(key)
   1827                         else:
-> 1828                             raise exception.with_traceback(traceback)
   1829                         raise exc
   1830                     if errors == "skip":

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/dask/optimization.py in __call__()
    961         if not len(args) == len(self.inkeys):
    962             raise ValueError("Expected %d args, got %d" % (len(self.inkeys), len(args)))
--> 963         return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
    964 
    965     def __reduce__(self):

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/dask/core.py in get()
    149     for key in toposort(dsk):
    150         task = dsk[key]
--> 151         result = _execute_task(task, cache)
    152         cache[key] = result
    153     result = _execute_task(out, cache)

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/dask/core.py in _execute_task()
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/dask/core.py in <genexpr>()
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/dask/core.py in _execute_task()
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/dask/utils.py in apply()
     33 def apply(func, args, kwargs=None):
     34     if kwargs:
---> 35         return func(*args, **kwargs)
     36     else:
     37         return func(*args)

/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326/lib/python3.7/site-packages/dask/array/routines.py in _unique_internal()
   1001 
   1002     r = np.empty(u.shape, dtype=dt)
-> 1003     r["values"] = u
   1004     if return_inverse:
   1005         r["inverse"] = np.arange(len(r), dtype=np.intp)

cupy/core/core.pyx in cupy.core.core.ndarray.__array__()

TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.
conda list | grep "rapids\|dask\|numpy\|cupy"
# packages in environment at /raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210326:
cudf                      0.19.0a210326   cuda_10.2_py37_gb0e350b205_287    rapidsai-nightly
cuml                      0.19.0a210326   cuda10.2_py37_g0883026bf_128    rapidsai-nightly
cupy                      8.5.0            py37h97f80e5_1    conda-forge
dask                      2021.3.0           pyhd8ed1ab_0    conda-forge
dask-core                 2021.3.0           pyhd8ed1ab_0    conda-forge
dask-cuda                 0.19.0a210326           py37_44    rapidsai-nightly
dask-cudf                 0.19.0a210326   py37_gb0e350b205_287    rapidsai-nightly
dask-glm                  0.2.1.dev52+g1daf4c5           dev_0    <develop>
libcudf                   0.19.0a210326   cuda10.2_gb0e350b205_287    rapidsai-nightly
libcuml                   0.19.0a210326   cuda10.2_g0883026bf_128    rapidsai-nightly
libcumlprims              0.19.0a210316   cuda10.2_ge7e82a0_12    rapidsai-nightly
librmm                    0.19.0a210326   cuda10.2_g1de6b83_49    rapidsai-nightly
numpy                     1.19.5           py37haa41c4c_1    conda-forge
rmm                       0.19.0a210326   cuda_10.2_py37_g1de6b83_49    rapidsai-nightly
ucx                       1.9.0+gcd9efd3       cuda10.2_0    rapidsai-nightly
ucx-proc                  1.0.0                       gpu    rapidsai-nightly
ucx-py                    0.19.0a210326   py37_gcd9efd3_46    rapidsai-nightly
divyegala commented 3 years ago

@viclafargue can you take a look at this one too?

viclafargue commented 3 years ago

Yes, some operations on Dask arrays do not work when the array is backed by CuPy. This appears to be the case for the unique operation (the traceback fails inside dask.array's _unique_internal). I think the best option would be to make these operations work in Dask.
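The failure can be reproduced without a GPU. The sketch below is a simplified stand-in for the lines of dask's _unique_internal shown in the traceback (allocate a NumPy structured array, then assign the unique values into it). DeviceOnlyArray and unique_internal_like are hypothetical names; the class only imitates CuPy's refusal of implicit conversion through __array__ and is not CuPy itself.

```python
import numpy as np


class DeviceOnlyArray:
    """Hypothetical stand-in for cupy.ndarray: it refuses implicit
    conversion to NumPy and requires an explicit .get()."""

    def __init__(self, values):
        self._host = np.asarray(values)  # simulated device buffer
        self.shape = self._host.shape
        self.dtype = self._host.dtype

    def __array__(self, dtype=None, copy=None):
        # Same message as the TypeError at the end of the traceback.
        raise TypeError(
            "Implicit conversion to a NumPy array is not allowed. "
            "Please use `.get()` to construct a NumPy array explicitly."
        )

    def get(self):
        # Explicit device-to-host copy (simulated).
        return self._host.copy()


def unique_internal_like(u):
    """Simplified version of the failing lines in dask's _unique_internal."""
    dt = [("values", u.dtype)]
    r = np.empty(u.shape, dtype=dt)  # always a host (NumPy) array
    r["values"] = u  # NumPy calls u.__array__() for non-NumPy inputs
    return r


u = DeviceOnlyArray([0, 1])

try:
    unique_internal_like(u)
except TypeError as exc:
    print("TypeError:", exc)

# Copying to the host explicitly first avoids the implicit conversion.
print(unique_internal_like(u.get())["values"])  # [0 1]
```

The point of the sketch is that np.empty always allocates on the host, so any non-NumPy chunk type that blocks __array__ will fail at the assignment, regardless of what np.unique returned.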

beckernick commented 3 years ago

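Depending on the compatibility requirements, we could potentially update the internal unique function to allocate its output with np.empty_like or with the like= argument from NEP-35, so that the allocation follows the input's array type.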
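A minimal sketch of the NEP-35 idea, assuming NumPy 1.20 or newer (the environment above has NumPy 1.19.5, which predates the like= keyword). LikeArray and allocate_like are hypothetical names; LikeArray stands in for any library that implements the NEP-18 __array_function__ protocol, as CuPy does. The sketch only shows how the allocation is routed to the reference array's type; it does not show whether a given library supports every dtype dask allocates in that function.

```python
import numpy as np


class LikeArray:
    """Hypothetical array type implementing NEP-18 __array_function__,
    standing in for a GPU library such as CuPy."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # With NEP-35, np.empty(..., like=self) is dispatched here;
        # NumPy removes the `like` keyword from kwargs before the call.
        if func is np.empty:
            return LikeArray(np.empty(*args, **kwargs))
        return NotImplemented


def allocate_like(shape, dtype, reference):
    # The output is created by the same array library as `reference`.
    return np.empty(shape, dtype=dtype, like=reference)


print(type(allocate_like((3,), np.int64, LikeArray([1, 2, 3]))).__name__)  # LikeArray
print(type(allocate_like((3,), np.int64, np.arange(3))).__name__)  # ndarray
```

With a plain NumPy reference, like= is a no-op, so the same code path keeps working for NumPy-backed Dask arrays.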

beckernick commented 3 years ago

https://github.com/dask/dask/issues/7482

viclafargue commented 3 years ago

For the 0.19 release, we will probably just raise an exception with an informative message when the multi-node multi-GPU (MNMG) KNN Classifier is given a Dask array backed by CuPy.
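A minimal sketch of such a guard, assuming the check inspects the chunk type recorded in a Dask array's _meta attribute (a private Dask attribute holding a zero-size array of the chunk type). The helper names are hypothetical and this is not necessarily the check cuML shipped; the Fake* classes only exist so the sketch runs without Dask or a GPU.

```python
import numpy as np


def is_cupy_backed(arr):
    """Return True if a Dask array's chunks are CuPy arrays.

    Relies on Dask's private `_meta` attribute. Only the top-level module
    name is compared, so `cupy.core.core` (CuPy 8, as in the traceback)
    matches as well as any other cupy submodule layout.
    """
    meta = getattr(arr, "_meta", None)
    return type(meta).__module__.split(".")[0] == "cupy"


def check_knn_classifier_input(X, y):  # hypothetical helper name
    for name, arr in (("X", X), ("y", y)):
        if is_cupy_backed(arr):
            raise TypeError(
                "cuml.dask.neighbors.KNeighborsClassifier does not support "
                f"Dask arrays backed by CuPy yet (got one for {name}); "
                "see rapidsai/cuml#3663."
            )


# Hypothetical stand-ins so the sketch runs without Dask or CuPy installed.
class FakeCupyArray:
    __module__ = "cupy"  # pretend the class is defined in a cupy module


class FakeDaskArray:
    def __init__(self, meta):
        self._meta = meta


# NumPy-backed input passes; CuPy-backed input gets the informative error.
check_knn_classifier_input(FakeDaskArray(np.empty(0)), FakeDaskArray(np.empty(0)))
try:
    check_knn_classifier_input(FakeDaskArray(FakeCupyArray()), FakeDaskArray(np.empty(0)))
except TypeError as exc:
    print(exc)
```

Checking _meta avoids computing any chunks, so the guard can run at the start of fit before any work is scheduled.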

beckernick commented 3 years ago

XGBoost is failing similarly (cross-referencing for additional tracking): https://github.com/dmlc/xgboost/issues/6820

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

trivialfis commented 1 month ago

The issue seems to be resolved with:

>>> dask.__version__
'2024.5.1'
>>> cupy.__version__
'13.2.0'