rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[BUG] KNeighborsClassifier predict crashes for medium/large datasets #2491

Closed. Banus closed this issue 4 years ago.

Banus commented 4 years ago

Describe the bug

When 20k or more samples are used to fit a KNeighborsClassifier, fit completes without error, but the predict method crashes with the following message:

  File "cuml/neighbors/kneighbors_classifier.pyx", line 228, in cuml.neighbors.kneighbors_classifier.KNeighborsClassifier.predict
RuntimeError: Exception occured! file=/conda/conda-bld/libcuml_1591208841859/work/cpp/src_prims/selection/knn.cuh line=519: FAIL: call='cudaPeekAtLastError()'. Reason:invalid argument

Steps/Code to reproduce bug

from cuml.neighbors import KNeighborsClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=int(2e4), centers=5, n_features=248)
knn = KNeighborsClassifier(n_neighbors=3)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.80
)
knn.fit(X_train, y_train)
knn.predict(X_test[:1, :])

Expected behavior

predict should return the predictions for the given samples.
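For comparison, here is a minimal sketch of the expected behavior using scikit-learn's CPU KNeighborsClassifier on data of the same shape as the reproducer above. It assumes the two estimators expose the same fit/predict interface. It does not exercise the cuML code path and says nothing about the cause of the crash. The random_state values are an addition for repeatability only.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier  # CPU reference, not cuML

# Same shapes as the report: 20,000 samples, 248 features, 5 centers.
X, y = make_blobs(n_samples=20_000, centers=5, n_features=248, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.80, random_state=0
)

cpu_knn = KNeighborsClassifier(n_neighbors=3)
cpu_knn.fit(X_train, y_train)

# Predicting a single row should return an array holding one class label.
pred = cpu_knn.predict(X_test[:1, :])
print(pred.shape)
```

A successful cuML run would be expected to return a result of the same length, one label per input row.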

Environment details (please complete the following information):

Additional context

Reproduced in a clean conda environment and also in the latest RAPIDS 0.15 nightly build.

cjnolet commented 4 years ago

@Banus

Thank you for opening this issue. I have verified that this crash also occurs in our pytests when the data size is set to 20k rows and 248 columns.
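Since the failure appears to depend on data size, one way to narrow it down is to sweep the number of rows and record the first size at which predict raises. The sketch below is only an illustration: first_failing_size and cuml_fit_predict are hypothetical helper names, not part of cuML or its test suite, and unlike the original reproducer the sweep fits on all generated rows rather than on an 80% split.

```python
from sklearn.datasets import make_blobs


def first_failing_size(fit_predict, sizes, n_features=248):
    """Return the smallest n_samples for which fit_predict raises
    RuntimeError, or None if every size succeeds. Hypothetical helper."""
    for n in sorted(sizes):
        X, y = make_blobs(
            n_samples=n, centers=5, n_features=n_features, random_state=0
        )
        try:
            fit_predict(X, y)
        except RuntimeError:
            return n
    return None


def cuml_fit_predict(X, y):
    # Imported lazily so this file can be loaded on a machine without cuML.
    from cuml.neighbors import KNeighborsClassifier

    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X, y)
    return knn.predict(X[:1, :])


# On a machine with cuML and a GPU (sizes chosen arbitrarily):
# print(first_failing_size(cuml_fit_predict, [5_000, 10_000, 15_000, 20_000]))
```

The same helper can be pointed at a different n_features value to check whether the column count matters as well.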