[BUG] cuML KNN Classifier gives lower scores than sklearn KNN Classifier

Hadi-94 commented 2 years ago

Describe the issue

I have been comparing KNeighborsClassifier from two libraries, scikit-learn and cuML (Python), on my project, and I have noticed that cuML's KNeighborsClassifier scores noticeably lower than scikit-learn's KNeighborsClassifier.

Steps/Code to reproduce the issue

The dataset has 17 features, 274628 entries, and 2 classes (0 and 1). It was preprocessed as follows:

1- NaN values were replaced with zeros.
2- The dtype of specific features was changed from object to float32 or int.
3- The dataset was split using train_test_split() from scikit-learn.
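After step 3, `y_train` and `y_test` keep the shuffled row labels of `df` as their pandas index. One low-cost way to rule out input-handling differences between the two libraries is to hand both estimators identical plain NumPy inputs: C-contiguous `float32` features and a flat integer label array with no index attached. This is a minimal sketch on a toy frame; `to_plain_arrays` is a hypothetical helper, and this is a precaution, not a confirmed cause of the discrepancy.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def to_plain_arrays(X: pd.DataFrame, y: pd.Series):
    """Hypothetical helper: drop pandas indexes and fix dtype and memory layout."""
    X_arr = np.ascontiguousarray(X.to_numpy(dtype=np.float32))  # row-major float32 features
    y_arr = y.to_numpy(dtype=np.int32)  # flat 0/1 labels, no index attached
    return X_arr, y_arr


# Toy frame standing in for df; the shuffled split leaves a non-default index on y_tr.
toy = pd.DataFrame({
    "a": np.arange(10, dtype=np.float64),
    "b": np.arange(10, 0, -1, dtype=np.float64),
})
labels = pd.Series([0, 1] * 5, name="binary result")
X_tr, X_te, y_tr, y_te = train_test_split(toy, labels, random_state=0, test_size=0.2)

X_tr_arr, y_tr_arr = to_plain_arrays(X_tr, y_tr)
print(y_tr.index.tolist())  # shuffled row labels carried over from the toy frame
print(X_tr_arr.dtype, X_tr_arr.flags["C_CONTIGUOUS"], y_tr_arr.dtype)
```

The resulting arrays could then be passed unchanged to both `KNNsklearn(...).fit(...)` and `KNNcuml(...).fit(...)`, so that any remaining gap is more likely to come from the estimators themselves.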

The output of df.info() for the dataset (after preprocessing) is shown below:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 274628 entries, 0 to 274627
Data columns (total 18 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   address               274628 non-null  float32
 1   function              274628 non-null  float32
 2   length                274628 non-null  float32
 3   setpoint              274628 non-null  float32
 4   gain                  274628 non-null  float32
 5   reset rate            274628 non-null  float32
 6   deadband              274628 non-null  float32
 7   cycle time            274628 non-null  float32
 8   rate                  274628 non-null  float32
 9   system mode           274628 non-null  float32
 10  control scheme        274628 non-null  float32
 11  pump                  274628 non-null  float32
 12  solenoid              274628 non-null  float32
 13  pressure measurement  274628 non-null  float32
 14  crc rate              274628 non-null  float32
 15  command response      274628 non-null  float32
 16  time                  274628 non-null  float32
 17  binary result         274628 non-null  int64   
dtypes: float32(17), int64(1)
memory usage: 24.1+ MB
time: 41.8 ms (started: 2021-12-18 13:46:35 +00:00)

In the comparison script:

1- The dataset is passed through a pipeline that uses MinMaxScaler() for normalization and SMOTE() to oversample the training part of the dataset.
2- Both algorithms are evaluated with a function that combines StratifiedKFold() and cross_validate() to get a more comprehensive result.
3- The parameters of both algorithms match each other.
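The evaluation setup described above (normalization, then the classifier, refit inside each stratified fold and scored with the same four metrics) can be sketched with scikit-learn alone. This is a minimal sketch that assumes synthetic data from `make_classification` in place of the real dataset. SMOTE is omitted because a sampler cannot be an intermediate step of scikit-learn's own `Pipeline`, which only accepts transformers there.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic, imbalanced binary data standing in for the real 17-feature dataset.
X, y = make_classification(
    n_samples=2000, n_features=17, weights=[0.8, 0.2], random_state=0
)

pipe = Pipeline([
    ("normalization", MinMaxScaler()),  # refit on each training fold only
    ("classifier", KNeighborsClassifier(n_neighbors=3, algorithm="brute", metric="euclidean")),
])

kfold = StratifiedKFold(n_splits=5)  # keeps the class ratio in every fold
scoring = ["accuracy", "precision_weighted", "recall_weighted", "f1_weighted"]
cv_results = cross_validate(pipe, X, y, cv=kfold, scoring=scoring)

for metric in scoring:
    scores = cv_results[f"test_{metric}"]
    print(f"{metric}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

With imbalanced-learn installed, SMOTE would go between the scaler and the classifier, and the pipeline would have to be `imblearn.pipeline.Pipeline`, which applies samplers only while fitting, so each validation fold is scored on original, non-synthetic samples.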

My testing function code is shown below:

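# Imports used by the script below
import pandas as pd
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # imblearn's Pipeline is needed so SMOTE can be a pipeline step
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.neighbors import KNeighborsClassifier as KNNsklearn
from sklearn.preprocessing import MinMaxScaler
from cuml.neighbors import KNeighborsClassifier as KNNcuml

# Function (script) to test KNNsklearn and KNNcuml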
def run_exps(X_train: pd.DataFrame, y_train: pd.Series, X_test: pd.DataFrame, y_test: pd.Series) -> pd.DataFrame:

  # Lightweight script to test many models and find winners
  # :param X_train: training split
  # :param y_train: training target vector
  # :param X_test: test split
  # :param y_test: test target vector
  # :return: DataFrame of cross-validation results

  dfs = []
  models = [
            # Setting up KNN - sklearn attributes to match KNN - cuml attributes
            # KNN - cuML --> there is no leaf_size setting since the only algorithm used is "brute".
            # KNN - sklearn --> metric_params is set to None by default.
            # KNN - cuML --> metric_params not available.
            ('KNN - sklearn', KNNsklearn(n_neighbors = 3, weights='uniform', algorithm='brute',  metric='euclidean')),
            ('KNN - cuML', KNNcuml(n_neighbors = 3, weights='uniform', algorithm='brute', metric='euclidean', output_type='input'))
            ]

  results = []
  names = []
  scoring = ['accuracy', 'precision_weighted', 'recall_weighted', 'f1_weighted']

  for name, model in models:

    pipe = Pipeline([
                     ('normalization', MinMaxScaler()),
                     ('oversampling', SMOTE()),
                     ('name', model)
                     ])

    kfold = StratifiedKFold(n_splits=5)
    cv_results = cross_validate(pipe, X_train, y_train, cv=kfold, scoring=scoring, verbose=4)

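    # Note: this fits the bare classifier, not `pipe`, so the hold-out report below is
    # computed on unscaled, non-oversampled features (unlike the cross-validation scores).
    clf = model.fit(X_train, y_train)
    y_pred = clf.predict(X_test)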

    print('''
    {}
    {}
    {}
    ''' .format(name, classification_report(y_test, y_pred), confusion_matrix(y_test, y_pred)))

    results.append(cv_results)
    names.append(name)
    this_df = pd.DataFrame(cv_results)
    this_df['model'] = name
    dfs.append(this_df)

  final = pd.concat(dfs, ignore_index=True)
  return final

# Loading the dataset
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/dataset/IanDataset.csv")

# Filling missing values with zeros
df = df.fillna(0)

# Convert the byte-string values (e.g. "b'0'") in "command response" and "binary result" to plain digit strings
df["command response"] = df["command response"].replace({"b'0'": "0", "b'1'": "1"})
df["binary result"] = df["binary result"].replace({"b'0'": "0", "b'1'": "1"})

# Convert dtypes: float64 columns to float32, and the cleaned string columns to numbers
cols = df.select_dtypes(include=['float64']).columns
df[cols] = df[cols].astype('float32')
df["command response"] = pd.to_numeric(df["command response"]).astype('float32')
df["binary result"] = pd.to_numeric(df["binary result"]).astype(int)

# Extract features and Targets
X = df.iloc[:, 0:17]
y = df.iloc[:, 17]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)

# Calling the Testing script
run_exps(X_train, y_train, X_test, y_test)
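In `run_exps`, the cross-validation scores come from the full pipeline, but the printed hold-out reports come from `model.fit(X_train, y_train)`, i.e. the bare classifier on unscaled, non-oversampled features. Both libraries go through that same path, so this alone does not explain the gap between them, but it does mean the printed reports do not correspond to the normalized pipeline. The sketch below contrasts the two on synthetic data, which is an assumption standing in for the real dataset, with one column deliberately stretched so that unscaled distances are dominated by it. SMOTE is left out so that only scikit-learn is needed.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=2000, n_features=17, weights=[0.8, 0.2], random_state=0)
X[:, 0] *= 1e4  # stretch one feature so it dominates Euclidean distance when unscaled
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

knn_params = dict(n_neighbors=3, algorithm="brute", metric="euclidean")

# What run_exps does for its hold-out report: the bare classifier on raw features.
bare = KNeighborsClassifier(**knn_params).fit(X_train, y_train)

# Fitting the pipeline instead: the scaler is learned on X_train only and is
# applied to X_test automatically inside predict().
pipe = Pipeline([
    ("normalization", MinMaxScaler()),
    ("classifier", KNeighborsClassifier(**knn_params)),
])
pipe.fit(X_train, y_train)

for label, model in [("bare", bare), ("pipeline", pipe)]:
    y_pred = model.predict(X_test)
    print(label)
    print(classification_report(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))
```

In the original script, the equivalent change would be to call `pipe.fit(X_train, y_train)` and `pipe.predict(X_test)` instead of fitting `model` directly.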

Expected behavior

The results obtained from this simple test are as follows:

    KNN - sklearn
                  precision    recall  f1-score   support

           0       0.92      0.95      0.94     42930
           1       0.80      0.71      0.75     11996

    accuracy                           0.90     54926
   macro avg       0.86      0.83      0.84     54926
weighted avg       0.89      0.90      0.90     54926

    [[40798  2132]
 [ 3507  8489]]

    KNN - cuML
                  precision    recall  f1-score   support

           0       0.78      0.93      0.85     42930
           1       0.21      0.07      0.10     11996

    accuracy                           0.74     54926
   macro avg       0.50      0.50      0.48     54926
weighted avg       0.66      0.74      0.69     54926

    [[39935  2995]
 [11196   800]]

KNN - sklearn scores higher on accuracy, precision, recall, and F1-score. Comparing the confusion matrices shows the same pattern:

1- True negatives are higher for KNN - sklearn (sklearn model --> 40798, cuML model --> 39935).
2- True positives are higher for KNN - sklearn (sklearn model --> 8489, cuML model --> 800).
3- False positives are lower for KNN - sklearn (sklearn model --> 2132, cuML model --> 2995).
4- False negatives are lower for KNN - sklearn (sklearn model --> 3507, cuML model --> 11196).
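Recomputing a few metrics directly from the two confusion matrices makes the size of the gap concrete. The sketch below only reuses the numbers printed in the reports above; `summarize` is a hypothetical helper.

```python
# Confusion matrices copied from the reports above, laid out as [[TN, FP], [FN, TP]].
reported = {
    "KNN - sklearn": [[40798, 2132], [3507, 8489]],
    "KNN - cuML": [[39935, 2995], [11196, 800]],
}


def summarize(cm):
    """Recompute headline metrics for a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    recall_0 = tn / (tn + fp)
    recall_1 = tp / (tp + fn)
    return {
        "accuracy": (tn + tp) / total,
        "precision_1": tp / (tp + fp),
        "recall_1": recall_1,
        # Mean per-class recall (the "macro avg" recall in the reports). A binary
        # classifier whose predictions ignore the features scores 0.5 in expectation.
        "balanced_accuracy": (recall_0 + recall_1) / 2,
    }


for name, cm in reported.items():
    stats = summarize(cm)
    print(name, {k: round(v, 2) for k, v in stats.items()})
```

cuML's balanced accuracy rounds to 0.50, i.e. chance level, while sklearn's is 0.83. That suggests the cuML predictions are nearly unrelated to the labels rather than merely a little less accurate.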

Since both models use the same parameters, the results should be very similar. That is not the case here: there is a large difference in accuracy, precision, recall, F1-score, and the confusion matrices.

Environment details:

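Environment location: Google Colab
GPU model/driver: Tesla P100
CUDA: 9.2 (as reported; the conda environment below installs cudatoolkit 11.2.72 and the CUDA 11.2 builds of the RAPIDS packages)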

Method of cuDF & cuML install: condacolab, following the RAPIDS Google Colab tutorial

# packages in environment at /usr/local:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
abseil-cpp                20210324.2           h9c3ff4c_0    conda-forge
aiohttp                   3.7.4.post0      py37h5e8e339_0    conda-forge
anyio                     3.4.0            py37h89c1867_0    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
argcomplete               1.12.3             pyhd8ed1ab_2    conda-forge
argon2-cffi               20.1.0           py37h5e8e339_2    conda-forge
arrow-cpp                 5.0.0           py37h83cef64_1_cuda    conda-forge
arrow-cpp-proc            3.0.0                      cuda    conda-forge
async-timeout             3.0.1                   py_1000    conda-forge
async_generator           1.10                       py_0    conda-forge
attrs                     21.2.0             pyhd8ed1ab_0    conda-forge
aws-c-cal                 0.5.11               h95a6274_0    conda-forge
aws-c-common              0.6.2                h7f98852_0    conda-forge
aws-c-event-stream        0.2.7               h3541f99_13    conda-forge
aws-c-io                  0.10.5               hfb6a706_0    conda-forge
aws-checksums             0.1.11               ha31a3da_7    conda-forge
aws-sdk-cpp               1.8.186              hb4091e7_3    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
blas                      2.14                   openblas    conda-forge
bleach                    4.1.0              pyhd8ed1ab_0    conda-forge
blinker                   1.4                        py_1    conda-forge
blosc                     1.21.0               h9c3ff4c_0    conda-forge
bokeh                     2.4.0            py37h89c1867_0    conda-forge
boost                     1.74.0           py37h6dcda5c_3    conda-forge
boost-cpp                 1.74.0               h312852a_4    conda-forge
brotli                    1.0.9                h9c3ff4c_4    conda-forge
brotlipy                  0.7.0           py37h5e8e339_1001    conda-forge
brunsli                   0.1                  h9c3ff4c_0    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.17.1               h7f98852_1    conda-forge
ca-certificates           2021.10.8            ha878542_0    conda-forge
cachetools                4.2.4              pyhd8ed1ab_0    conda-forge
cairo                     1.16.0            h6cf1ce9_1008    conda-forge
certifi                   2021.10.8        py37h89c1867_1    conda-forge
cffi                      1.14.5           py37hc58025e_0    conda-forge
cfitsio                   3.470                h2e3daa1_7    conda-forge
chardet                   4.0.0            py37h89c1867_1    conda-forge
charls                    2.2.0                h9c3ff4c_0    conda-forge
click                     7.1.2              pyh9f0ad1d_0    conda-forge
click-plugins             1.1.1                      py_0    conda-forge
cligj                     0.7.2              pyhd8ed1ab_1    conda-forge
cloudpickle               2.0.0              pyhd8ed1ab_0    conda-forge
colorcet                  3.0.0              pyhd8ed1ab_0    conda-forge
conda                     4.11.0           py37h89c1867_0    conda-forge
conda-package-handling    1.7.2            py37hb5d75c8_0    conda-forge
cryptography              3.4.5            py37h5d9358c_1    conda-forge
cucim                     21.10.00        cuda_11.2_py37_gd7ac21f_0    rapidsai
cudatoolkit               11.2.72              h2bc3f7f_0    nvidia
cudf                      21.10.01        cuda_11.2_py37_ga1d2d13a14_0    rapidsai
cudf_kafka                21.10.01        py37_ga1d2d13a14_0    rapidsai
cugraph                   21.10.00        cuda11.2_py37_g84617024_0    rapidsai
cuml                      21.10.02        cuda11.2_py37_gcd9251271_0    rapidsai
cupy                      9.6.0            py37h07c33ac_0    conda-forge
curl                      7.78.0               hea6ffbf_0    conda-forge
cusignal                  21.10.00        py37_gff14a10_0    rapidsai
cuspatial                 21.10.00        py37_gba20298_0    rapidsai
custreamz                 21.10.01        py37_ga1d2d13a14_0    rapidsai
cuxfilter                 21.10.00        py37_g003d3d6_0    rapidsai
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
cyrus-sasl                2.1.27               h230043b_2    conda-forge
cytoolz                   0.11.0           py37h5e8e339_3    conda-forge
dask                      2021.9.1           pyhd8ed1ab_0    conda-forge
dask-core                 2021.9.1           pyhd8ed1ab_0    conda-forge
dask-cuda                 21.10.00                 py37_0    rapidsai
dask-cudf                 21.10.01        py37_ga1d2d13a14_0    rapidsai
datashader                0.11.1             pyh9f0ad1d_0    conda-forge
datashape                 0.5.4                      py_1    conda-forge
debugpy                   1.4.1            py37hcd2ae1e_0    conda-forge
decorator                 5.1.0              pyhd8ed1ab_0    conda-forge
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
distributed               2021.9.1         py37h89c1867_0    conda-forge
dlpack                    0.5                  h9c3ff4c_0    conda-forge
entrypoints               0.3             pyhd8ed1ab_1003    conda-forge
expat                     2.4.1                h9c3ff4c_0    conda-forge
faiss-proc                1.0.0                      cuda    rapidsai
fastavro                  1.4.4            py37h5e8e339_0    conda-forge
fastrlock                 0.6              py37hcd2ae1e_1    conda-forge
fiona                     1.8.20           py37hc1d69b0_1    conda-forge
fontconfig                2.13.1            hba837de_1005    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
freexl                    1.0.6                h7f98852_0    conda-forge
fsspec                    2021.11.1          pyhd8ed1ab_0    conda-forge
gcsfs                     2021.11.1          pyhd8ed1ab_0    conda-forge
gdal                      3.3.1            py37hb0e9ad2_1    conda-forge
geopandas                 0.9.0              pyhd8ed1ab_1    conda-forge
geopandas-base            0.9.0              pyhd8ed1ab_1    conda-forge
geos                      3.9.1                h9c3ff4c_2    conda-forge
geotiff                   1.6.0                h4f31c25_6    conda-forge
gettext                   0.19.8.1          h0b5b191_1005    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
giflib                    5.2.1                h36c2ea0_2    conda-forge
glog                      0.5.0                h48cff8f_0    conda-forge
google-api-core           2.3.2              pyhd8ed1ab_0    conda-forge
google-auth               2.3.3              pyh6c4a22f_0    conda-forge
google-auth-oauthlib      0.4.6              pyhd8ed1ab_0    conda-forge
google-cloud-core         2.2.1              pyh6c4a22f_0    conda-forge
google-cloud-storage      1.43.0             pyh6c4a22f_0    conda-forge
google-crc32c             1.1.2            py37hab72019_0    conda-forge
google-resumable-media    2.1.0              pyh6c4a22f_0    conda-forge
googleapis-common-protos  1.53.0           py37h89c1867_1    conda-forge
grpc-cpp                  1.39.0               h36ce80c_1    conda-forge
grpcio                    1.38.1           py37hb27c1af_0    conda-forge
hdf4                      4.2.15               h10796ff_3    conda-forge
hdf5                      1.10.6          nompi_h7c3c948_1111    conda-forge
heapdict                  1.0.1                      py_0    conda-forge
icu                       68.1                 h58526e2_0    conda-forge
idna                      2.10               pyh9f0ad1d_0    conda-forge
imagecodecs               2021.6.8         py37hd3505d4_0    conda-forge
imageio                   2.13.1             pyhd8ed1ab_0    conda-forge
importlib-metadata        4.9.0            py37h89c1867_0    conda-forge
importlib_metadata        4.9.0                hd8ed1ab_0    conda-forge
importlib_resources       5.4.0              pyhd8ed1ab_0    conda-forge
ipykernel                 6.6.0            py37h6531663_0    conda-forge
ipython                   7.30.1           py37h89c1867_0    conda-forge
ipython-autotime          0.3.1                    pypi_0    pypi
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                7.6.5              pyhd8ed1ab_0    conda-forge
jbig                      2.1               h7f98852_2003    conda-forge
jedi                      0.18.1           py37h89c1867_0    conda-forge
jinja2                    3.0.3              pyhd8ed1ab_0    conda-forge
joblib                    1.1.0              pyhd8ed1ab_0    conda-forge
jpeg                      9d                   h36c2ea0_0    conda-forge
json-c                    0.15                 h98cffda_0    conda-forge
jsonschema                4.3.1              pyhd8ed1ab_0    conda-forge
jupyter-server-proxy      3.2.0              pyhd8ed1ab_0    conda-forge
jupyter_client            7.1.0              pyhd8ed1ab_0    conda-forge
jupyter_core              4.9.1            py37h89c1867_1    conda-forge
jupyter_server            1.13.1             pyhd8ed1ab_0    conda-forge
jupyterlab_pygments       0.1.2              pyh9f0ad1d_0    conda-forge
jupyterlab_widgets        1.0.2              pyhd8ed1ab_0    conda-forge
jxrlib                    1.1                  h7f98852_2    conda-forge
kealib                    1.4.14               hcc255d8_2    conda-forge
kiwisolver                1.3.1            py37h2527ec5_1    conda-forge
krb5                      1.19.2               hcc1bbae_0    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.35.1               hea4e1c9_2    conda-forge
lerc                      2.2.1                h9c3ff4c_0    conda-forge
libaec                    1.0.5                h9c3ff4c_0    conda-forge
libarchive                3.5.1                hccf745f_2    conda-forge
libblas                   3.8.0               14_openblas    conda-forge
libbrotlicommon           1.0.9                h7f98852_5    conda-forge
libbrotlidec              1.0.9                h7f98852_5    conda-forge
libbrotlienc              1.0.9                h7f98852_5    conda-forge
libcblas                  3.8.0               14_openblas    conda-forge
libcrc32c                 1.1.1                h9c3ff4c_2    conda-forge
libcucim                  21.10.00        cuda11.2_gd7ac21f_0    rapidsai
libcudf                   21.10.01        cuda11.2_ga1d2d13a14_0    rapidsai
libcudf_kafka             21.10.01          ga1d2d13a14_0    rapidsai
libcugraph                21.10.00        cuda11.2_g84617024_0    rapidsai
libcuml                   21.10.02        cuda11.2_gcd9251271_0    rapidsai
libcumlprims              21.10.00        cuda11.2_g167dc59_0    nvidia
libcurl                   7.78.0               h2574ce0_0    conda-forge
libcuspatial              21.10.00        cuda11.2_gba20298_0    rapidsai
libdap4                   3.20.6               hd7c4107_2    conda-forge
libdeflate                1.7                  h7f98852_5    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               hcdb4288_3    conda-forge
libfaiss                  1.7.0           cuda112h5bea7ad_8_cuda    conda-forge
libffi                    3.3                  h58526e2_2    conda-forge
libgcc-ng                 9.3.0               h2828fa1_18    conda-forge
libgdal                   3.3.1                h8f005ca_1    conda-forge
libgfortran-ng            7.5.0               h14aa051_19    conda-forge
libgfortran4              7.5.0               h14aa051_19    conda-forge
libglib                   2.68.3               h3e27bee_0    conda-forge
libgomp                   9.3.0               h2828fa1_18    conda-forge
libgsasl                  1.8.0                         0    conda-forge
libhwloc                  2.3.0                h5e5b7d1_1    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
libkml                    1.3.0             h238a007_1014    conda-forge
liblapack                 3.8.0               14_openblas    conda-forge
liblapacke                3.8.0               14_openblas    conda-forge
libllvm10                 10.0.1               he513fc3_3    conda-forge
libnetcdf                 4.8.0           nompi_hcd642e3_103    conda-forge
libnghttp2                1.43.0               h812cca2_0    conda-forge
libntlm                   1.4               h7f98852_1002    conda-forge
libopenblas               0.3.7                h5ec1e0e_6    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libpq                     13.3                 hd57d9b9_0    conda-forge
libprotobuf               3.16.0               h780b84a_0    conda-forge
librdkafka                1.6.1                hc49e61c_1    conda-forge
librmm                    21.10.01        cuda11.2_gc54767f_0    rapidsai
librttopo                 1.1.0                h1185371_6    conda-forge
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libsolv                   0.7.17               h780b84a_0    conda-forge
libspatialindex           1.9.3                h9c3ff4c_4    conda-forge
libspatialite             5.0.1                h8694cbe_5    conda-forge
libssh2                   1.9.0                ha56f1ee_6    conda-forge
libstdcxx-ng              9.3.0               h6de172a_18    conda-forge
libthrift                 0.14.2               he6d91bd_1    conda-forge
libtiff                   4.3.0                hf544144_1    conda-forge
libutf8proc               2.6.1                h7f98852_0    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libuv                     1.42.0               h7f98852_0    conda-forge
libwebp                   1.2.0                h3452ae3_0    conda-forge
libwebp-base              1.2.0                h7f98852_2    conda-forge
libxcb                    1.13              h7f98852_1003    conda-forge
libxgboost                1.4.2dev.rapidsai21.10      cuda11.2_0    rapidsai
libxml2                   2.9.12               h72842e0_0    conda-forge
libzip                    1.8.0                h4de3113_0    conda-forge
libzlib                   1.2.11            h36c2ea0_1013    conda-forge
libzopfli                 1.0.3                h9c3ff4c_0    conda-forge
llvmlite                  0.36.0           py37h9d7f4d0_0    conda-forge
locket                    0.2.0                      py_2    conda-forge
lz4-c                     1.9.3                h9c3ff4c_0    conda-forge
lzo                       2.10              h516909a_1000    conda-forge
mamba                     0.8.0            py37h7f483ca_0    conda-forge
mapclassify               2.4.3              pyhd8ed1ab_0    conda-forge
markdown                  3.3.6              pyhd8ed1ab_0    conda-forge
markupsafe                2.0.1            py37h5e8e339_0    conda-forge
matplotlib-base           3.4.2            py37hdd32ed1_0    conda-forge
matplotlib-inline         0.1.3              pyhd8ed1ab_0    conda-forge
mistune                   0.8.4           py37h5e8e339_1004    conda-forge
msgpack-python            1.0.2            py37h2527ec5_1    conda-forge
multidict                 5.1.0            py37h5e8e339_1    conda-forge
multipledispatch          0.6.0                      py_0    conda-forge
munch                     2.5.0                      py_0    conda-forge
nbclient                  0.5.9              pyhd8ed1ab_0    conda-forge
nbconvert                 6.3.0            py37h89c1867_1    conda-forge
nbformat                  5.1.3              pyhd8ed1ab_0    conda-forge
nccl                      2.11.4.1             hdc17891_0    conda-forge
ncurses                   6.2                  h58526e2_4    conda-forge
nest-asyncio              1.5.4              pyhd8ed1ab_0    conda-forge
networkx                  2.6.3              pyhd8ed1ab_1    conda-forge
nodejs                    14.17.4              h92b4a50_0    conda-forge
notebook                  6.4.6              pyha770c72_0    conda-forge
numba                     0.53.1           py37hb11d6e1_1    conda-forge
numpy                     1.21.2           py37hd8d4704_0  
numpy-base                1.21.2           py37h2b8c604_0  
nvtx                      0.2.3            py37h5e8e339_0    conda-forge
oauthlib                  3.1.1              pyhd8ed1ab_0    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openjpeg                  2.4.0                hb52868f_1    conda-forge
openssl                   1.1.1k               h7f98852_0    conda-forge
orc                       1.6.9                h58a87f1_0    conda-forge
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.3.1            py37h219a48f_0    conda-forge
pandoc                    2.16.2               h7f98852_0    conda-forge
pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
panel                     0.12.4             pyhd8ed1ab_0    conda-forge
param                     1.12.0             pyh6c4a22f_0    conda-forge
parquet-cpp               1.5.1                         2    conda-forge
parso                     0.8.3              pyhd8ed1ab_0    conda-forge
partd                     1.2.0              pyhd8ed1ab_0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
pickle5                   0.0.11           py37h5e8e339_0    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    8.3.1            py37h0f21c89_0    conda-forge
pip                       21.0.1             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
pooch                     1.5.2              pyhd8ed1ab_0    conda-forge
poppler                   21.03.0              h93df280_0    conda-forge
poppler-data              0.4.11               hd8ed1ab_0    conda-forge
postgresql                13.3                 h2510834_0    conda-forge
proj                      8.0.1                h277dcde_0    conda-forge
prometheus_client         0.12.0             pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.24             pyha770c72_0    conda-forge
protobuf                  3.16.0           py37hcd2ae1e_0    conda-forge
psutil                    5.8.0            py37h5e8e339_1    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
py-xgboost                1.4.2dev.rapidsai21.10  cuda11.2py37_0    rapidsai
pyarrow                   5.0.0           py37hf0016df_1_cuda    conda-forge
pyasn1                    0.4.8                      py_0    conda-forge
pyasn1-modules            0.2.7                      py_0    conda-forge
pycosat                   0.6.3           py37h5e8e339_1006    conda-forge
pycparser                 2.20               pyh9f0ad1d_2    conda-forge
pyct                      0.4.6                      py_0    conda-forge
pyct-core                 0.4.6                      py_0    conda-forge
pydeck                    0.5.0              pyh9f0ad1d_0    conda-forge
pyee                      8.1.0              pyh9f0ad1d_0    conda-forge
pygments                  2.10.0             pyhd8ed1ab_0    conda-forge
pyjwt                     2.3.0              pyhd8ed1ab_1    conda-forge
pynvml                    11.4.1             pyhd8ed1ab_0    conda-forge
pyopenssl                 20.0.1             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.6              pyhd8ed1ab_0    conda-forge
pyppeteer                 0.2.6              pyhd8ed1ab_0    conda-forge
pyproj                    3.1.0            py37h2f13a41_3    conda-forge
pyrsistent                0.17.3           py37h5e8e339_2    conda-forge
pysocks                   1.7.1            py37h89c1867_3    conda-forge
python                    3.7.10          hffdb5ce_100_cpython    conda-forge
python-confluent-kafka    1.6.0            py37h5e8e339_1    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.7                     2_cp37m    conda-forge
pytz                      2021.3             pyhd8ed1ab_0    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
pyviz_comms               2.1.0              pyhd8ed1ab_0    conda-forge
pywavelets                1.1.1            py37h902c9e0_3    conda-forge
pyyaml                    5.4.1            py37h5e8e339_0    conda-forge
pyzmq                     22.1.0           py37h336d617_0    conda-forge
rapids                    21.10.00        cuda11.2_py37_ge66f011_114    rapidsai
rapids-xgboost            21.10.00        cuda11.2_py37_ge66f011_114    rapidsai
re2                       2021.06.01           h9c3ff4c_0    conda-forge
readline                  8.1                  h46c0cb4_0    conda-forge
reproc                    14.2.1               h36c2ea0_0    conda-forge
reproc-cpp                14.2.1               h58526e2_0    conda-forge
requests                  2.25.1             pyhd3deb0d_0    conda-forge
requests-oauthlib         1.3.0              pyh9f0ad1d_0    conda-forge
rmm                       21.10.01        cuda_11.2_py37_gc54767f_0    rapidsai
rsa                       4.8                pyhd8ed1ab_0    conda-forge
rtree                     0.9.7            py37h0b55af0_3    conda-forge
ruamel_yaml               0.15.80         py37h5e8e339_1004    conda-forge
s2n                       1.0.10               h9b69904_0    conda-forge
scikit-image              0.18.1           py37hdc94413_0    conda-forge
scikit-learn              0.24.2           py37h18a542f_0    conda-forge
scipy                     1.7.1            py37hc65b3f8_2  
send2trash                1.8.0              pyhd8ed1ab_0    conda-forge
setuptools                49.6.0           py37h89c1867_3    conda-forge
shapely                   1.7.1            py37h2d1e849_5    conda-forge
simpervisor               0.4                pyhd8ed1ab_0    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
snappy                    1.1.8                he1b5a44_3    conda-forge
sniffio                   1.2.0            py37h89c1867_2    conda-forge
sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
spdlog                    1.8.5                h4bd325d_0    conda-forge
sqlite                    3.36.0               h9cd32fc_0    conda-forge
streamz                   0.6.3              pyh6c4a22f_0    conda-forge
tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
terminado                 0.12.1           py37h89c1867_1    conda-forge
testpath                  0.5.0              pyhd8ed1ab_0    conda-forge
threadpoolctl             3.0.0              pyh8a188c0_0    conda-forge
tifffile                  2021.7.2           pyhd8ed1ab_0    conda-forge
tiledb                    2.3.2                he87e0bf_0    conda-forge
tk                        8.6.10               h21135ba_1    conda-forge
toolz                     0.11.2             pyhd8ed1ab_0    conda-forge
tornado                   6.1              py37h5e8e339_1    conda-forge
tqdm                      4.59.0             pyhd8ed1ab_0    conda-forge
traitlets                 5.1.1              pyhd8ed1ab_0    conda-forge
treelite                  2.1.0            py37h4b3d254_0    conda-forge
treelite-runtime          2.1.0                    pypi_0    pypi
typing-extensions         4.0.1                hd8ed1ab_0    conda-forge
typing_extensions         4.0.1              pyha770c72_0    conda-forge
tzcode                    2021a                h7f98852_2    conda-forge
tzdata                    2021e                he74cb21_0    conda-forge
ucx                       1.11.2+gef2bbcf      cuda11.2_0    rapidsai
ucx-proc                  1.0.0                       gpu    rapidsai
ucx-py                    0.22.01         py37_gef2bbcf_33    rapidsai
urllib3                   1.26.3             pyhd8ed1ab_0    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
webencodings              0.5.1                      py_1    conda-forge
websocket-client          1.2.3              pyhd8ed1ab_0    conda-forge
websockets                9.1              py37h5e8e339_0    conda-forge
wheel                     0.36.2             pyhd3deb0d_0    conda-forge
widgetsnbextension        3.5.2            py37h89c1867_1    conda-forge
xarray                    0.20.2             pyhd8ed1ab_0    conda-forge
xerces-c                  3.2.3                h9d8b166_2    conda-forge
xgboost                   1.4.2dev.rapidsai21.10  cuda11.2py37_0    rapidsai
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.7.2                h7f98852_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h7f98852_1    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
yaml                      0.2.5                h516909a_0    conda-forge
yarl                      1.6.3            py37h5e8e339_2    conda-forge
zeromq                    4.3.4                h9c3ff4c_0    conda-forge
zfp                       0.5.5                h9c3ff4c_5    conda-forge
zict                      2.0.0                      py_0    conda-forge
zipp                      3.6.0              pyhd8ed1ab_0    conda-forge
zlib                      1.2.11            h36c2ea0_1013    conda-forge
zstd                      1.5.0                ha95c52a_0    conda-forge

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

jobs-git commented 1 year ago

I noticed this as well. @Hadi-94, did you find a solution?

beckernick commented 1 year ago

@jobs-git would you be able to share a minimal, reproducible example that illustrates this behavior? KNN Classifier uses exact nearest neighbors (which makes this unexpected).

I haven't been able to reproduce this behavior, as shown below (using the 23.04 nightly package).

from sklearn.neighbors import KNeighborsClassifier as sk_KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import numpy as np
import cuml

N = 10000
K = 100

X, y = make_classification(
    n_samples=N,
    n_features=K
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=12, test_size=0.2)

ALGORITHMS = [
    "brute",
]

N_NEIGHBORS = [
    1,
    2,
    5,
    10,
    50
]

METRICS = [
    "euclidean",
    "manhattan",
    "cosine",
]

for alg in ALGORITHMS:
    for n_neighbors in N_NEIGHBORS:
        for metric in METRICS:
            params = {
                "algorithm": alg,
                "n_neighbors": n_neighbors,
                "metric": metric,
            }
            # cuml
            clf = cuml.neighbors.KNeighborsClassifier(**params)   
            clf.fit(X_train, y_train)
            y_pred = clf.predict(X_test)
            conf_mat_cuml = confusion_matrix(y_test, y_pred)

            # sklearn
            clf = sk_KNeighborsClassifier(**params)   
            clf.fit(X_train, y_train)
            y_pred = clf.predict(X_test)
            conf_mat_skl = confusion_matrix(y_test, y_pred)
            np.testing.assert_array_equal(conf_mat_skl, conf_mat_cuml)

print("All confusion matrices match.")
All confusion matrices match.
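Because the brute-force classifier is an exact search, its predictions are fully determined by the data, k, and the metric. As a CPU-only illustration (not cuML code), here is a minimal sketch of exact k-NN with uniform majority voting in NumPy, checked against scikit-learn's `algorithm="brute"`. The helper name `brute_force_knn_predict` and the synthetic dataset are hypothetical choices for this sketch. An odd k with two classes is used so that votes cannot tie.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier


def brute_force_knn_predict(X_train, y_train, X_test, n_neighbors=5):
    """Hypothetical helper: exact k-NN, Euclidean distance, uniform (majority) vote."""
    # Squared Euclidean distance between every test point and every training point.
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k closest training points for each test point.
    nn_idx = np.argsort(d2, axis=1, kind="stable")[:, :n_neighbors]
    neighbor_labels = y_train[nn_idx]
    classes = np.unique(y_train)
    # Count votes per class; argmax returns the lowest class label on a tie.
    votes = (neighbor_labels[:, :, None] == classes[None, None, :]).sum(axis=1)
    return classes[votes.argmax(axis=1)]


# Synthetic binary data (not the reporter's dataset).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train = X[:800], X[800:], y[:800]

ours = brute_force_knn_predict(X_train, y_train, X_test, n_neighbors=5)
ref = (KNeighborsClassifier(n_neighbors=5, algorithm="brute")
       .fit(X_train, y_train).predict(X_test))
print("agreement:", np.mean(ours == ref))
```

Any exact implementation with the same settings should agree up to floating-point near-ties in distance, which is why a systematic accuracy gap points to a difference in settings rather than in the search.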

jobs-git commented 1 year ago

@beckernick It turns out that sklearn supports weights="distance", which I had enabled for the CPU KNN, and that is why sklearn performed better. With the same setting on both, weights="uniform", I got almost identical results. Unfortunately, I could not test weights="distance" in cuML because it is not implemented yet.
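To make the difference concrete: with weights="distance", scikit-learn weights each neighbor's vote by the inverse of its distance instead of counting every neighbor equally. Below is a minimal CPU sketch of that vote, computed from the (distances, indices) pair returned by scikit-learn's `NearestNeighbors.kneighbors` and compared with the built-in option. The helper `distance_weighted_vote` is hypothetical. Feeding it neighbors produced by cuML instead is an untested assumption (for example, the distance convention would need checking), not a documented cuML feature.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors


def distance_weighted_vote(distances, indices, y_train):
    """Hypothetical helper: inverse-distance weighted k-NN vote."""
    classes = np.unique(y_train)
    # Weight each neighbor by 1 / distance; a zero distance yields inf here.
    with np.errstate(divide="ignore"):
        w = 1.0 / distances
    # Rows containing an exact match: only zero-distance neighbors vote (weight 1).
    inf_mask = np.isinf(w)
    inf_rows = inf_mask.any(axis=1)
    w[inf_rows] = inf_mask[inf_rows]
    # One-hot encode the neighbor labels and sum the weights per class.
    onehot = y_train[indices][:, :, None] == classes[None, None, :]
    scores = (w[:, :, None] * onehot).sum(axis=1)
    return classes[scores.argmax(axis=1)]


# Synthetic data with some label noise (not the reporter's dataset).
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train = X[:800], X[800:], y[:800]

# Exact neighbor search, then the hand-written weighted vote.
nn = NearestNeighbors(n_neighbors=15, algorithm="brute").fit(X_train)
dist, idx = nn.kneighbors(X_test)
ours = distance_weighted_vote(dist, idx, y_train)

# Reference: scikit-learn's built-in distance weighting.
ref = (KNeighborsClassifier(n_neighbors=15, weights="distance", algorithm="brute")
       .fit(X_train, y_train).predict(X_test))
print("agreement:", np.mean(ours == ref))
```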

A feature request has already been submitted, so I am not opening a new issue for it; see: https://github.com/rapidsai/cuml/issues/4611

TL;DR: The gap was caused by the different weights setting.
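A quick way to see how much the weighting alone can change results is to fit two scikit-learn classifiers that differ only in `weights` on the same split. This is a sketch on synthetic data with 17 features, chosen only to echo the original report; the numbers it prints say nothing about the reporter's dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data; label noise makes the weighting scheme matter more.
X, y = make_classification(n_samples=2000, n_features=17, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=12)

results = {}
for weights in ("uniform", "distance"):
    # Identical settings except for the weighting scheme.
    clf = KNeighborsClassifier(n_neighbors=15, weights=weights, algorithm="brute")
    clf.fit(X_train, y_train)
    results[weights] = clf.predict(X_test)

# Fraction of test points whose predicted label depends on the weighting scheme.
changed = np.mean(results["uniform"] != results["distance"])
acc = {w: np.mean(p == y_test) for w, p in results.items()}
print(f"predictions changed: {changed:.3f}, accuracy: {acc}")
```

When comparing cuML against scikit-learn, keeping `weights="uniform"` on the scikit-learn side gives the like-for-like comparison described in this thread.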

beckernick commented 1 year ago

Thanks for confirming. I'm going to close this issue as resolved.

rapidsai / cuml

[BUG] cuML KNN Classifier gives lower results when comapred to sklearn KNN Classifier #4459