rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

kNN Classifier Accuracy deviating from scikit-learn [BUG] #5773

Open evanhowington opened 7 months ago

evanhowington commented 7 months ago

Describe the bug I was comparing the results of my work after converting it from scikit-learn to cuML, with respect to kNN classification. With cuML and a test size of 10%, my test accuracy crosses above my training accuracy around k=100, but with the same code run on plain scikit-learn the accuracy curves stay strictly separated with no crossover. When I increase the test size to 20% I get the opposite result: my cuML accuracy curves stay strictly separated, while my scikit-learn curves begin their crossover around k=60. A screenshot is included in the attachments.

Steps/Code to reproduce bug I have provided both sets of code, using cuML and scikit-learn, in the attached zip; a rough sketch of the comparison is shown below.
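For reference, here is a minimal sketch of the kind of comparison described above. It is a hypothetical reconstruction, not the actual scripts in the zip: the dataset loading (X.npy / y.npy), the k range, and the plotting are placeholders, and each library's own train_test_split and KNeighborsClassifier are assumed.

```python
# Hypothetical reconstruction of the two scripts (the real code is in the attached
# zip); X, y and the k range are placeholders. The cuML path moves the data to the
# GPU with CuPy since cuML's own train_test_split is used there.
import numpy as np
import cupy as cp

from sklearn.model_selection import train_test_split as sk_split
from sklearn.neighbors import KNeighborsClassifier as skKNN
from cuml.model_selection import train_test_split as cu_split
from cuml.neighbors import KNeighborsClassifier as cuKNN


def accuracy_curves(split_fn, knn_cls, X, y, test_size, k_values):
    """Sweep n_neighbors and record train/test accuracy for one implementation."""
    X_train, X_test, y_train, y_test = split_fn(
        X, y, test_size=test_size, random_state=42
    )
    train_acc, test_acc = [], []
    for k in k_values:
        model = knn_cls(n_neighbors=k).fit(X_train, y_train)
        # Compute accuracy by hand so the same code works for NumPy and CuPy outputs.
        train_acc.append(float((model.predict(X_train) == y_train).mean()))
        test_acc.append(float((model.predict(X_test) == y_test).mean()))
    return train_acc, test_acc


X, y = np.load("X.npy"), np.load("y.npy")   # placeholder for the real dataset
ks = list(range(1, 151))

sk_curves = accuracy_curves(sk_split, skKNN, X, y, 0.1, ks)
cu_curves = accuracy_curves(cu_split, cuKNN, cp.asarray(X), cp.asarray(y), 0.1, ks)
# The four curves (train/test for each library) are then plotted against k;
# the reported crossover appears in only one of the two libraries.
```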

Expected behavior I would expect the accuracy to be roughly the same with cuML and scikit-learn; however, I am seeing deviations.

Environment details (please complete the following information):

aethyn@pop-os:~$ neofetch
OS: Pop!_OS 22.04 LTS x86_64
Kernel: 6.6.10-76060610-generic
Uptime: 5 hours, 9 mins
Packages: 1982 (dpkg), 25 (flatpak)
Shell: bash 5.1.16
Resolution: 3840x2160, 3840x2160, 3840x2160
DE: GNOME 42.5
WM: Mutter
WM Theme: Pop
Theme: Pop-dark [GTK2/3]
Icons: Pop [GTK2/3]
Terminal: gnome-terminal
CPU: AMD Ryzen 9 7950X (32) @ 5.881GHz
GPU: AMD ATI 6c:00.0 Device 164e
GPU: NVIDIA 01:00.0 NVIDIA Corporation Device 2684
Memory: 18801MiB / 63423MiB

Additional context duplicate_this.zip

bdice commented 7 months ago

@evanhowington Thanks for the issue! You mentioned on Slack that the zip file with your data wasn't uploaded. Can you try that again? There is a 25 MB file size limit for zip files, so you may need to split up the data (you mentioned the size was a few megabytes).

evanhowington commented 7 months ago

@bdice I updated the original post to include the zip file at the bottom of it under "Additional Context".

evanhowington commented 7 months ago

I did some digging and it appears scikit-learn uses a NumPy random state instance, while cuML uses a CuPy random state instance by default, with the option of using a NumPy random state instance instead. https://scikit-learn.org/stable/glossary.html#term-random_state https://docs.rapids.ai/api/cuml/stable/api/#preprocessing-metrics-and-utilities

I have not had a chance to test the NumPy random state instance with cuML yet. I'm still trying to figure out how to invoke the optional NumPy random state instance in cuML. Is it just a matter of passing numpy.random.RandomState in the cuML call, as in random_state = numpy.random.RandomState?
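For concreteness, something like the sketch below is what I had in mind. I have not run it yet; I'm going off the docs linked above, which suggest random_state accepts an int, a CuPy RandomState, or a NumPy RandomState, and that either way you pass an instance rather than the RandomState class itself.

```python
# Sketch only (not yet tested): assuming cuml.model_selection.train_test_split
# accepts a RandomState instance from either library, per the docs linked above.
import numpy as np
import cupy as cp
from cuml.model_selection import train_test_split

X, y = np.load("X.npy"), np.load("y.npy")   # placeholder for the real dataset

# NumPy-seeded split (what I was asking about above): pass an instance, not the class.
rs_np = np.random.RandomState(42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=rs_np
)

# CuPy-seeded split for comparison.
rs_cp = cp.random.RandomState(42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=rs_cp
)
```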

If it is the random_state causing the discrepancy, perhaps something like train_test_split(X, y, test_size=0.1, random_state=42, random_state_environment={"cupy", "numpy"}) where one specifies where to pull the random state from. Also, maybe the default could be NumPy, so that the results would match up for someone running the same code on scikit-learn, with the option to choose CuPy instead. I only suggest that because, if the goal is for the two libraries to produce equivalent results out of the box with cuML offering a speedup, scikit-learn can't always call a CuPy random state on all devices, so the cuML default could be a NumPy random state for the sake of reproducible results.

dantegd commented 6 months ago

Thanks for the issue @evanhowington, I had written a response but closed my tab before submitting :(.

The issue very likely is not coming from using the random state from either numpy or cupy. I haven't tested it myself yet, but given the differences in the parallel/CUDA code, it might just be an inherent difference between the implementations.
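One way to take the random state out of the picture entirely (a sketch, not something that has been run against the attached data) would be to split once with scikit-learn and feed the identical NumPy arrays to both estimators; if the accuracy still differs, the gap comes from the neighbor search itself rather than from the split.

```python
# Sketch: remove train_test_split from the comparison by splitting once with
# scikit-learn and fitting both estimators on the same NumPy arrays. Any remaining
# accuracy gap would come from the kNN implementations themselves.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier as skKNN
from cuml.neighbors import KNeighborsClassifier as cuKNN

X, y = np.load("X.npy"), np.load("y.npy")   # placeholder for the real dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42
)

for k in (10, 50, 100):
    sk_acc = skKNN(n_neighbors=k).fit(X_train, y_train).score(X_test, y_test)
    cu_model = cuKNN(n_neighbors=k).fit(X_train, y_train)
    cu_acc = float((cu_model.predict(X_test) == y_test).mean())
    print(f"k={k:3d}  sklearn={sk_acc:.4f}  cuml={cu_acc:.4f}")
```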