intel / scikit-learn-intelex

Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
https://intel.github.io/scikit-learn-intelex/
Apache License 2.0

Offloading to GPU is not working for DBSCAN #1368

Closed psmgeelen closed 1 year ago

psmgeelen commented 1 year ago

Describe the bug: Following the example in the documentation about GPU offloading, I noticed that it ran, but there was CPU load and it didn't seem to be using the GPU (I didn't hear any fans ramp up or anything). The example is:

import numpy as np
from sklearnex import patch_sklearn, config_context
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
            [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
with config_context(target_offload="gpu:0"):
   clustering = DBSCAN(eps=3, min_samples=2).fit(X)

I have also tried to offload more explicitly by setting the global configuration:

import numpy as np
from sklearnex import patch_sklearn, set_config
patch_sklearn()

set_config(target_offload="gpu:0")

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
            [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
clustering = DBSCAN(eps=3, min_samples=2).fit(X)

But that didn't change much. I also tried checking the device through dpctl before offloading:

import dpctl
import numpy as np
from sklearnex import patch_sklearn, set_config
patch_sklearn()

# Offloading
q = dpctl.SyclQueue("gpu")
print(q.sycl_device.is_gpu)
print(q.sycl_device.is_cpu)
q.print_device_info()  # prints the device details itself and returns None
set_config(target_offload="gpu:0")

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
            [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
clustering = DBSCAN(eps=3, min_samples=2).fit(X)

Which prints out nicely:

True
False
    Name            Intel(R) Arc(TM) A770 Graphics
    Driver version  1.3.26241
    Vendor          Intel(R) Corporation
    Filter string   level_zero:gpu:0

The best evidence I could find that GPU offloading is not working as it should:

  1. There is clear CPU load and increased thermals (so there is activity on the CPU).
  2. Changing the offload target from GPU to CPU doesn't change the execution time. I used timeit to compare 10 runs each, and the timings are the same regardless of whether I offload to CPU or GPU.
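The timing comparison described in point 2 could be sketched roughly as follows. This is a minimal sketch, not the script used in the report: `median_runtime`, `compare_targets`, and `make_dbscan_fit` are hypothetical helper names, and only `patch_sklearn`, `config_context(target_offload=...)`, and `DBSCAN` come from the thread. Note that with a six-sample array the runtime is likely dominated by fixed overhead rather than compute, so identical timings on such a small input may not say much about which device ran.

```python
import statistics
import timeit


def median_runtime(fn, repeats=10):
    """Return the median wall-clock time of `fn` over `repeats` single calls."""
    times = timeit.repeat(fn, number=1, repeat=repeats)
    return statistics.median(times)


def compare_targets(factory, targets=("cpu", "gpu:0"), repeats=10):
    """Time the zero-argument callable returned by factory(target) for each target."""
    return {target: median_runtime(factory(target), repeats) for target in targets}


def make_dbscan_fit(X, eps=3, min_samples=2):
    """Hypothetical factory for compare_targets; assumes sklearnex is installed."""
    from sklearnex import patch_sklearn, config_context
    patch_sklearn()
    from sklearn.cluster import DBSCAN

    def factory(target):
        def run():
            # Offload one fit call to the requested device
            with config_context(target_offload=target):
                DBSCAN(eps=eps, min_samples=min_samples).fit(X)
        return run

    return factory
```

Calling `compare_targets(make_dbscan_fit(X))` with a reasonably large `X` would return one median time per target; the median is used here so that a slow first run (for example, one-time device setup) does not skew the comparison.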

To Reproduce: Already provided above.

Expected behavior: Execution time should change when you run it on different hardware.

Environment: Ubuntu 23.04
CPU: 16-core AMD Ryzen 9 5950X (-MT MCP-) speed/min/max: 2258/2200/5083 MHz
Kernel: 6.2.0-24-generic x86_64 Up: 1h 27m
Mem: 7957.1/128724.3 MiB (6.2%) Storage: 931.51 GiB (37.3% used)
Procs: 576 Shell: Bash inxi: 3.3.25

psmgeelen commented 1 year ago

Maybe a relevant side-question could be: how can I validate that I am actually offloading to the GPU?

Alexsandruss commented 1 year ago

You can use verbose mode to see which device was used: https://intel.github.io/scikit-learn-intelex/verbose.html
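One way to inspect those verbose messages from inside a script is to attach a handler to the `sklearnex` logger (the logger name used later in this thread) and collect what it emits. This is a minimal sketch using only the standard `logging` module; `ListHandler` and `capture_sklearnex_logs` are hypothetical names, not part of the extension.

```python
import logging


class ListHandler(logging.Handler):
    """Collect log messages in a list for later inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def capture_sklearnex_logs(level=logging.INFO):
    """Attach a collecting handler to the 'sklearnex' logger and return it."""
    logger = logging.getLogger("sklearnex")
    logger.setLevel(level)
    handler = ListHandler()
    logger.addHandler(handler)
    return handler
```

Calling `handler = capture_sklearnex_logs()` before `fit` and reading `handler.messages` afterwards should show one entry per dispatched call, assuming the extension logs through that logger as the output in this thread suggests.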

psmgeelen commented 1 year ago

@Alexsandruss, awesome! So I can confirm that it is running on CPU, regardless of whether I call set_config(target_offload="gpu:0") or not. The logging returns:

SKLEARNEX INFO: sklearn.utils.validation._assert_all_finite: running accelerated version on CPU

I am thinking aloud here: could this be a precision issue with the data itself? Am I using a precision that is not compatible with the GPU, so that it falls back to the CPU?

napetrov commented 1 year ago

@psmgeelen you can try other algorithms in the meantime. DBSCAN has some specifics that set it apart.

samir-nasibli commented 1 year ago

@psmgeelen could you please list your conda env as well? It would be very useful for reproducing the issue.

psmgeelen commented 1 year ago

@napetrov, thanks for responding. I am trying to do some benchmarking with an Intel GPU and followed the compatibility list in the documentation here: https://intel.github.io/scikit-learn-intelex/algorithms.html. The algorithms that I have tried running are:

@samir-nasibli , I listed my environment using conda list and it returned:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
bzip2                     1.0.8                hb9a14ef_9    intel
ca-certificates           2023.01.10           h06a4308_0    intel
certifi                   2022.12.7        py39h06a4308_0    intel
daal4py                   2023.1.1        py39_intel_48679    intel
dal                       2023.1.1            intel_48679    intel
dpcpp-cpp-rt              2023.1.0            intel_46305    intel
dpcpp_cpp_rt              2023.1.0            intel_46305    intel
dpctl                     0.14.2           py39ha23a21d_9    intel
fortran_rt                2023.1.0            intel_46305    intel
glob2                     0.7                        py_0    conda-forge
icc_rt                    2023.1.0            intel_46305    intel
impi_rt                   2021.9.0            intel_43482    intel
intel-cmplr-lib-rt        2023.1.0            intel_46305    intel
intel-cmplr-lic-rt        2023.1.0            intel_46305    intel
intel-fortran-rt          2023.1.0            intel_46305    intel
intel-opencl-rt           2023.1.0            intel_46305    intel
intel-openmp              2023.1.0            intel_46305    intel
intelpython               2023.1.0                      1    intel
joblib                    1.2.0              pyh3f38642_0    intel
libffi                    3.3                          14    intel
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgomp                   12.2.0              h65d4601_19    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
mkl                       2023.1.0            intel_46342    intel
mkl-service               2.4.0           py39h75d02e3_15    intel
mkl_fft                   1.3.5           py39h28f0b46_11    intel
mkl_random                1.2.2           py39h0b06908_51    intel
mkl_umath                 0.1.1           py39h450dca2_61    intel
ncurses                   6.4                  h6a678d5_0    intel
numpy                     1.23.5           py39h52df89b_7    intel
numpy-base                1.23.5           py39ha03f565_7    intel
openssl                   1.1.1t               h7f8727e_0    intel
pandas                    1.5.3            py39h6cd0baa_0    intel
pip                       23.0.1           py39h06a4308_0    intel
python                    3.9.16               h2722d68_1    intel
python-dateutil           2.8.2                    py39_2    intel
pytz                      2022.7           py39h06a4308_0    intel
readline                  8.2                  h5eee18b_0    intel
scikit-learn              1.2.1            py39h6a678d5_0    intel
scikit-learn-intelex      2023.1.1        py39_intel_48679    intel
scipy                     1.7.3            py39h4ca98da_8    intel
setuptools                65.6.3           py39h06a4308_0    intel
six                       1.16.0             pyhd3eb1b0_1    intel
sqlite                    3.41.1               h5eee18b_0    intel
tbb                       2021.9.0            intel_43484    intel
tbb4py                    2021.9.0        py39_intel_43484    intel
threadpoolctl             2.2.0              pyh0d69192_0    intel
tk                        8.6.12               h1ccaba5_0    intel
tqdm                      4.64.0           py39h06a4308_0    intel
wheel                     0.38.4           py39h06a4308_0    intel
xz                        5.2.8                h5eee18b_0    intel
zlib                      1.2.13               h5eee18b_0    intel

samir-nasibli commented 1 year ago

Thank you @psmgeelen! Could you please also share which platforms dpctl reports (python -m dpctl -f) and the output of ls -al $OCL_ICD_VENDORS? DBSCAN has some implementation specifics: it requires an OpenCL loader to be installed in the environment. I see that you already have intel-opencl-rt, but there is a known issue, https://github.com/IntelPython/dpctl/issues/1006, and OCL_ICD_VENDORS should be pointed at the GPU.

Also, please update your conda env via: conda update -c intel -c conda-forge --all. daal4py and scikit-learn-intelex 2023.2 are already available.
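A small check of the OCL_ICD_VENDORS variable can make the result easier to read than `ls -al $OCL_ICD_VENDORS`, which silently lists the current directory when the variable is unset or empty. The sketch below is an assumption-laden helper, not part of dpctl or the extension: `inspect_icd_vendors` is a hypothetical name, and it assumes the variable holds either a directory of `.icd` files or a single file.

```python
import os
from pathlib import Path


def inspect_icd_vendors(env=None):
    """Report how OCL_ICD_VENDORS is set and which .icd files it exposes."""
    env = os.environ if env is None else env
    value = env.get("OCL_ICD_VENDORS", "")
    if not value:
        # Unset or empty: `ls -al $OCL_ICD_VENDORS` would list the current directory
        return {"status": "unset", "path": None, "icd_files": []}
    path = Path(value)
    if path.is_dir():
        icd_files = sorted(p.name for p in path.glob("*.icd"))
        return {"status": "directory", "path": str(path), "icd_files": icd_files}
    if path.is_file():
        return {"status": "file", "path": str(path), "icd_files": [path.name]}
    return {"status": "missing", "path": str(path), "icd_files": []}
```

A result of `{"status": "unset", ...}` would indicate that the variable is not defined in the current shell at all.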

psmgeelen commented 1 year ago

Hi @samir-nasibli, python -m dpctl -f returned:

Platform  0 ::
    Name        Intel(R) OpenCL
    Version     OpenCL 3.0 LINUX
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                AMD Ryzen 9 5950X 16-Core Processor            
        Version             2023.16.6.0.22_223734
        Filter string       opencl:cpu:0
Platform  1 ::
    Name        Intel(R) FPGA Emulation Platform for OpenCL(TM)
    Version     OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) FPGA Emulation Device
        Version             2023.16.6.0.22_223734
        Filter string       opencl:accelerator:0
Platform  2 ::
    Name        Intel(R) OpenCL Graphics
    Version     OpenCL 3.0 
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) Arc(TM) A770 Graphics
        Version             23.17.26241.33
        Filter string       opencl:gpu:0
Platform  3 ::
    Name        Intel(R) FPGA Emulation Platform for OpenCL(TM)
    Version     OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) FPGA Emulation Device
        Version             2023.16.6.0.22_223734
        Filter string       opencl:accelerator:1
Platform  4 ::
    Name        Intel(R) Level-Zero
    Version     1.3
    Vendor      Intel(R) Corporation
    Backend     ext_oneapi_level_zero
    Num Devices 1
      # 0
        Name                Intel(R) Arc(TM) A770 Graphics
        Version             1.3.26241
        Filter string       level_zero:gpu:0
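To pull the device filter strings out of a listing like the one above, a short parser is enough. This is a sketch that works on the printed text only (it does not call dpctl); `parse_filter_strings` and `gpu_filters` are hypothetical names, and the `backend:device_type:index` shape is taken from the output shown here.

```python
import re

# Matches lines such as "Filter string       level_zero:gpu:0"
FILTER_RE = re.compile(r"Filter string\s+(\w+):(\w+):(\d+)")


def parse_filter_strings(listing):
    """Extract (backend, device_type, index) triples from the printed listing."""
    return [(b, t, int(i)) for b, t, i in FILTER_RE.findall(listing)]


def gpu_filters(listing):
    """Return the full filter strings of all GPU devices in the listing."""
    return [f"{b}:{t}:{i}" for b, t, i in parse_filter_strings(listing) if t == "gpu"]
```

Applied to the listing above, this would report the Arc A770 twice, once under the OpenCL backend and once under Level Zero.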

I think I might be doing something wrong when running ls -al $OCL_ICD_VENDORS, as it just lists the current directory. I have updated the conda environment by running conda update -c intel -c conda-forge --all, and the environment is now:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    intel
_openmp_mutex             4.5                       2_gnu    intel
bzip2                     1.0.8                hb9a14ef_9    intel
ca-certificates           2023.5.7             hbcca054_0    intel
certifi                   2022.12.7        py39h06a4308_0    intel
colorama                  0.4.6              pyhd8ed1ab_0    intel
daal4py                   2023.2.0        py39_intel_49572    intel
dal                       2023.2.0            intel_49572    intel
dpcpp-cpp-rt              2023.2.0            intel_49495    intel
dpcpp_cpp_rt              2023.2.0            intel_49495    intel
dpctl                     0.14.5          py39he78b74f_24    intel
fortran_rt                2023.1.0            intel_46305    intel
glob2                     0.7                        py_0    conda-forge
icc_rt                    2023.2.0            intel_49495    intel
impi_rt                   2021.9.0            intel_43482    intel
intel-cmplr-lib-rt        2023.2.0            intel_49495    intel
intel-cmplr-lic-rt        2023.2.0            intel_49495    intel
intel-fortran-rt          2023.2.0            intel_49495    intel
intel-opencl-rt           2023.2.0            intel_49495    intel
intel-openmp              2023.2.0            intel_49495    intel
intelpython               2023.2.0                      0    intel
joblib                    1.2.0              pyh3f38642_0    intel
level-zero                1.11.0               h00ab1b0_0    intel
libffi                    3.4.2                h7f98852_5    intel
libgcc-ng                 12.2.0              h65d4601_19    intel
libgomp                   12.2.0              h65d4601_19    intel
libnsl                    2.0.0                h7f98852_0    intel
libsqlite                 3.42.0               h2797004_0    intel
libstdcxx-ng              12.2.0              h46fd767_19    intel
libuuid                   2.38.1               h0b41bf4_0    intel
libzlib                   1.2.13               hd590300_5    intel
mkl                       2023.2.0            intel_49495    intel
mkl-service               2.4.0           py39h75d02e3_15    intel
mkl_fft                   1.3.6           py39h173b8ae_56    intel
mkl_random                1.2.2           py39h1595b48_76    intel
mkl_umath                 0.1.1           py39hd987cd3_86    intel
ncurses                   6.4                  hcb278e6_0    intel
numpy                     1.24.3           py39hed7eef7_0    intel
numpy-base                1.24.3           py39he88ecf9_0    intel
openssl                   3.1.1                hd590300_1    intel
pandas                    1.5.3            py39h6cd0baa_0    intel
pip                       23.1.2             pyhd8ed1ab_0    intel
python                    3.9.16              hef7c979_23    intel
python-dateutil           2.8.2                    py39_2    intel
pytz                      2022.7           py39h06a4308_0    intel
readline                  8.2                  h8228510_1    intel
scikit-learn              1.2.1            py39h6a678d5_0    intel
scikit-learn-intelex      2023.2.0        py39_intel_49572    intel
scipy                     1.7.3            py39h4ca98da_8    intel
setuptools                67.7.2             pyhd8ed1ab_0    intel
six                       1.16.0             pyhd3eb1b0_1    intel
sqlite                    3.41.1               h5eee18b_0    intel
tbb                       2021.9.0            intel_43484    intel
tbb4py                    2021.9.0        py39_intel_43484    intel
threadpoolctl             2.2.0              pyh0d69192_0    intel
tk                        8.6.12               h1ccaba5_0    intel
tqdm                      4.65.0             pyhd8ed1ab_1    intel
tzdata                    2023c                h71feb2d_0    intel
wheel                     0.40.0             pyhd8ed1ab_0    intel
xz                        5.2.8                h5eee18b_0    intel
zlib                      1.2.13               hd590300_5    intel

I think the update broke the environment, as I now get this error:

AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
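The error is consistent with the version change in the listings: the deprecated `np.float` alias was removed in NumPy 1.24, so code that still uses it works on 1.23.5 and fails on 1.24.3. A minimal version check, with the hypothetical helper name `has_np_float_alias`, might look like this:

```python
def has_np_float_alias(numpy_version):
    """Return True if this NumPy version still exposes the `np.float` alias."""
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    # The alias was deprecated in 1.20 and removed in 1.24
    return (major, minor) < (1, 24)
```

Passing `numpy.__version__` to this helper tells whether a dependency that still references `np.float` can be expected to import cleanly.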

I resolved this by running: conda install -c intel -c conda-forge numpy=1.22 and now have an environment like this:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    intel
_openmp_mutex             4.5                       2_gnu    intel
bzip2                     1.0.8                hb9a14ef_9    intel
ca-certificates           2023.5.7             hbcca054_0    intel
certifi                   2023.5.7           pyhd8ed1ab_0    intel
colorama                  0.4.6              pyhd8ed1ab_0    intel
daal4py                   2023.2.0        py39_intel_49572    intel
dal                       2023.2.0            intel_49572    intel
dpcpp-cpp-rt              2023.2.0            intel_49495    intel
dpcpp_cpp_rt              2023.2.0            intel_49495    intel
dpctl                     0.14.5          py39he78b74f_24    intel
fortran_rt                2023.1.0            intel_46305    intel
glob2                     0.7                        py_0    conda-forge
icc_rt                    2023.2.0            intel_49495    intel
impi_rt                   2021.9.0            intel_43482    intel
intel-cmplr-lib-rt        2023.2.0            intel_49495    intel
intel-cmplr-lic-rt        2023.2.0            intel_49495    intel
intel-fortran-rt          2023.2.0            intel_49495    intel
intel-opencl-rt           2023.2.0            intel_49495    intel
intel-openmp              2023.2.0            intel_49495    intel
intelpython               2023.2.0                      0    intel
joblib                    1.2.0              pyh3f38642_0    intel
level-zero                1.11.0               h00ab1b0_0    intel
libffi                    3.4.2                h7f98852_5    intel
libgcc-ng                 12.2.0              h65d4601_19    intel
libgomp                   12.2.0              h65d4601_19    intel
libnsl                    2.0.0                h7f98852_0    intel
libsqlite                 3.42.0               h2797004_0    intel
libstdcxx-ng              12.2.0              h46fd767_19    intel
libuuid                   2.38.1               h0b41bf4_0    intel
libzlib                   1.2.13               hd590300_5    intel
mkl                       2023.2.0            intel_49495    intel
mkl-service               2.4.0           py39h75d02e3_15    intel
mkl_fft                   1.3.1           py39hcab1719_22    intel
mkl_random                1.2.2           py39hbf47bc3_22    intel
mkl_umath                 0.1.1           py39hf66a691_32    intel
ncurses                   6.4                  hcb278e6_0    intel
numpy                     1.22.3           py39hf0956d0_5    intel
numpy-base                1.22.3           py39h45c9ace_5    intel
openssl                   3.1.1                hd590300_1    intel
pandas                    1.5.3            py39h6cd0baa_0    intel
pip                       23.1.2             pyhd8ed1ab_0    intel
python                    3.9.16              hef7c979_23    intel
python-dateutil           2.8.2                    py39_2    intel
pytz                      2022.7           py39h06a4308_0    intel
readline                  8.2                  h8228510_1    intel
scikit-learn              1.2.1            py39h6a678d5_0    intel
scikit-learn-intelex      2023.2.0        py39_intel_49572    intel
scipy                     1.7.3            py39h4ca98da_8    intel
setuptools                67.7.2             pyhd8ed1ab_0    intel
six                       1.16.0             pyhd3eb1b0_1    intel
sqlite                    3.41.1               h5eee18b_0    intel
tbb                       2021.9.0            intel_43484    intel
tbb4py                    2021.9.0        py39_intel_43484    intel
threadpoolctl             2.2.0              pyh0d69192_0    intel
tk                        8.6.12               h1ccaba5_0    intel
tqdm                      4.65.0             pyhd8ed1ab_1    intel
tzdata                    2023c                h71feb2d_0    intel
wheel                     0.40.0             pyhd8ed1ab_0    intel
xz                        5.2.8                h5eee18b_0    intel
zlib                      1.2.13               hd590300_5    intel

When I run my script I still get: INFO:sklearnex: sklearn.utils.validation._assert_all_finite: running accelerated version on CPU, even though I offloaded to the GPU.

psmgeelen commented 1 year ago

Is there anything else I can do to support the process?

samir-nasibli commented 1 year ago

Hi @psmgeelen! Unfortunately I couldn't reproduce your issue; I am getting GPU offloading. Let me investigate further and I will let you know.

psmgeelen commented 1 year ago

@samir-nasibli , I might be doing something stupid. I ran this:

from sklearnex import patch_sklearn, config_context
import numpy as np
import logging
logger = logging.getLogger('sklearnex')
logger.setLevel(logging.INFO)
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
            [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
with config_context(target_offload="gpu:0"):
   clustering = DBSCAN(eps=3, min_samples=2).fit(X)

I ran it in my new environment and got this printout:

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
INFO:sklearnex: sklearn.cluster.DBSCAN.fit: running accelerated version on GPU

So it seems to be working. For some reason, this doesn't work on my larger benchmark test yet. Please give me 24 hours to debug myself before closing this issue. I'll get back to you soon!

psmgeelen commented 1 year ago

Maybe a small in-between question: what exactly does the log line INFO:sklearnex: sklearn.utils.validation._assert_all_finite: running accelerated version on CPU mean?
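Going only by the format of the log lines quoted in this thread, each message appears to name the function that was dispatched and the device it ran on. Under that reading, the `_assert_all_finite` line refers to an input validation helper rather than to `DBSCAN.fit`, so a CPU entry for it would not by itself say where the clustering ran. The parser below is a sketch built on that assumption; `parse_log_line` is a hypothetical name and it only recognises the two prefixes seen here.

```python
import re

# Handles both prefixes that appear in this thread:
#   "SKLEARNEX INFO: <function>: running accelerated version on <DEVICE>"
#   "INFO:sklearnex: <function>: running accelerated version on <DEVICE>"
LOG_RE = re.compile(
    r"(?:SKLEARNEX INFO|INFO:sklearnex):\s*(?P<func>[\w.]+):\s*"
    r"running accelerated version on (?P<device>\w+)"
)


def parse_log_line(line):
    """Return (function_name, device) for a recognised log line, else None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    return match["func"], match["device"]
```

Filtering the parsed pairs for `sklearn.cluster.DBSCAN.fit` is then a direct way to check the device used by the estimator itself.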

psmgeelen commented 1 year ago

After reinstalling the environment again, the issue is no longer reproducible. I guess the error was transient. I noticed that GPU support on Intel hardware is not accurately described in the documentation. I found that:

are supported on an ARC 770, while the so-called supported algorithms for GPUs in the documentation:

Furthermore, I had some issues with the methods associated with the models. For example, the fit_predict method of DBSCAN threw an error:

RuntimeError: Cannot use target offload option inside daal4py.oneapi.sycl_context

Using the fit method, however, works just fine.
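Since `fit` works where `fit_predict` fails, one possible workaround is to call `fit` and read the `labels_` attribute, which is what scikit-learn's `fit_predict` returns for clustering estimators. This is a sketch of that workaround, not a fix for the underlying error; `fit_then_labels` is a hypothetical helper name.

```python
def fit_then_labels(estimator, X):
    """Fit a clustering estimator and return its labels without calling fit_predict."""
    estimator.fit(X)
    # Clustering estimators such as DBSCAN store the cluster assignments here
    return estimator.labels_
```

Inside a `config_context(target_offload="gpu:0")` block, `fit_then_labels(DBSCAN(eps=3, min_samples=2), X)` would take the same `fit` path that is reported to work in this thread.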

Regardless, closing the issue. Thanks for the support!