shap / shap

A game theoretic approach to explain the output of any machine learning model.
https://shap.readthedocs.io
MIT License
21.96k stars 3.2k forks

Document exact GPU compilation procedure using CUDA_PATH #1650

Open mirekphd opened 3 years ago

mirekphd commented 3 years ago

When invoking shap_values_train = explainer.shap_values(X=train_x, y=train_y) I'm getting the error raised here https://github.com/slundberg/shap/blob/6af9e1008702fb0fab939bf2154bbf93dfe84a16/shap/explainers/_gpu_tree.py#L8, caused by the missing CUDA extension _cext_gpu.lib, which would have been compiled by compile_cuda_module had the CUDA_PATH env var been defined at pip-installation time (as part of docker build in my case).

Can we have a more specific error message (telling the user exactly what to do, complete with an example path for a given location of nvcc)?

I did notice this remark in the docs on shap.GPUTreeExplainer: "Currently requires source build with cuda available and 'CUDA_PATH' environment variable defined." However, it was not clear enough about the role of CUDA_PATH (merely having the variable defined was obviously not enough).

A more specific error message would point me to the exact solution even faster, without looking into your source code:)

Regarding the "source build" part, there could also be some clarification on how to proceed, ideally in the form of exact compilation instructions or a Dockerfile like this one: dockerfile.gpu.

Having an exact compilation example is rather important, because the path to nvcc looks like this: /usr/local/cuda-10.1/bin/nvcc, so there are at least 4 ways to define the path:

export CUDA_PATH=/usr/local/cuda-10.1/bin
export CUDA_PATH=/usr/local/cuda-10.1/bin/
export CUDA_PATH=/usr/local/cuda-10.1
export CUDA_PATH=/usr/local/cuda-10.1/

And here only the last two would work given how the variable is used: https://github.com/slundberg/shap/blob/5831d55b598bf17b63fd4c81f9799c4d771d0bc7/setup.py#L63
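The failure mode of the first two variants can be reproduced with a plain path join (a sketch of what the linked setup.py line effectively does; the exact code may differ):

```python
import os

def nvcc_from_cuda_path(cuda_path):
    # setup.py builds the compiler path roughly as
    # os.path.join(CUDA_PATH, 'bin', 'nvcc'), so CUDA_PATH must point at
    # the toolkit root, not at its bin/ subfolder.
    return os.path.join(cuda_path, "bin", "nvcc")

print(nvcc_from_cuda_path("/usr/local/cuda-10.1/bin"))  # .../bin/bin/nvcc (wrong)
print(nvcc_from_cuda_path("/usr/local/cuda-10.1"))      # .../bin/nvcc (correct)
```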

mirekphd commented 3 years ago

Here's an argument that the current compilation instructions are barely scratching the surface: currently (i.e. for the latest 2 commits to master) CUDA compilation failed in my tests in all CUDA docker containers I tried (the official nvidia/cuda as well as custom ones), with CUDA versions ranging from 10.1 to 11.1, and gcc builds from 8 to 10 (appropriately matched).

Two kinds of errors I consistently encountered, which invariably led nvcc to fail ("WARNING: Could not compile cuda extensions"), were:

shap/cext/_cext_gpu.cu(179): error: no instance of function template "flat_idx_to_tensor_idx" matches the argument list argument types are: (size_t, size_t [3], )

shap/cext/_cext_gpu.cu(183): error: no instance of function template "tensor_idx_to_flat_idx" matches the argument list argument types are: (size_t [3], )

shap/cext/_cext_gpu.cu(225): error: namespace "std" has no member "size"

shap/cext/_cext_gpu.cu(226): error: no instance of function template "flat_idx_to_tensor_idx" matches the argument list argument types are: (size_t, size_t [4], )

shap/cext/_cext_gpu.cu(231): error: no instance of function template "tensor_idx_to_flat_idx" matches the argument list argument types are: (size_t [4], )

6 errors detected in the compilation of "shap/cext/_cext_gpu.cu". Error building cuda module: CalledProcessError(1, ['/usr/local/cuda/bin/nvcc', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '-I/usr/include/python3.8', '--extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_60', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75']) WARNING: Could not compile cuda extensions


Just for fun, I tried to specify `--compiler-options -std=c++17`, but it only worsened the problem.

CalebEverett commented 3 years ago

Hi, were you able to install with gpu support? I was interested in trying that out as well, but got the error below. I was installing from the repo and setting the CUDA_PATH environment variable to /usr/local/cuda/bin/nvcc.

record_import_error("cext_gpu", "cuda extension was not built during install!", e)
mirekphd commented 3 years ago

Hi, were you able to install with gpu support? I was interested in trying that out as well, but got the error below. I was installing from the repo and setting the CUDA_PATH environment variable to /usr/local/cuda/bin/nvcc.

No, failed due to the above compilation errors, which I hope can be fixed by developers (gently ping @RAMitchell ).

In your case, have you tried export CUDA_PATH=/usr/local/cuda (without /bin/nvcc)? The bin subfolder and the compiler name are already hard-coded here (which hopefully will be documented): https://github.com/slundberg/shap/blob/5831d55b598bf17b63fd4c81f9799c4d771d0bc7/setup.py#L63

RAMitchell commented 3 years ago

Sorry we are working through a few issues, can you try pr #1646?

mirekphd commented 3 years ago

So after PR #1646 was merged (thank you @RAMitchell and @JohnZed), I tested GPU compilation from source on Ubuntu using the two setups below. The errors reported above were resolved only for the latest CUDA; for the legacy version, compilation failed for an unknown reason. I tried it also in custom containers, but for reproducibility used the official ones.

More info

a) latest: CUDA 11.1 (official nvidia/cuda container with Ubuntu 20.04, python 3.8.5 and gcc 9.3.0):

b) legacy: CUDA 10.1 (official nvidia/cuda container with Ubuntu 18.04, python 3.8.0, and gcc upgraded to 8.4.0, the maximum version supported by CUDA 10.1, which I installed as /usr/local/cuda/bin/gcc):

mirekphd commented 3 years ago

Code to reproduce CUDA 10.1 failure:

$ docker run --rm -it --name cuda101 -u 0 nvidia/cuda:10.1-devel-ubuntu18.04 bash

# install maximum supported gcc compiler 
MAX_GCC_VERSION=8 # for CUDA 10.1,10.2
apt update && apt install -y gcc-$MAX_GCC_VERSION g++-$MAX_GCC_VERSION

# add symlinks to max supported gcc inside a CUDA subfolder
ln -s /usr/bin/gcc-$MAX_GCC_VERSION /usr/local/cuda/bin/gcc 
ln -s /usr/bin/g++-$MAX_GCC_VERSION /usr/local/cuda/bin/g++

# specify location of nvcc for shap compilation
export CUDA_PATH=/usr/local/cuda

# install python 3.8 and setuptools
apt install -y git python3.8 python3.8-dev python3-setuptools

# compile shap with GPU support
cd /tmp && \
git clone https://github.com/slundberg/shap && \
cd shap && \
git checkout 5209472 && \
python3.8 setup.py install

RAMitchell commented 3 years ago

The problem is actually that the build directory doesn't exist yet, so running it again results in successful compilation. I will submit a fix.

mirekphd commented 3 years ago

So the error message was specific after all!:) I had the nvidia/cuda 11.1 container ready, preserved from previous tests, which is why it seemed to work right away (it's not due to the new and improved CUDA version:)

I can now confirm the above success in our custom-built container with CUDA 10.1 (mirekphd/ml-gpu-py38-cuda101-cust:latest). The nvcc compilation stage now passes even for CUDA 10.1, and so does the python install stage. When I import shap, it succeeds, but prints out these warnings that were also printed during python install (with specific lines of code raising them):

>>> import shap
"is not" with a literal. Did you mean "!="?
"is" with a literal. Did you mean "=="?
"is" with a literal. Did you mean "=="?
"is not" with a literal. Did you mean "!="?
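These look like Python 3.8's new SyntaxWarning for identity comparisons with literals (`x is 5` instead of `x == 5`). A minimal reproduction, independent of shap:

```python
import warnings

# Python 3.8+ warns at compile time when `is`/`is not` is used with a
# literal, because identity comparison with literals is implementation-defined.
source = "x = 500\nresult = x is 500\n"
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile(source, "<example>", "exec")

print(caught[0].message)  # "is" with a literal. Did you mean "=="?
```
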
RAMitchell commented 3 years ago

I've created some scripts for automatically building/testing using docker: https://github.com/RAMitchell/shap-ci-scripts

I'm currently testing cuda 10.1 and cuda 11.1 with ubuntu 18.04, and everything is working well with the #1665 fix. There should be no need to set paths; setup.py should find nvcc.

Hopefully we can get this running regularly on a CI machine.

mirekphd commented 3 years ago

Thank you for the Dockerfile! You could even parametrize the Ubuntu version here (to test also 20.04 and possibly 16.04).

I did not realize the list of shap's requirements is so long (here and here) in the case of a source build... until I tried to satisfy it in a bare-bones container such as nvidia/cuda:)

And @RAMitchell is right that the python install stage would fail without these python dependencies... I wonder @slundberg whether requiring the presence of so many extra python packages is really necessary for shap installation (their installation may fail e.g. when there are multiple python versions, as seen in my sample code above based on nvidia/cuda with the older Ubuntu 18.04).

Perhaps the dependencies can be specified explicitly:

pip install -r requirements.txt

(I do hope they could be pip-installed for those not using conda)?
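For what it's worth, the declared runtime requirements of any installed build can be listed with the standard library alone (a generic inspection helper, not part of shap; it assumes the named distribution is installed):

```python
from importlib import metadata

def runtime_requirements(dist_name):
    # Return the Requires-Dist entries of an installed distribution,
    # or an empty list when it is absent or declares none.
    try:
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []

# e.g. runtime_requirements("shap") in an environment with shap installed
print(runtime_requirements("shap"))
```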

mirekphd commented 3 years ago

I've created some scripts for automatically building/testing using docker: https://github.com/RAMitchell/shap-ci-scripts

I'd change one thing in the Dockerfile though: I'd install python packages as a standard user, not root (by adding USER 1000 somewhere here and returning to USER root before shap compilation from source... then back to USER 1000 at the very end of the file). The reason is that installing popular python packages as root makes changes to their reverse dependencies practically impossible for the container user later on: pip would sometimes require uninstalling dependency A to accommodate an older package B that requires A but pins it at a version older than the one already installed for package C (here C is shap and A can be numpy or scipy). And you can't uninstall anything as a standard user if it was installed as root (as part of shap's sizeable dependency set, which needs to be pre-installed or else it gets installed automatically:)

mirekphd commented 3 years ago

In fact I managed to integrate GPU-enabled shap (using the duplicate python setup.py install workaround:) into one of our data science containers only after moving its installation to the very end of the Dockerfile. Otherwise (if it preceded any other python package installation) it would break the installation of at least one other package due to the root-requirements problem described above: even though I preinstalled all of shap's python dependencies as a standard user, shap still apparently tried to down- or upgrade some of them at build time (when it had root).

But now it's part of one of our CICD pipelines and ready for testing:

docker run --rm -d --name ml-gpu-py38-cuda101-cust -p 8888:8888 --gpus all mirekphd/ml-gpu-py38-cuda101-cust && docker logs -f ml-gpu-py38-cuda101-cust
mirekphd commented 3 years ago

Did our container work for you?:) In my tests the successful CUDA compilation made little to no difference... I'm testing it on a corporate server with proven GPU access (e.g. XGBoost trains 10x faster with tree_method="gpu_hist"). Maybe the workaround of the duplicate python setup.py install did not work as intended..?

When running: shap_values_valid = explainer.shap_values(X=valid_x, y=valid_y)

...I'm getting this error as previously:

cuda extension was not built during install!

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-22-cbd558096e13> in <module>
      1 if OBJECTIVE == "binary":
----> 2     shap_values_valid = explainer.shap_values(X=valid_x, y=valid_y)
      3 
      4 else:
      5     shap_values_valid = explainer.shap_values(X=valid_x)

/opt/conda/lib/python3.8/site-packages/shap/explainers/_gpu_tree.py in shap_values(self, X, y, tree_limit, approximate, check_additivity, from_call)
    113 
    114         # run the core algorithm using the C extension
--> 115         assert_import("cext_gpu")
    116         phi = np.zeros((X.shape[0], X.shape[1] + 1, self.model.num_outputs))
    117         _cext_gpu.dense_tree_shap(

/opt/conda/lib/python3.8/site-packages/shap/utils/_general.py in assert_import(package_name)
     21         msg,e = import_errors[package_name]
     22         print(msg)
---> 23         raise e
     24 
     25 def record_import_error(package_name, msg, e):

    [... skipping hidden 1 frame]

<ipython-input-21-cbd558096e13> in <module>
      1 if OBJECTIVE == "binary":
----> 2     shap_values_valid = explainer.shap_values(X=valid_x, y=valid_y)
      3 
      4 else:
      5     shap_values_valid = explainer.shap_values(X=valid_x)

/opt/conda/lib/python3.8/site-packages/shap/explainers/_gpu_tree.py in shap_values(self, X, y, tree_limit, approximate, check_additivity, from_call)
    113 
    114         # run the core algorithm using the C extension
--> 115         assert_import("cext_gpu")
    116         phi = np.zeros((X.shape[0], X.shape[1] + 1, self.model.num_outputs))
    117         _cext_gpu.dense_tree_shap(

/opt/conda/lib/python3.8/site-packages/shap/utils/_general.py in assert_import(package_name)
     21         msg,e = import_errors[package_name]
     22         print(msg)
---> 23         raise e
     24 
     25 def record_import_error(package_name, msg, e):

/opt/conda/lib/python3.8/site-packages/shap/explainers/_gpu_tree.py in <module>
      4 from ..utils import assert_import, record_import_error
      5 try:
----> 6     from .. import _cext_gpu
      7 except ImportError as e:
      8     record_import_error("cext_gpu", "cuda extension was not built during install!", e)

ImportError: cannot import name '_cext_gpu' from partially initialized module 'shap' (most likely due to a circular import) (/opt/conda/lib/python3.8/site-packages/shap/__init__.py)
mirekphd commented 3 years ago

The lib certainly does exist, the .py file is readable and the .so is executable, so probably they are just not where they should be:)

$ cd / && find | grep cext_gpu
[..]
./opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/__pycache__/_cext_gpu.cpython-38.pyc
./opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.cpython-38-x86_64-linux-gnu.so
./opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.py

$ cat /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.py
def __bootstrap__():
    global __bootstrap__, __loader__, __file__
    import sys, pkg_resources, importlib.util
    __file__ = pkg_resources.resource_filename(__name__, '_cext_gpu.cpython-38-x86_64-linux-gnu.so')
    __loader__ = None; del __bootstrap__, __loader__
    spec = importlib.util.spec_from_file_location(__name__,__file__)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
__bootstrap__()

$ ls -lan /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.cpython-38-x86_64-linux-gnu.so
-rwxr-xr-x. 1 0 100 9756120 Dec 17 20:53 /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.cpython-38-x86_64-linux-gnu.so
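A quick way to check what `from shap import _cext_gpu` would actually resolve to, without running the full import chain, is `importlib.util.find_spec` (a generic diagnostic, not an official shap tool; `json.decoder` stands in for `shap._cext_gpu` below):

```python
import importlib.util

def submodule_origin(package, submodule):
    # Return the file that `from package import submodule` would load,
    # or None when the import machinery cannot find it.
    try:
        spec = importlib.util.find_spec(f"{package}.{submodule}")
    except (ImportError, AttributeError):
        return None
    return spec.origin if spec else None

# With shap installed one would call submodule_origin("shap", "_cext_gpu")
print(submodule_origin("json", "decoder"))  # path to json/decoder.py
```
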
mirekphd commented 3 years ago

Does it perhaps need access to _cext_gpu.o? But that is not in python's site-packages; it is where it was built from sources...:

$ cd / && find | grep _cext_gpu
./opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/__pycache__/_cext_gpu.cpython-38.pyc
./opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.cpython-38-x86_64-linux-gnu.so
./opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.py
[..]
./usr/local/src/shap/build/lib.linux-x86_64-3.8/shap/_cext_gpu.cpython-38-x86_64-linux-gnu.so
./usr/local/src/shap/build/lib_cext_gpu.a
./usr/local/src/shap/build/temp.linux-x86_64-3.8/shap/cext/_cext_gpu.o
./usr/local/src/shap/shap/cext/_cext_gpu.cc
./usr/local/src/shap/shap/cext/_cext_gpu.cu
RAMitchell commented 3 years ago

It should only need _cext_gpu.cpython-38-x86_64-linux-gnu.so as far as I know.

mirekphd commented 3 years ago

Ok, then maybe this part is relevant: ImportError: cannot import name '_cext_gpu' from partially initialized module 'shap' (most likely due to a circular import) (/opt/conda/lib/python3.8/site-packages/shap/__init__.py)

So I also checked dependencies and they seem OK (not conflicting with requirements):

shap==0.37.0
  - numba [required: Any, installed: 0.52.0]
    - llvmlite [required: >=0.35.0,<0.36, installed: 0.35.0]
    - numpy [required: >=1.15, installed: 1.19.4]
    - setuptools [required: Any, installed: 51.0.0]
  - numpy [required: Any, installed: 1.19.4]
  - pandas [required: Any, installed: 1.1.5]
    - numpy [required: >=1.15.4, installed: 1.19.4]
    - python-dateutil [required: >=2.7.3, installed: 2.8.1]
      - six [required: >=1.5, installed: 1.15.0]
    - pytz [required: >=2017.2, installed: 2020.4]
  - scikit-learn [required: Any, installed: 0.23.2]
    - joblib [required: >=0.11, installed: 1.0.0]
    - numpy [required: >=1.13.3, installed: 1.19.4]
    - scipy [required: >=0.19.1, installed: 1.5.4]
      - numpy [required: >=1.14.5, installed: 1.19.4]
    - threadpoolctl [required: >=2.0.0, installed: 2.1.0]
  - scipy [required: Any, installed: 1.5.4]
    - numpy [required: >=1.14.5, installed: 1.19.4]
  - slicer [required: ==0.0.3, installed: 0.0.3]
  - tqdm [required: >4.25.0, installed: 4.54.1]
mirekphd commented 3 years ago

The package folder listing with permissions:

$ ls -lan /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/
total 10104
drwxr-sr-x. 11 0 100    4096 Dec 17 20:53 .
drwxr-sr-x.  4 0 100      34 Dec 17 20:53 ..
drwxr-sr-x.  3 0 100      83 Dec 17 20:53 actions
drwxr-sr-x.  3 0 100     177 Dec 17 20:53 benchmark
drwxr-sr-x.  2 0 100      25 Dec 17 20:53 cext
-rwxr-xr-x.  1 0 100  516176 Dec 17 20:53 _cext.cpython-38-x86_64-linux-gnu.so
-rwxr-xr-x.  1 0 100 9756120 Dec 17 20:53 _cext_gpu.cpython-38-x86_64-linux-gnu.so
-rw-r--r--.  1 0 100     434 Dec 17 20:53 _cext_gpu.py
-rw-r--r--.  1 0 100     430 Dec 17 20:53 _cext.py
-rw-r--r--.  1 0 100    8798 Dec 17 20:53 datasets.py
drwxr-sr-x.  5 0 100    4096 Dec 17 20:53 explainers
-rw-r--r--.  1 0 100   25320 Dec 17 20:53 _explanation.py
-rw-r--r--.  1 0 100    2809 Dec 17 20:53 __init__.py
-rw-r--r--.  1 0 100     442 Dec 17 20:53 links.py
drwxr-sr-x.  3 0 100     179 Dec 17 20:53 maskers
drwxr-sr-x.  3 0 100     122 Dec 17 20:53 models
drwxr-sr-x.  5 0 100    4096 Dec 17 20:53 plots
drwxr-sr-x.  2 0 100     191 Dec 17 20:53 __pycache__
drwxr-sr-x.  3 0 100     192 Dec 17 20:53 utils

$ ls -lan /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/cext/
total 64
drwxr-sr-x.  2 0 100    25 Dec 17 20:53 .
drwxr-sr-x. 11 0 100  4096 Dec 17 20:53 ..
-rw-r--r--.  1 0 100 58130 Dec 17 20:53 tree_shap.h
mirekphd commented 3 years ago

I think I will have to create a local copy of the package (from this root-owned folder), import it and set a breakpoint to examine the error in more detail (but that's for tomorrow:).

mirekphd commented 3 years ago

I implemented a workaround for yet another likely reason for the "cuda extension was not built during install!" error, but it was not helpful in this case.

More info

In our moderately complex python environment we have several meta-packages that require shap, and these had to be installed first (due to the implicit root requirement of packages compiled from source, like shap, that I already mentioned earlier). As a result we may end up with two versions of shap: a CPU-only one installed by pip as a dependency of those meta-packages, and the GPU-enabled one I compiled here (using the procedure described above). And despite the installation sequence, the CPU version (installed first) takes priority over the GPU one (installed second and in fact last) when you execute import shap. This situation can be detected by listing all of shap's reverse dependencies like this:

$ pipdeptree -r -p shap
[..]
shap==0.37.0
  - alibi==0.5.5 [requires: shap>=0.36]
  - BorutaShap==1.0.14 [requires: shap>=0.34.0]
  - causalml==0.8.0 [requires: shap]
  - explainerdashboard==0.2.16.1 [requires: shap>=0.36]

So a quick workaround was to uninstall all of the above meta-packages - reverse dependencies of shap.
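To confirm which copy wins, checking the imported module's resolved path is enough (illustrated with a standard-library module; in the affected environment one would `import shap` and inspect `shap.__file__`):

```python
import json  # stand-in for `import shap` in the affected environment

# The resolved file path reveals which installation the interpreter picked;
# if it points at the CPU-only copy, the GPU build installed later is shadowed.
print(json.__file__)
```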

mirekphd commented 3 years ago

Another possible reason for failure in secured setups is shap's strong assumption about the user ID (the root requirement is a special case). For instance, if we installed a python package with user ID 1000 (as usual in our Dockerfiles), and the package later (at run time) tried to write some temp download data to a package installation subfolder (rather than to a predictably writable location such as /tmp), the execution attempt would end before it really started, as in the 2nd cell of this shap lightgbm example notebook (the download attempt itself would fail even faster on air-gapped servers, by the way). We don't need no adult.data here, right?:)

---------------------------------------------------------------------------
PermissionError                           Traceback (most recent call last)
<ipython-input-2-cde0eb234dfd> in <module>
----> 1 X,y = shap.datasets.adult()
      2 X_display,y_display = shap.datasets.adult(display=True)
      3 
      4 # create a train/test split
      5 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

/opt/conda/lib/python3.8/site-packages/shap/datasets.py in adult(display)
    110     ]
    111     raw_data = pd.read_csv(
--> 112         cache(github_data_url + "adult.data"),
    113         names=[d[0] for d in dtypes],
    114         na_values="?",

/opt/conda/lib/python3.8/site-packages/shap/datasets.py in cache(url, file_name)
    247     data_dir = os.path.join(os.path.dirname(__file__), "cached_data")
    248     if not os.path.isdir(data_dir):
--> 249         os.mkdir(data_dir)
    250 
    251     file_path = os.path.join(data_dir, file_name)

PermissionError: [Errno 13] Permission denied: '/opt/conda/lib/python3.8/site-packages/shap/cached_data'

To make the download in shap's datasets.py work we would need (apart from moving to a sandbox with access to the whole internet) to run our data science container with a matching user ID (the same UID as the installation-time UID), which would never happen on secure on-prem platforms like Openshift, for example.

The installation time UID of 1000:

$ ls -lan /opt/conda/lib/python3.8/site-packages/shap/
total 660
drwxr-sr-x 11 1000 100   4096 Dec 18 13:47 .
drwsrwsr-x  1 1000 100   4096 Dec 18 13:49 ..
drwxr-sr-x  3 1000 100   4096 Dec 18 13:47 actions
drwxr-sr-x  3 1000 100   4096 Dec 18 13:47 benchmark
drwxr-sr-x  2 1000 100   4096 Dec 18 13:47 cext
-rwxr-xr-x  1 1000 100 578232 Dec 18 13:47 _cext.cpython-38-x86_64-linux-gnu.so
-rw-r--r--  1 1000 100   8798 Dec 18 13:47 datasets.py
drwxr-sr-x  5 1000 100   4096 Dec 18 13:47 explainers
-rw-r--r--  1 1000 100  25320 Dec 18 13:47 _explanation.py
-rw-r--r--  1 1000 100   2809 Dec 18 13:47 __init__.py
-rw-r--r--  1 1000 100    442 Dec 18 13:47 links.py
drwxr-sr-x  3 1000 100   4096 Dec 18 13:47 maskers
drwxr-sr-x  3 1000 100   4096 Dec 18 13:47 models
drwxr-sr-x  5 1000 100   4096 Dec 18 13:47 plots
drwxr-sr-x  2 1000 100   4096 Dec 18 13:47 __pycache__
drwxr-sr-x  3 1000 100   4096 Dec 18 13:47 utils

... vs. the run-time UID (simulating a random UID from the wide range that would be allotted to a running pod under Openshift):

$ id
uid=1000070000(jovyan) gid=0(root) groups=0(root)
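A defensive fallback for the cache location would sidestep the PermissionError entirely; this is only a sketch of the idea (the function name is mine), not shap's actual code:

```python
import os
import tempfile

def writable_cache_dir(preferred):
    # Keep the package-local cache when its parent is writable by the
    # current (possibly randomly assigned, e.g. under Openshift) UID;
    # otherwise fall back to a predictably writable temp location.
    parent = os.path.dirname(preferred) or "."
    if os.path.isdir(parent) and os.access(parent, os.W_OK):
        return preferred
    return os.path.join(tempfile.gettempdir(), "shap_cached_data")

print(writable_cache_dir("/opt/conda/lib/python3.8/site-packages/shap/cached_data"))
```
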
mirekphd commented 3 years ago

You may be pleased to know @RAMitchell that I have found the reason behind the error: ImportError: cannot import name '_cext_gpu' from partially initialized module 'shap' (most likely due to a circular import) (/opt/conda/lib/python3.8/site-packages/shap/__init__.py).

What happens is that not all files are extracted / copied from the .egg to the shap package's main folder. Notably these two files were missing (and copying them manually immediately repaired the broken import):

cp /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.py /opt/conda/lib/python3.8/site-packages/shap/

cp /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.cpython-38-x86_64-linux-gnu.so /opt/conda/lib/python3.8/site-packages/shap/

Could we prevent this from happening?
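Until a proper fix lands, the manual copying can be scripted; a hedged sketch (the helper name is mine, and the egg name/version are specific to my environment):

```python
import glob
import os
import shutil

def copy_gpu_ext_from_egg(site_packages, egg_name):
    # Mirror the manual workaround above: copy any _cext_gpu* artifacts
    # present in the .egg but missing from the flat shap/ folder.
    egg_shap = os.path.join(site_packages, egg_name, "shap")
    dest = os.path.join(site_packages, "shap")
    copied = []
    for src in glob.glob(os.path.join(egg_shap, "_cext_gpu*")):
        target = os.path.join(dest, os.path.basename(src))
        if not os.path.exists(target):
            shutil.copy2(src, target)
            copied.append(target)
    return copied

# e.g.:
# copy_gpu_ext_from_egg("/opt/conda/lib/python3.8/site-packages",
#                       "shap-0.37.0-py3.8-linux-x86_64.egg")
```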

mirekphd commented 3 years ago

So after several workarounds described above (which probably merit some attention?) I made GPU-enabled shap work on small datasets in the sandbox environment.

The performance gains are substantial: nearly 15 times even on small datasets with 10k rows - see shap-lgbm-gpu-test.zip based on SHAP's extensive examples library. Thank you @RAMitchell for this much needed work! Such a performance boost is no mean feat at all, as GBDT algos themselves would require 10M-row (not just 10k-row) datasets (e.g. the Higgs boson dataset) to register any performance improvements on the GPU.

Memory management and accuracy issues (which are present in so many other GPU implementations, notably TF and XGB, respectively) will be reported in separate issues - let's find time to solve the remaining 20% on the Pareto curve as well:)

Here's a simple reproducible performance comparison test that can be replicated by anyone with docker, GPU and CUDA 10.1+ compatible display driver. The timings reported here are for a low-spec dev machine with GeForce GTX 1080Ti and CUDA 11.0 driver (vs. single core of Intel i5-4690K):

docker run -it --rm --name test-shap-gpu --gpus=all -v $(pwd):/home/jovyan mirekphd/ml-gpu-py38-cuda101-cust:latest python shap-lgbm-gpu-test.py
[..]
SHAP values in CPU were estimated in 29 sec.

SHAP values in GPU were estimated in 2 sec.
[..]

Note: the example above assumes that the test script shap-lgbm-gpu-test.zip has been downloaded to the current folder and unzipped there. For enabling GPU access to docker containers see the Prerequisites section in my answer here.

RAMitchell commented 3 years ago

Awesome work. I'm on vacation for a week and will address the last installation issues when I'm back.

mirekphd commented 3 years ago

So here are my results of more realistic performance comparison tests. In a nutshell, moving to the GPU shortened the process of estimating SHAP values from minutes to seconds and from hours to minutes!

'''
SHAP values estimation times (for a LightGBM classifier
with ca. 300 features) as a function of data rows (10k, 100k, 200k, 400k, 800k);
note that LightGBM v3.1.1 and above has a CUDA implementation of the training
and cv functions, hence `device` was set to either "cpu"
or "gpu" (the latter denoted as "GPU(LGBM)"), but using the GPU in LightGBM
had a slightly negative impact on SHAP values calculation on the GPU:

10k:
CPU(SHAP)+CPU(LGBM): executed in 1m 14.2s
GPU(SHAP)+CPU(LGBM): executed in 1.86s

100k:
CPU(SHAP)+CPU(LGBM): executed in ca. 1 hour (verified)
GPU(SHAP)+CPU(LGBM): executed in 52.8s

200k: 
CPU(SHAP)+CPU(LGBM): executed in ca. 2 hours (verified)
GPU(SHAP)+CPU(LGBM): executed in 2m 1s

400k: 
CPU(SHAP)+CPU(LGBM): ca. 4 hours (estimate)
GPU(SHAP)+CPU(LGBM): executed in 6m 33s
GPU(SHAP)+GPU(LGBM): executed in 7m 22s

800k: 
CPU(SHAP)+CPU(LGBM): ca. 8 hours (estimate)
GPU(SHAP)+CPU(LGBM): executed in 19m 17s
GPU(SHAP)+GPU(LGBM): executed in 22m 43s
'''
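Converted to speed-up factors (computed from the timings above; the 400k and 800k CPU figures are estimates, so those ratios are approximate):

```python
# (CPU seconds, GPU(SHAP)+CPU(LGBM) seconds) per dataset size,
# transcribed from the timings listed above
timings = {
    "10k":  (74.2,     1.86),
    "100k": (1 * 3600, 52.8),
    "200k": (2 * 3600, 121.0),   # 2m 1s
    "400k": (4 * 3600, 393.0),   # 6m 33s (CPU time estimated)
    "800k": (8 * 3600, 1157.0),  # 19m 17s (CPU time estimated)
}
for rows, (cpu, gpu) in timings.items():
    print(f"{rows}: ~{cpu / gpu:.0f}x faster on the GPU")
```
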
Drfengze commented 3 years ago

Great job! Can this be used to calculate the interaction Shap values?

RAMitchell commented 3 years ago

Great job! Can this be used to calculate the interaction Shap values?

Yes, with the tree path dependent algorithm.

MichaelChaoLi-cpu commented 1 year ago

Hi @mirekphd ,

I read all of this log. I am hitting the same issue:

cuda extension was not built during install!
Traceback (most recent call last):
  File "/home/mike-ubuntu/test.py", line 13, in <module>
    shap_values = explainer(X)
  File "/home/mike-ubuntu/anaconda3/envs/rapids-22.10/lib/python3.9/site-packages/shap/explainers/_tree.py", line 217, in __call__
    v = self.shap_values(X, y=y, from_call=True, check_additivity=check_additivity, approximate=self.approximate)
  File "/home/mike-ubuntu/anaconda3/envs/rapids-22.10/lib/python3.9/site-packages/shap/explainers/_gpu_tree.py", line 105, in shap_values
    assert_import("cext_gpu")
  File "/home/mike-ubuntu/anaconda3/envs/rapids-22.10/lib/python3.9/site-packages/shap/utils/_general.py", line 25, in assert_import
    raise e
  File "/home/mike-ubuntu/anaconda3/envs/rapids-22.10/lib/python3.9/site-packages/shap/explainers/_gpu_tree.py", line 6, in <module>
    from .. import _cext_gpu
ImportError: cannot import name '_cext_gpu' from partially initialized module 'shap' (most likely due to a circular import) (/home/mike-ubuntu/anaconda3/envs/rapids-22.10/lib/python3.9/site-packages/shap/__init__.py)

Then I found this comment:

cp /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.py /opt/conda/lib/python3.8/site-packages/shap/

cp /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.cpython-38-x86_64-linux-gnu.so /opt/conda/lib/python3.8/site-packages/shap/

I went into my shap folder to check:

(rapids-22.10) mike-ubuntu@DESKTOP-14KME2Q:~/anaconda3/envs/rapids-22.10/lib/python3.9/site-packages/shap$ ls
__init__.py  __pycache__  _cext.cpython-39-x86_64-linux-gnu.so  _explanation.py  _serializable.py  actions  benchmark  cached_data  cext  datasets.py  explainers  links.py  maskers  models  plots  utils

There is _cext.cpython-39-x86_64-linux-gnu.so, but I never got _cext_gpu.py, as you mentioned.

How do I get it?

Thank you.

MichaelChaoLi-cpu commented 1 year ago

Hi, @RAMitchell and @mirekphd

I read most of this log. I confirmed that my gcc version matched my nvcc version:

(ForShap) mike@mike-desktop:~/shap$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

(ForShap) mike@mike-desktop:~/shap$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

(ForShap) mike@mike-desktop:~/shap$ echo $CUDA_PATH
/usr

However, I still get an error:

(ForShap) mike@mike-desktop:~/shap$ python setup.py install --user
NVCC ==>  /usr/bin/nvcc
Compiling cuda extension, calling nvcc with arguments:
['/usr/bin/nvcc', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '-I/home/mike/anaconda3/envs/ForShap/include/python3.9', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_37', '-gencode=arch=compute_37,code=sm_37', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75']
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^ 
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^ 
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
Error building cuda module: CalledProcessError(1, ['/usr/bin/nvcc', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '-I/home/mike/anaconda3/envs/ForShap/include/python3.9', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_37', '-gencode=arch=compute_37,code=sm_37', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75'])

Running the example code from here: https://shap.readthedocs.io/en/latest/example_notebooks/api_examples/explainers/GPUTree.html I get this error:

>>> shap_values = explainer(X)
cuda extension was not built during install!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/explainers/_tree.py", line 217, in __call__
    v = self.shap_values(X, y=y, from_call=True, check_additivity=check_additivity, approximate=self.approximate)
  File "/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/explainers/_gpu_tree.py", line 105, in shap_values
    assert_import("cext_gpu")
  File "/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/utils/_general.py", line 25, in assert_import
    raise e
  File "<stdin>", line 1, in <module>
  File "/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/explainers/_tree.py", line 217, in __call__
    v = self.shap_values(X, y=y, from_call=True, check_additivity=check_additivity, approximate=self.approximate)
  File "/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/explainers/_gpu_tree.py", line 105, in shap_values
    assert_import("cext_gpu")
  File "/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/utils/_general.py", line 25, in assert_import
    raise e
  File "/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/explainers/_gpu_tree.py", line 6, in <module>
    from .. import _cext_gpu
ImportError: cannot import name '_cext_gpu' from partially initialized module 'shap' (most likely due to a circular import) (/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/__init__.py)

Can you help me?

Thank you.