mirekphd opened this issue 3 years ago
Here's an argument that the current compilation instructions are barely scratching the surface: currently (i.e. for the latest 2 commits to master), CUDA compilation failed in my tests in all the CUDA docker containers I tried (the official `nvidia/cuda` as well as custom ones), with CUDA versions ranging from 10.1 to 11.1 and `gcc` builds from 8 to 10 (appropriately matched).
Two issues I consistently encountered that invariably led `nvcc` to fail ("WARNING: Could not compile cuda extensions") were:
1. `--extended-lambda` here: https://github.com/slundberg/shap/blob/5831d55b598bf17b63fd4c81f9799c4d771d0bc7/setup.py#L69, which seems to be unavailable for older CUDA versions (like 10.1); replacing it with its alias `--expt-extended-lambda` seems to work.
2. `error: namespace "std" has no member "size"` in the following 6 lines:
shap/cext/_cext_gpu.cu(178): error: namespace "std" has no member "size"
shap/cext/_cext_gpu.cu(179): error: no instance of function template "flat_idx_to_tensor_idx" matches the argument list
argument types are: (size_t, size_t [3],
shap/cext/_cext_gpu.cu(183): error: no instance of function template "tensor_idx_to_flat_idx" matches the argument list
argument types are: (size_t [3],
shap/cext/_cext_gpu.cu(225): error: namespace "std" has no member "size"
shap/cext/_cext_gpu.cu(226): error: no instance of function template "flat_idx_to_tensor_idx" matches the argument list
argument types are: (size_t, size_t [4],
shap/cext/_cext_gpu.cu(231): error: no instance of function template "tensor_idx_to_flat_idx" matches the argument list
argument types are: (size_t [4],
6 errors detected in the compilation of "shap/cext/_cext_gpu.cu". Error building cuda module: CalledProcessError(1, ['/usr/local/cuda/bin/nvcc', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '-I/usr/include/python3.8', '--extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_60', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75']) WARNING: Could not compile cuda extensions
Just for fun, I tried to specify `--compiler-options -std=c++17`, but it only worsened the problem.
Hi, were you able to install with GPU support? I was interested in trying that out as well, but got the error below. I was installing from the repo and setting the `CUDA_PATH` environment variable to `/usr/local/cuda/bin/nvcc`.
record_import_error("cext_gpu", "cuda extension was not built during install!", e)
> Hi, were you able to install with GPU support? I was interested in trying that out as well, but got the error below. I was installing from the repo and setting the `CUDA_PATH` environment variable to `/usr/local/cuda/bin/nvcc`.
No, it failed due to the above compilation errors, which I hope can be fixed by the developers (gently pinging @RAMitchell).
In your case, have you tried `export CUDA_PATH=/usr/local/cuda` (without `/bin/nvcc`)? The last subfolder and the compiler name are already hard-coded here (which hopefully will be documented): https://github.com/slundberg/shap/blob/5831d55b598bf17b63fd4c81f9799c4d771d0bc7/setup.py#L63
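To make the `CUDA_PATH` convention above concrete: setup.py appends the `bin/nvcc` part itself, so the variable must point at the CUDA root. A minimal sketch (the `nvcc_path` helper name is my illustration, not shap's actual function):

```python
import os

def nvcc_path(cuda_path):
    """Mimic how shap's setup.py resolves the compiler: it joins
    CUDA_PATH with 'bin/nvcc', so CUDA_PATH must be the CUDA root
    directory, not the full path to the nvcc binary."""
    return os.path.join(cuda_path, "bin", "nvcc")

# Correct: CUDA_PATH points at the CUDA root.
print(nvcc_path("/usr/local/cuda"))  # /usr/local/cuda/bin/nvcc

# Wrong: pointing CUDA_PATH at nvcc itself doubles up the suffix.
print(nvcc_path("/usr/local/cuda/bin/nvcc"))  # /usr/local/cuda/bin/nvcc/bin/nvcc
```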
Sorry, we are working through a few issues; can you try PR #1646?
So after PR #1646 was merged (thank you @RAMitchell and @JohnZed), I tested GPU compilation from source on Ubuntu using the two setups below. The errors reported above were resolved only for the latest CUDA; for the legacy version, compilation failed for an unknown reason. I tried it also in custom containers, but for reproducibility used the official ones.
More info
a) latest: CUDA 11.1 (official `nvidia/cuda` container with Ubuntu 20.04, Python 3.8.5 and gcc 9.3.0):
`nvcc` compilation: success (a long wait without any progress bar, because `nvcc` uses a single core, which I suppose cannot be helped). The import of `shap` in Python 3.8 succeeded, but raised this warning:
"is not" with a literal. Did you mean "!="?
`shap.GPUTreeExplainer` was available and should work (I did not test it further; only CUDA 10.1 could be tested functionally).
b) legacy: CUDA 10.1 (official `nvidia/cuda` container with Ubuntu 18.04, Python 3.8.0, and gcc upgraded to 8.4.0, the maximum version supported by CUDA 10.1, which I installed as `/usr/local/cuda/bin/gcc`):
`nvcc` compilation: FAILURE (with a non-specific error "ar: build/lib_cext_gpu.a: No such file or directory"):
NVCC ==> /usr/local/cuda/bin/nvcc
Compiling cuda extension, calling nvcc with arguments:
['/usr/local/cuda/bin/nvcc', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '-I/usr/include/python3.8', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_60', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75']
ar: build/lib_cext_gpu.a: No such file or directory
Error building cuda module: CalledProcessError(1, ['/usr/local/cuda/bin/nvcc', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '-I/usr/include/python3.8', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_60', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75'])
WARNING: Could not compile cuda extensions
Code to reproduce CUDA 10.1 failure:
$ docker run --rm -it --name cuda101 -u 0 nvidia/cuda:10.1-devel-ubuntu18.04 bash
# install maximum supported gcc compiler
MAX_GCC_VERSION=8 # for CUDA 10.1,10.2
apt update && apt install -y gcc-$MAX_GCC_VERSION g++-$MAX_GCC_VERSION
# add symlinks to max supported gcc inside a CUDA subfolder
ln -s /usr/bin/gcc-$MAX_GCC_VERSION /usr/local/cuda/bin/gcc
ln -s /usr/bin/g++-$MAX_GCC_VERSION /usr/local/cuda/bin/g++
# specify location of nvcc for shap compilation
export CUDA_PATH=/usr/local/cuda
# install python 3.8 and setuptools
apt install -y git python3.8 python3.8-dev python3-setuptools
# compile shap with GPU support
cd /tmp && \
git clone https://github.com/slundberg/shap && \
cd shap && \
git checkout 5209472 && \
python3.8 setup.py install
The problem is actually that the build directory doesn't exist yet, so running it again results in successful compilation. I will submit a fix.
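For what it's worth, the fix presumably amounts to creating the output directory before invoking `nvcc`; a minimal sketch of that guard (the helper name is mine, not the actual patch):

```python
import os

def ensure_output_dir(out_path):
    """Create the directory that an output artifact will be written
    into, e.g. nvcc's '-o build/lib_cext_gpu.a'. Without this, the
    first build in a fresh checkout fails with
    'ar: build/lib_cext_gpu.a: No such file or directory', while a
    second run (after something else has created build/) succeeds."""
    d = os.path.dirname(out_path)
    if d:
        os.makedirs(d, exist_ok=True)  # idempotent: safe on re-runs
    return d
```

Calling `ensure_output_dir("build/lib_cext_gpu.a")` right before the `subprocess` call to `nvcc` would make the first-run and second-run behavior identical.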
So the error message was specific after all! :) I had the `nvidia/cuda` 11.1 container ready, preserved from previous tests, which is why it seemed to work right away (it was not thanks to the new and improved CUDA version :)
I can now confirm the above success in our custom-built container with CUDA 10.1 (`mirekphd/ml-gpu-py38-cuda101-cust:latest`). The `nvcc` compilation stage now passes even for CUDA 10.1, and so does the `python install` stage. When I import `shap`, it does succeed, but prints out these warnings, which were also printed during `python install` (with the specific lines of code raising them):
>>> import shap
"is not" with a literal. Did you mean "!="?
"is" with a literal. Did you mean "=="?
"is" with a literal. Did you mean "=="?
"is not" with a literal. Did you mean "!="?
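These warnings are Python 3.8's new SyntaxWarning for identity comparisons against literals: `is` tests object identity, which is an interpreter implementation detail for literals, so such code should use `==`/`!=` instead. A small illustration:

```python
# Python 3.8+ emits 'SyntaxWarning: "is" with a literal. Did you mean "=="?'
# for code like `if x is 5:` because `is` compares object identity, not
# value, and whether two equal literals share an identity is unspecified.
def classify(x):
    return "five" if x == 5 else "other"  # use ==, never `is`, with literals

print(classify(5))    # five
print(classify(5.0))  # five (== compares values; `is` would say "other")
print(classify(7))    # other
```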
I've created some scripts for automatically building/testing using docker: https://github.com/RAMitchell/shap-ci-scripts
I'm currently testing cuda 10.1 and cuda 11.1 with ubuntu 18.04, and everything is working well with the #1665 fix. There should be no need to set paths, setup.py should find nvcc.
Hopefully we can get this running regularly on a CI machine.
Thank you for the Dockerfile! You could even parametrize the Ubuntu version here (to also test 20.04 and possibly 16.04).
I did not realize the list of shap's requirements is so long (here and here) in the case of a source build... until I tried to satisfy it in a bare-bones container such as `nvidia/cuda` :)
And @RAMitchell is right that the `python install` stage would fail without these python dependencies... I wonder, @slundberg, whether requiring the presence of so many extra python packages is really necessary for shap installation (their installation may fail, e.g. when there are multiple python versions, as seen in my sample code above based on `nvidia/cuda` with the older Ubuntu 18.04). Perhaps the dependencies could be specified explicitly:
pip install -r requirements.txt
(I do hope they can be pip-installed for those not using conda)?
> I've created some scripts for automatically building/testing using docker: https://github.com/RAMitchell/shap-ci-scripts
I'd change one thing in the Dockerfile though: I'd install python packages as a standard user, not root (by adding `USER 1000` somewhere here, returning to `USER root` before `shap` compilation from source, and then switching back to `USER 1000` at the very end of the file). The reason is that installing popular python packages as root makes later changes to their reverse dependencies practically impossible for the container user: `pip` sometimes needs to uninstall a dependency A to accommodate an older package B that requires A but pins it at a version older than the one already installed for package C (here C is `shap` and A can be `numpy` or `scipy`). And you can't uninstall anything as a standard user if it was installed as root (as part of `shap`'s sumptuous list of dependencies, which needs to be pre-installed or else it gets installed automatically :)
In fact, I managed to integrate GPU-enabled `shap` (using the duplicate `python setup.py install` workaround :) into one of our data science containers only after moving its installation to the very end of the Dockerfile. Otherwise (if it preceded any other python package installation) it would break the installation of at least one other package due to the root-requirements problem described above (even though I preinstalled all of `shap`'s python dependencies as a standard user, `shap` apparently still tried to down- or upgrade some of them at build time, when it had root).
But now it's part of one of our CICD pipelines and ready for testing:
docker run --rm -d --name ml-gpu-py38-cuda101-cust -p 8888:8888 --gpus all mirekphd/ml-gpu-py38-cuda101-cust && docker logs -f ml-gpu-py38-cuda101-cust
Did our container work for you? :) In my tests the successful CUDA compilation made little to no difference... I'm testing it on a corporate server with proven GPU access (e.g. XGBoost trains 10x faster with `tree_method="gpu_hist"`). Maybe the workaround of running `python setup.py install` twice did not work as intended..?
When running:
shap_values_valid = explainer.shap_values(X=valid_x, y=valid_y)
...I'm getting this error as previously:
cuda extension was not built during install!
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-22-cbd558096e13> in <module>
1 if OBJECTIVE == "binary":
----> 2 shap_values_valid = explainer.shap_values(X=valid_x, y=valid_y)
3
4 else:
5 shap_values_valid = explainer.shap_values(X=valid_x)
/opt/conda/lib/python3.8/site-packages/shap/explainers/_gpu_tree.py in shap_values(self, X, y, tree_limit, approximate, check_additivity, from_call)
113
114 # run the core algorithm using the C extension
--> 115 assert_import("cext_gpu")
116 phi = np.zeros((X.shape[0], X.shape[1] + 1, self.model.num_outputs))
117 _cext_gpu.dense_tree_shap(
/opt/conda/lib/python3.8/site-packages/shap/utils/_general.py in assert_import(package_name)
21 msg,e = import_errors[package_name]
22 print(msg)
---> 23 raise e
24
25 def record_import_error(package_name, msg, e):
[... skipping hidden 1 frame]
<ipython-input-21-cbd558096e13> in <module>
1 if OBJECTIVE == "binary":
----> 2 shap_values_valid = explainer.shap_values(X=valid_x, y=valid_y)
3
4 else:
5 shap_values_valid = explainer.shap_values(X=valid_x)
/opt/conda/lib/python3.8/site-packages/shap/explainers/_gpu_tree.py in shap_values(self, X, y, tree_limit, approximate, check_additivity, from_call)
113
114 # run the core algorithm using the C extension
--> 115 assert_import("cext_gpu")
116 phi = np.zeros((X.shape[0], X.shape[1] + 1, self.model.num_outputs))
117 _cext_gpu.dense_tree_shap(
/opt/conda/lib/python3.8/site-packages/shap/utils/_general.py in assert_import(package_name)
21 msg,e = import_errors[package_name]
22 print(msg)
---> 23 raise e
24
25 def record_import_error(package_name, msg, e):
/opt/conda/lib/python3.8/site-packages/shap/explainers/_gpu_tree.py in <module>
4 from ..utils import assert_import, record_import_error
5 try:
----> 6 from .. import _cext_gpu
7 except ImportError as e:
8 record_import_error("cext_gpu", "cuda extension was not built during install!", e)
ImportError: cannot import name '_cext_gpu' from partially initialized module 'shap' (most likely due to a circular import) (/opt/conda/lib/python3.8/site-packages/shap/__init__.py)
The lib certainly does exist; the .py file is readable and the .so is executable, so probably they are just not where they should be :)
$ cd / && find | grep cext_gpu
[..]
./opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/__pycache__/_cext_gpu.cpython-38.pyc
./opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.cpython-38-x86_64-linux-gnu.so
./opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.py
$ cat /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.py
def __bootstrap__():
    global __bootstrap__, __loader__, __file__
    import sys, pkg_resources, importlib.util
    __file__ = pkg_resources.resource_filename(__name__, '_cext_gpu.cpython-38-x86_64-linux-gnu.so')
    __loader__ = None; del __bootstrap__, __loader__
    spec = importlib.util.spec_from_file_location(__name__, __file__)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
__bootstrap__()
$ ls -lan /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.cpython-38-x86_64-linux-gnu.so
-rwxr-xr-x. 1 0 100 9756120 Dec 17 20:53 /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.cpython-38-x86_64-linux-gnu.so
Does it perhaps need access to `_cext_gpu.o`? But that is not in python's `site-packages`; it is where it was built from sources...:
$ cd / && find | grep _cext_gpu
./opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/__pycache__/_cext_gpu.cpython-38.pyc
./opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.cpython-38-x86_64-linux-gnu.so
./opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.py
[..]
./usr/local/src/shap/build/lib.linux-x86_64-3.8/shap/_cext_gpu.cpython-38-x86_64-linux-gnu.so
./usr/local/src/shap/build/lib_cext_gpu.a
./usr/local/src/shap/build/temp.linux-x86_64-3.8/shap/cext/_cext_gpu.o
./usr/local/src/shap/shap/cext/_cext_gpu.cc
./usr/local/src/shap/shap/cext/_cext_gpu.cu
It should only need `_cext_gpu.cpython-38-x86_64-linux-gnu.so` as far as I know.
Ok, then maybe this part is relevant:
ImportError: cannot import name '_cext_gpu' from partially initialized module 'shap' (most likely due to a circular import) (/opt/conda/lib/python3.8/site-packages/shap/__init__.py)
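That message can be reproduced outside shap. Here is a synthetic sketch (with a hypothetical package name, `demo_pkg`) of how importing a missing name from a package whose `__init__.py` is still executing produces exactly this style of error:

```python
import os
import sys
import tempfile

# Build a throwaway package on disk: demo_pkg/__init__.py imports a
# submodule, and the submodule in turn runs `from .. import missing_ext`
# while demo_pkg's __init__ is still executing. Since the package object
# is only half-built at that point, Python 3.8+ reports the failure as
# coming "from partially initialized module ... (most likely due to a
# circular import)", just like the shap traceback above.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "demo_pkg")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("from . import sub\n")
with open(os.path.join(pkg, "sub.py"), "w") as f:
    f.write("from . import missing_ext\n")

sys.path.insert(0, root)
try:
    import demo_pkg  # noqa: F401
    msg = ""
except ImportError as e:
    msg = str(e)

print(msg)  # mentions: cannot import name 'missing_ext' ...
```

In shap's case the "circular import" wording is a red herring: the real problem is simply that `_cext_gpu` is missing from the installed package directory.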
So I also checked the dependencies and they seem OK (not conflicting with the requirements):
shap==0.37.0
- numba [required: Any, installed: 0.52.0]
- llvmlite [required: >=0.35.0,<0.36, installed: 0.35.0]
- numpy [required: >=1.15, installed: 1.19.4]
- setuptools [required: Any, installed: 51.0.0]
- numpy [required: Any, installed: 1.19.4]
- pandas [required: Any, installed: 1.1.5]
- numpy [required: >=1.15.4, installed: 1.19.4]
- python-dateutil [required: >=2.7.3, installed: 2.8.1]
- six [required: >=1.5, installed: 1.15.0]
- pytz [required: >=2017.2, installed: 2020.4]
- scikit-learn [required: Any, installed: 0.23.2]
- joblib [required: >=0.11, installed: 1.0.0]
- numpy [required: >=1.13.3, installed: 1.19.4]
- scipy [required: >=0.19.1, installed: 1.5.4]
- numpy [required: >=1.14.5, installed: 1.19.4]
- threadpoolctl [required: >=2.0.0, installed: 2.1.0]
- scipy [required: Any, installed: 1.5.4]
- numpy [required: >=1.14.5, installed: 1.19.4]
- slicer [required: ==0.0.3, installed: 0.0.3]
- tqdm [required: >4.25.0, installed: 4.54.1]
The package folder listing with permissions:
$ ls -lan /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/
total 10104
drwxr-sr-x. 11 0 100 4096 Dec 17 20:53 .
drwxr-sr-x. 4 0 100 34 Dec 17 20:53 ..
drwxr-sr-x. 3 0 100 83 Dec 17 20:53 actions
drwxr-sr-x. 3 0 100 177 Dec 17 20:53 benchmark
drwxr-sr-x. 2 0 100 25 Dec 17 20:53 cext
-rwxr-xr-x. 1 0 100 516176 Dec 17 20:53 _cext.cpython-38-x86_64-linux-gnu.so
-rwxr-xr-x. 1 0 100 9756120 Dec 17 20:53 _cext_gpu.cpython-38-x86_64-linux-gnu.so
-rw-r--r--. 1 0 100 434 Dec 17 20:53 _cext_gpu.py
-rw-r--r--. 1 0 100 430 Dec 17 20:53 _cext.py
-rw-r--r--. 1 0 100 8798 Dec 17 20:53 datasets.py
drwxr-sr-x. 5 0 100 4096 Dec 17 20:53 explainers
-rw-r--r--. 1 0 100 25320 Dec 17 20:53 _explanation.py
-rw-r--r--. 1 0 100 2809 Dec 17 20:53 __init__.py
-rw-r--r--. 1 0 100 442 Dec 17 20:53 links.py
drwxr-sr-x. 3 0 100 179 Dec 17 20:53 maskers
drwxr-sr-x. 3 0 100 122 Dec 17 20:53 models
drwxr-sr-x. 5 0 100 4096 Dec 17 20:53 plots
drwxr-sr-x. 2 0 100 191 Dec 17 20:53 __pycache__
drwxr-sr-x. 3 0 100 192 Dec 17 20:53 utils
$ ls -lan /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/cext/
total 64
drwxr-sr-x. 2 0 100 25 Dec 17 20:53 .
drwxr-sr-x. 11 0 100 4096 Dec 17 20:53 ..
-rw-r--r--. 1 0 100 58130 Dec 17 20:53 tree_shap.h
I think I will have to create a local copy of the package (from this root-owned folder), import it, and set a breakpoint to examine the error in more detail (but that's for tomorrow :).
I implemented a workaround for yet another likely cause of the "cuda extension was not built during install!" error, but it was not helpful in this case.
More info
In our moderately complex python environment we have several meta-packages that require `shap`, and these had to be installed first (due to the implicit root requirement of packages compiled from source, like `shap`, that I mentioned earlier). As a result we may end up with two versions of `shap`: a CPU-only one installed by `pip` as a dependency of those meta-packages, and the GPU-enabled one I compiled here (using the procedure described above). And despite the installation sequence, the CPU version (installed first) takes priority over the GPU one (installed second and, in fact, last) when you execute `import shap`. This situation can be detected by listing all of shap's reverse dependencies like this:
$ pipdeptree -r -p shap
[..]
shap==0.37.0
- alibi==0.5.5 [requires: shap>=0.36]
- BorutaShap==1.0.14 [requires: shap>=0.34.0]
- causalml==0.8.0 [requires: shap]
- explainerdashboard==0.2.16.1 [requires: shap>=0.36]
So a quick workaround was to uninstall all of the above meta-packages, the reverse dependencies of `shap`.
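A lighter-weight check than pipdeptree, for anyone hitting this: ask Python which installed copy would win the import, without triggering shap's import side effects (a sketch, not part of shap):

```python
import importlib.util

def resolve_module(name):
    """Report which file `import <name>` would actually load, without
    importing it. Useful when a CPU-only copy of shap (pulled in as a
    dependency of a meta-package) shadows a GPU-enabled build installed
    later as an .egg: the path returned here reveals which copy wins."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# e.g. resolve_module("shap") might return
# /opt/conda/lib/python3.8/site-packages/shap/__init__.py (the CPU copy)
print(resolve_module("json"))
```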
Another possible reason for failure in the case of secured setups is `shap`'s strong assumption about the user ID (the root requirement is a special case). For instance, if we installed a python package with user ID 1000 (as usual in our Dockerfiles), and the package later (at run time) tried to write some temporary download data to a package installation subfolder (rather than to a predictably writable location such as `/tmp`; the download attempt itself would fail even faster on air-gapped servers, by the way), the execution attempt would end before it really started, like in the 2nd cell of this shap lightgbm example notebook. We don't need no `adult.data` here, right? :)
---------------------------------------------------------------------------
PermissionError Traceback (most recent call last)
<ipython-input-2-cde0eb234dfd> in <module>
----> 1 X,y = shap.datasets.adult()
2 X_display,y_display = shap.datasets.adult(display=True)
3
4 # create a train/test split
5 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
/opt/conda/lib/python3.8/site-packages/shap/datasets.py in adult(display)
110 ]
111 raw_data = pd.read_csv(
--> 112 cache(github_data_url + "adult.data"),
113 names=[d[0] for d in dtypes],
114 na_values="?",
/opt/conda/lib/python3.8/site-packages/shap/datasets.py in cache(url, file_name)
247 data_dir = os.path.join(os.path.dirname(__file__), "cached_data")
248 if not os.path.isdir(data_dir):
--> 249 os.mkdir(data_dir)
250
251 file_path = os.path.join(data_dir, file_name)
PermissionError: [Errno 13] Permission denied: '/opt/conda/lib/python3.8/site-packages/shap/cached_data'
To make the download in `shap`'s `datasets.py` work, we would need (apart from moving to a sandbox with access to the whole internet) to run our data science container with a matching user ID (the same UID as the installation-time UID), which would never happen on secure on-prem platforms such as OpenShift.
The installation time UID of 1000:
$ ls -lan /opt/conda/lib/python3.8/site-packages/shap/
total 660
drwxr-sr-x 11 1000 100 4096 Dec 18 13:47 .
drwsrwsr-x 1 1000 100 4096 Dec 18 13:49 ..
drwxr-sr-x 3 1000 100 4096 Dec 18 13:47 actions
drwxr-sr-x 3 1000 100 4096 Dec 18 13:47 benchmark
drwxr-sr-x 2 1000 100 4096 Dec 18 13:47 cext
-rwxr-xr-x 1 1000 100 578232 Dec 18 13:47 _cext.cpython-38-x86_64-linux-gnu.so
-rw-r--r-- 1 1000 100 8798 Dec 18 13:47 datasets.py
drwxr-sr-x 5 1000 100 4096 Dec 18 13:47 explainers
-rw-r--r-- 1 1000 100 25320 Dec 18 13:47 _explanation.py
-rw-r--r-- 1 1000 100 2809 Dec 18 13:47 __init__.py
-rw-r--r-- 1 1000 100 442 Dec 18 13:47 links.py
drwxr-sr-x 3 1000 100 4096 Dec 18 13:47 maskers
drwxr-sr-x 3 1000 100 4096 Dec 18 13:47 models
drwxr-sr-x 5 1000 100 4096 Dec 18 13:47 plots
drwxr-sr-x 2 1000 100 4096 Dec 18 13:47 __pycache__
drwxr-sr-x 3 1000 100 4096 Dec 18 13:47 utils
... vs. the run-time UID (simulating a random UID from the wide range that would be allotted to a running pod under OpenShift):
$ id
uid=1000070000(jovyan) gid=0(root) groups=0(root)
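One hedged sketch of how a dataset cache could avoid this install-time-UID assumption: write to a per-user (or temp) cache directory instead of the package folder. This is illustrative only, not shap's actual `cache()` implementation:

```python
import os
import tempfile

def cache_dir():
    """Pick a writable cache directory instead of writing 'cached_data'
    into the package installation folder. Prefers the user's XDG cache
    dir and falls back to the system temp dir, so a run-time UID that
    differs from the install-time UID (as under OpenShift) still works."""
    base = os.environ.get("XDG_CACHE_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache")
    path = os.path.join(base, "shap_cached_data")
    try:
        os.makedirs(path, exist_ok=True)
    except OSError:
        # Home may be unwritable (or nonexistent) for an arbitrary UID.
        path = os.path.join(tempfile.gettempdir(), "shap_cached_data")
        os.makedirs(path, exist_ok=True)
    return path
```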
You may be pleased to know, @RAMitchell, that I have found the reason behind the error (`ImportError: cannot import name '_cext_gpu' from partially initialized module 'shap' (most likely due to a circular import) (/opt/conda/lib/python3.8/site-packages/shap/__init__.py)`).
What happens is that not all files are extracted / copied from the .egg to the shap package's main folder. Notably, these two files were missing (and, when copied manually, immediately repaired the broken import):
cp /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.py /opt/conda/lib/python3.8/site-packages/shap/
cp /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.cpython-38-x86_64-linux-gnu.so /opt/conda/lib/python3.8/site-packages/shap/
Could we prevent this from happening?
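Until it is fixed upstream, the manual `cp` workaround above could be scripted; a sketch (hypothetical helper, with the egg/package layout taken from this thread):

```python
import glob
import os
import shutil

def repair_gpu_ext(site_packages):
    """Automate the manual fix above: if the GPU extension files exist
    only inside a shap-*.egg directory, copy them into the flat shap/
    package directory so `from .. import _cext_gpu` can succeed."""
    dest = os.path.join(site_packages, "shap")
    copied = []
    for egg in glob.glob(os.path.join(site_packages, "shap-*.egg")):
        for src in glob.glob(os.path.join(egg, "shap", "_cext_gpu*")):
            target = os.path.join(dest, os.path.basename(src))
            if os.path.isfile(src) and not os.path.exists(target):
                shutil.copy2(src, target)
                copied.append(target)
    return copied
```

E.g. `repair_gpu_ext("/opt/conda/lib/python3.8/site-packages")` would perform the same two copies as the commands above, and is a no-op once the files are in place.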
So after the several workarounds described above (which probably merit some attention?) I made GPU-enabled `shap` work on small datasets in the sandbox environment.
The performance gains are substantial: nearly 15 times even on small datasets with 10k rows; see shap-lgbm-gpu-test.zip, based on SHAP's extensive examples library. Thank you @RAMitchell for this much-needed work! Such a performance boost is no mean feat at all, as the GBDT algorithms themselves require 10M-row datasets, not just 10k (e.g. the Higgs boson dataset), to register any performance improvement on the GPU.
Memory management and accuracy issues (which are present in so many other GPU implementations, notably TF and XGB, respectively) will be reported in separate issues; let's find time to solve the remaining 20% of the Pareto curve as well :)
Here's a simple, reproducible performance comparison test that can be replicated by anyone with docker, a GPU, and a CUDA 10.1+-compatible display driver. The timings reported here are for a low-spec dev machine with a GeForce GTX 1080 Ti and a CUDA 11.0 driver (vs. a single core of an Intel i5-4690K):
docker run -it --rm --name test-shap-gpu --gpus=all -v $(pwd):/home/jovyan mirekphd/ml-gpu-py38-cuda101-cust:latest python shap-lgbm-gpu-test.py
[..]
SHAP values in CPU were estimated in 29 sec.
SHAP values in GPU were estimated in 2 sec.
[..]
Note: the example above assumes that the test script shap-lgbm-gpu-test.zip has been downloaded to the current folder and unzipped there. For enabling GPU access to docker containers see the Prerequisites section in my answer here.
Awesome work. I'm on vacation for a week and will address the last installation issues when I'm back.
So here are my results from more realistic performance comparison tests. In a nutshell, moving to the GPU shortened the process of estimating SHAP values from minutes to seconds and from hours to minutes!
'''
SHAP values estimation times (for a LightGBM classifier
with ca. 300 features) as a function of data rows (10k, 100k, 200k, 400k, 800k);
note that LightGBM v3.1.1 and above has a CUDA implementation of the training
and cv functions, hence `device` was set to either "cpu"
or "gpu" (the latter denoted as "GPU(LGBM)"), but using the GPU in LightGBM
had a slightly negative impact on SHAP values calculation on the GPU:
10k:
CPU(SHAP)+CPU(LGBM): executed in 1m 14.2s
GPU(SHAP)+CPU(LGBM): executed in 1.86s
100k:
CPU(SHAP)+CPU(LGBM): executed in ca. 1 hour (verified)
GPU(SHAP)+CPU(LGBM): executed in 52.8s
200k:
CPU(SHAP)+CPU(LGBM): executed in ca. 2 hours (verified)
GPU(SHAP)+CPU(LGBM): executed in 2m 1s
400k:
CPU(SHAP)+CPU(LGBM): ca. 4 hours (estimate)
GPU(SHAP)+CPU(LGBM): executed in 6m 33s
GPU(SHAP)+GPU(LGBM): executed in 7m 22s
800k:
CPU(SHAP)+CPU(LGBM): ca. 8 hours (estimate)
GPU(SHAP)+CPU(LGBM): executed in 19m 17s
GPU(SHAP)+GPU(LGBM): executed in 22m 43s
'''
Great job! Can this be used to calculate the interaction Shap values?
> Great job! Can this be used to calculate the interaction Shap values?
Yes, with the tree path dependent algorithm.
Hi @mirekphd ,
I read all of this log. I am facing the same issue:
cuda extension was not built during install!
Traceback (most recent call last):
File "/home/mike-ubuntu/test.py", line 13, in <module>
shap_values = explainer(X)
File "/home/mike-ubuntu/anaconda3/envs/rapids-22.10/lib/python3.9/site-packages/shap/explainers/_tree.py", line 217, in __call__
v = self.shap_values(X, y=y, from_call=True, check_additivity=check_additivity, approximate=self.approximate)
File "/home/mike-ubuntu/anaconda3/envs/rapids-22.10/lib/python3.9/site-packages/shap/explainers/_gpu_tree.py", line 105, in shap_values
assert_import("cext_gpu")
File "/home/mike-ubuntu/anaconda3/envs/rapids-22.10/lib/python3.9/site-packages/shap/utils/_general.py", line 25, in assert_import
raise e
File "/home/mike-ubuntu/anaconda3/envs/rapids-22.10/lib/python3.9/site-packages/shap/explainers/_gpu_tree.py", line 6, in <module>
from .. import _cext_gpu
ImportError: cannot import name '_cext_gpu' from partially initialized module 'shap' (most likely due to a circular import) (/home/mike-ubuntu/anaconda3/envs/rapids-22.10/lib/python3.9/site-packages/shap/__init__.py)
Then I found this comment:
cp /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.py /opt/conda/lib/python3.8/site-packages/shap/
cp /opt/conda/lib/python3.8/site-packages/shap-0.37.0-py3.8-linux-x86_64.egg/shap/_cext_gpu.cpython-38-x86_64-linux-gnu.so /opt/conda/lib/python3.8/site-packages/shap/
I went to my `shap` folder to check:
(rapids-22.10) mike-ubuntu@DESKTOP-14KME2Q:~/anaconda3/envs/rapids-22.10/lib/python3.9/site-packages/shap$ ls
__init__.py __pycache__ _cext.cpython-39-x86_64-linux-gnu.so _explanation.py _serializable.py actions benchmark cached_data cext datasets.py explainers links.py maskers models plots utils
There is `_cext.cpython-39-x86_64-linux-gnu.so`, but I never got `_cext_gpu.py`, as you mentioned.
How do I get it?
Thank you.
Hi, @RAMitchell and @mirekphd
I read most of this log. I confirmed that my `gcc` version matches my `nvcc` version:
(ForShap) mike@mike-desktop:~/shap$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
(ForShap) mike@mike-desktop:~/shap$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
(ForShap) mike@mike-desktop:~/shap$ echo $CUDA_PATH
/usr
However, I still get some errors:
(ForShap) mike@mike-desktop:~/shap$ python setup.py install --user
NVCC ==> /usr/bin/nvcc
Compiling cuda extension, calling nvcc with arguments:
['/usr/bin/nvcc', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '-I/home/mike/anaconda3/envs/ForShap/include/python3.9', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_37', '-gencode=arch=compute_37,code=sm_37', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75']
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 | function(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 | operator=(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
Error building cuda module: CalledProcessError(1, ['/usr/bin/nvcc', 'shap/cext/_cext_gpu.cu', '-lib', '-o', 'build/lib_cext_gpu.a', '-Xcompiler', '-fPIC', '-I/home/mike/anaconda3/envs/ForShap/include/python3.9', '--std', 'c++14', '--expt-extended-lambda', '--expt-relaxed-constexpr', '-arch=sm_37', '-gencode=arch=compute_37,code=sm_37', '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75'])
Running the example code here: https://shap.readthedocs.io/en/latest/example_notebooks/api_examples/explainers/GPUTree.html, I get this error:
>>> shap_values = explainer(X)
cuda extension was not built during install!
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/explainers/_tree.py", line 217, in __call__
v = self.shap_values(X, y=y, from_call=True, check_additivity=check_additivity, approximate=self.approximate)
File "/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/explainers/_gpu_tree.py", line 105, in shap_values
assert_import("cext_gpu")
File "/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/utils/_general.py", line 25, in assert_import
raise e
File "<stdin>", line 1, in <module>
File "/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/explainers/_tree.py", line 217, in __call__
v = self.shap_values(X, y=y, from_call=True, check_additivity=check_additivity, approximate=self.approximate)
File "/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/explainers/_gpu_tree.py", line 105, in shap_values
assert_import("cext_gpu")
File "/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/utils/_general.py", line 25, in assert_import
raise e
File "/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/explainers/_gpu_tree.py", line 6, in <module>
from .. import _cext_gpu
ImportError: cannot import name '_cext_gpu' from partially initialized module 'shap' (most likely due to a circular import) (/home/mike/.local/lib/python3.9/site-packages/shap-0.41.0-py3.9-linux-x86_64.egg/shap/__init__.py)
Can you help me?
Thank you.
When invoking:
shap_values_train = explainer.shap_values(X=train_x, y=train_y)
I'm getting this error, raised here https://github.com/slundberg/shap/blob/6af9e1008702fb0fab939bf2154bbf93dfe84a16/shap/explainers/_gpu_tree.py#L8 as a result of the missing CUDA extension `_cext_gpu.lib`, which should have been compiled by `compile_cuda_module` if I had had the `CUDA_PATH` env var defined at pip-installation time (as part of `docker build` in my case).
Can we have a more specific error message (telling the user exactly what to do, complete with an example path for a given location of `nvcc`)? I did notice this remark in the docs on shap.GPUTreeExplainer: "Currently requires source build with cuda available and 'CUDA_PATH' environment variable defined.", but it was not clear enough about the role of CUDA_PATH (just having it defined obviously was not enough). A more specific error message would point me to the exact solution even faster, without looking into your source code :)
Regarding the "source build" part, there could also be some clarification on how to proceed, e.g. ideally in the form of exact compilation instructions or a Dockerfile like this one: dockerfile.gpu.
Having an exact compilation example is rather important, because the path to `nvcc` looks like this: `/usr/local/cuda-10.1/bin/nvcc`, so there are at least 4 ways to define the path, and only the last two would work given how the variable is used: https://github.com/slundberg/shap/blob/5831d55b598bf17b63fd4c81f9799c4d771d0bc7/setup.py#L63
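On the request for a more specific error message: a hedged sketch of the kind of wording that would have pointed straight at the solution (illustrative only, not shap's actual code):

```python
import os

def missing_cuda_ext_hint():
    """Hypothetical, more actionable wording for the
    'cuda extension was not built during install!' message, spelling out
    how CUDA_PATH is consumed (nvcc is resolved as $CUDA_PATH/bin/nvcc)."""
    lines = [
        "shap's GPU extension (_cext_gpu) was not built during install.",
        "Set CUDA_PATH to the CUDA *root* directory, e.g.:",
        "    export CUDA_PATH=/usr/local/cuda",
        "(nvcc is resolved as $CUDA_PATH/bin/nvcc), then rebuild shap from source.",
    ]
    if "CUDA_PATH" not in os.environ:
        lines.append("Note: CUDA_PATH is currently unset in this environment.")
    return "\n".join(lines)

print(missing_cuda_ext_hint())
```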