Closed JuseTiZ closed 7 months ago
Thanks for using LightGBM and for the detailed report. Sorry you're running into this.
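Could you please provide a few more details that'd help us investigate this?
- type of GPU
- specific operating system
- build logs from running cmake ... and make ...
- does this happen with all combinations of the hyperparameters you're searching over, or only some subset? If some subset, could you provide just those subsets?
It'd also help if you could make this example more minimal. For example:
- remove parameters one by one and see if you still get the error (e.g., remove reg_alpha and just accept LightGBM's defaults)
- remove StratifiedKFold or other splitting and perform every training run on the same dataset
- remove computation of evaluation scores (since this error is happening at training time)
Those sorts of things would help to narrow down the source of the problem.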
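The "remove parameters one by one" suggestion can be sketched as a small helper that yields each reduced parameter set in turn. This is only an illustration: the function name `drop_one_at_a_time` and the example values are hypothetical, and the original search space is not shown in this thread.

```python
from typing import Any, Dict, Iterator, Tuple


def drop_one_at_a_time(
    params: Dict[str, Any], keep: Tuple[str, ...] = ("device",)
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (removed_key, reduced_params) for every key not listed in `keep`."""
    for key in params:
        if key in keep:
            # Keep the setting under investigation in every variant.
            continue
        reduced = {k: v for k, v in params.items() if k != key}
        yield key, reduced


# Hypothetical values; only "device": "gpu" is taken from the report.
params = {"device": "gpu", "reg_alpha": 0.1, "num_leaves": 63}
for removed, reduced in drop_one_at_a_time(params):
    print(f"without {removed}: {reduced}")
```

Each reduced dictionary could then be passed to a training run to see which parameter, if any, is needed to trigger the crash.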
@jameslamb Thanks for your quick reply, I will provide the relevant information:
- type of GPU
R: NVIDIA GeForce RTX 3090. It works fine when training DNN or other ML models with GPU (like xgboost).
- specific operating system
$ uname -a
Linux master 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/*release
CentOS Linux release 7.5.1804 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
CentOS Linux release 7.5.1804 (Core)
CentOS Linux release 7.5.1804 (Core)
- build logs from running cmake ... and make ...
I removed the build folder and ran the following commands:
$ rm -rf build
$ mkdir build
$ cd build
$ cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ ..
-- The C compiler identification is GNU 13.2.0
-- The CXX compiler identification is GNU 4.8.5
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /public/home/zj/mambaforge/envs/ncsvp/bin/x86_64-conda-linux-gnu-cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "3.1")
-- Found OpenMP: TRUE (found version "4.5")
-- Looking for CL_VERSION_3_0
-- Looking for CL_VERSION_3_0 - found
-- Found OpenCL: /usr/local/cuda/lib64/libOpenCL.so (found version "3.0")
-- OpenCL include directory: /usr/local/cuda/include
-- Found Boost: /public/home/zj/mambaforge/envs/ncsvp/lib/cmake/Boost-1.78.0/BoostConfig.cmake (found suitable version "1.78.0", minimum required is "1.56.0") found components: filesystem system
-- Performing Test MM_PREFETCH
-- Performing Test MM_PREFETCH - Success
-- Using _mm_prefetch
-- Performing Test MM_MALLOC
-- Performing Test MM_MALLOC - Success
-- Using _mm_malloc
-- Configuring done (3.3s)
-- Generating done (0.1s)
-- Build files have been written to: /public/home/zj/tools/LightGBM/build
$ make -j4
[ 2%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/gbdt.cpp.o
[ 5%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/gbdt_model_text.cpp.o
[ 7%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/boosting.cpp.o
[ 10%] Building CXX object CMakeFiles/lightgbm_capi_objs.dir/src/c_api.cpp.o
[ 12%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/gbdt_prediction.cpp.o
[ 15%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/prediction_early_stop.cpp.o
[ 17%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/sample_strategy.cpp.o
[ 20%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/bin.cpp.o
[ 23%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/config.cpp.o
[ 25%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/config_auto.cpp.o
[ 28%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/dataset.cpp.o
[ 28%] Built target lightgbm_capi_objs
[ 30%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/dataset_loader.cpp.o
[ 33%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/file_io.cpp.o
[ 35%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/json11.cpp.o
[ 38%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/metadata.cpp.o
[ 41%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/parser.cpp.o
[ 43%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/train_share_states.cpp.o
[ 46%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/tree.cpp.o
[ 48%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/metric/dcg_calculator.cpp.o
[ 51%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/metric/metric.cpp.o
[ 53%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/network/linker_topo.cpp.o
[ 56%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/network/linkers_mpi.cpp.o
[ 58%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/network/linkers_socket.cpp.o
[ 61%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/network/network.cpp.o
[ 64%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/objective/objective_function.cpp.o
[ 66%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/data_parallel_tree_learner.cpp.o
[ 69%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/feature_histogram.cpp.o
[ 71%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/feature_parallel_tree_learner.cpp.o
[ 74%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/gpu_tree_learner.cpp.o
[ 76%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/gradient_discretizer.cpp.o
[ 79%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/linear_tree_learner.cpp.o
[ 82%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/serial_tree_learner.cpp.o
[ 84%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/tree_learner.cpp.o
[ 87%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/voting_parallel_tree_learner.cpp.o
[ 89%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/utils/openmp_wrapper.cpp.o
[ 89%] Built target lightgbm_objs
[ 92%] Linking CXX shared library /public/home/zj/tools/LightGBM/lib_lightgbm.so
[ 94%] Building CXX object CMakeFiles/lightgbm.dir/src/application/application.cpp.o
[ 97%] Building CXX object CMakeFiles/lightgbm.dir/src/main.cpp.o
[ 97%] Built target _lightgbm
[100%] Linking CXX executable /public/home/zj/tools/LightGBM/lightgbm
[100%] Built target lightgbm
$ cd ../
$ pip uninstall lightgbm
$ sh ./build-python.sh install --precompile
sh ./build-python.sh install --precompile
building lightgbm
Requirement already satisfied: build>=0.10.0 in /public/home/zj/mambaforge/envs/kaggle/lib/python3.10/site-packages (1.2.1)
Requirement already satisfied: packaging>=19.1 in /public/home/zj/mambaforge/envs/kaggle/lib/python3.10/site-packages (from build>=0.10.0) (24.0)
Requirement already satisfied: pyproject_hooks in /public/home/zj/mambaforge/envs/kaggle/lib/python3.10/site-packages (from build>=0.10.0) (1.0.0)
Requirement already satisfied: tomli>=1.1.0 in /public/home/zj/mambaforge/envs/kaggle/lib/python3.10/site-packages (from build>=0.10.0) (2.0.1)
found pre-compiled lib_lightgbm.so
--- building sdist ---
* Creating isolated environment: venv+pip...
* Installing packages in isolated environment:
- setuptools
* Getting build dependencies for sdist...
running egg_info
creating lightgbm.egg-info
writing lightgbm.egg-info/PKG-INFO
writing dependency_links to lightgbm.egg-info/dependency_links.txt
writing requirements to lightgbm.egg-info/requires.txt
writing top-level names to lightgbm.egg-info/top_level.txt
writing manifest file 'lightgbm.egg-info/SOURCES.txt'
reading manifest file 'lightgbm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.dll' under directory 'lightgbm'
adding license file 'LICENSE'
writing manifest file 'lightgbm.egg-info/SOURCES.txt'
* Building sdist...
running sdist
running egg_info
writing lightgbm.egg-info/PKG-INFO
writing dependency_links to lightgbm.egg-info/dependency_links.txt
writing requirements to lightgbm.egg-info/requires.txt
writing top-level names to lightgbm.egg-info/top_level.txt
reading manifest file 'lightgbm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.dll' under directory 'lightgbm'
adding license file 'LICENSE'
writing manifest file 'lightgbm.egg-info/SOURCES.txt'
running check
creating lightgbm-4.3.0.99
creating lightgbm-4.3.0.99/lightgbm
creating lightgbm-4.3.0.99/lightgbm.egg-info
creating lightgbm-4.3.0.99/lightgbm/lib
copying files to lightgbm-4.3.0.99...
copying LICENSE -> lightgbm-4.3.0.99
copying MANIFEST.in -> lightgbm-4.3.0.99
copying README.rst -> lightgbm-4.3.0.99
copying pyproject.toml -> lightgbm-4.3.0.99
copying setup.cfg -> lightgbm-4.3.0.99
copying lightgbm/__init__.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/basic.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/callback.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/compat.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/dask.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/engine.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/libpath.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/plotting.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/py.typed -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/sklearn.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm.egg-info/PKG-INFO -> lightgbm-4.3.0.99/lightgbm.egg-info
copying lightgbm.egg-info/SOURCES.txt -> lightgbm-4.3.0.99/lightgbm.egg-info
copying lightgbm.egg-info/dependency_links.txt -> lightgbm-4.3.0.99/lightgbm.egg-info
copying lightgbm.egg-info/requires.txt -> lightgbm-4.3.0.99/lightgbm.egg-info
copying lightgbm.egg-info/top_level.txt -> lightgbm-4.3.0.99/lightgbm.egg-info
copying lightgbm/lib/lib_lightgbm.so -> lightgbm-4.3.0.99/lightgbm/lib
copying lightgbm.egg-info/SOURCES.txt -> lightgbm-4.3.0.99/lightgbm.egg-info
Writing lightgbm-4.3.0.99/setup.cfg
Creating tar archive
removing 'lightgbm-4.3.0.99' (and everything under it)
Successfully built lightgbm-4.3.0.99.tar.gz
--- installing lightgbm ---
Looking in links: .
Processing ./lightgbm-4.3.0.99.tar.gz
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting numpy (from lightgbm)
Downloading numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.0/61.0 kB 360.5 kB/s eta 0:00:00
Collecting scipy (from lightgbm)
Downloading scipy-1.12.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.4/60.4 kB 1.6 MB/s eta 0:00:00
Downloading numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 1.7 MB/s eta 0:00:00
Downloading scipy-1.12.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.4/38.4 MB 1.8 MB/s eta 0:00:00
Building wheels for collected packages: lightgbm
Building wheel for lightgbm (pyproject.toml) ... done
Created wheel for lightgbm: filename=lightgbm-4.3.0.99-py3-none-any.whl size=3277584 sha256=3752165c110132d19d5b44d124f9636055f64195ec12c3c91f1e600395bf68be
Stored in directory: /tmp/pip-ephem-wheel-cache-py4flilz/wheels/ab/ca/5d/8c248e7743594e1bd99a125aa24e0b01596f879dd6c7241e66
Successfully built lightgbm
Installing collected packages: numpy, scipy, lightgbm
Attempting uninstall: numpy
Found existing installation: numpy 1.26.4
Uninstalling numpy-1.26.4:
Successfully uninstalled numpy-1.26.4
Attempting uninstall: scipy
Found existing installation: scipy 1.12.0
Uninstalling scipy-1.12.0:
Successfully uninstalled scipy-1.12.0
Successfully installed lightgbm-4.3.0.99 numpy-1.26.4 scipy-1.12.0
cleaning up
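One detail in the configure output above is that the C compiler is reported as GNU 13.2.0 (from the conda environment) while the C++ compiler is GNU 4.8.5 (/usr/bin/c++). Whether this mismatch contributes to the crash is not established in this thread. As a rough sketch, the following helper pulls the compiler identification lines out of CMake configure output and flags a mismatch; the function names are hypothetical.

```python
import re
from typing import Dict

# Matches lines such as "-- The C compiler identification is GNU 13.2.0".
_PATTERN = re.compile(r"-- The (C|CXX) compiler identification is (\S+) (\S+)")


def compiler_versions(cmake_output: str) -> Dict[str, str]:
    """Map language ("C", "CXX") to "<vendor> <version>" found in the output."""
    return {
        m.group(1): f"{m.group(2)} {m.group(3)}"
        for m in _PATTERN.finditer(cmake_output)
    }


def compilers_differ(cmake_output: str) -> bool:
    """True when both compilers were reported and their identifications differ."""
    found = compiler_versions(cmake_output)
    return "C" in found and "CXX" in found and found["C"] != found["CXX"]


# The two identification lines copied from the configure log above.
log = """\
-- The C compiler identification is GNU 13.2.0
-- The CXX compiler identification is GNU 4.8.5
"""
print(compiler_versions(log))
print(compilers_differ(log))
```

If the two need to match, passing -DCMAKE_CXX_COMPILER alongside -DCMAKE_C_COMPILER at configure time would be one way to try it.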
- does this happen with all combinations of the hyperparameters you're searching over, or only some subset? If some subset, could you provide just those subsets?
could you try other strategies to make this more minimal?
- remove parameters one by one and see if you still get the error (e.g., remove reg_alpha and just accept LightGBM's defaults)
- remove StratifiedKFold or other splitting and perform every training run on the same dataset
- remove computation of evaluation scores (since this error is happening at training time)
I've tried setting only the most basic parameters:
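import lightgbm as lgb

# X and y are the feature matrix and target from the original report (not shown here).
params = {
    "metric": "rmse",
    "verbosity": 2,
    "device": "gpu",
    "boosting_type": "gbdt",
}
lgb_train = lgb.Dataset(X, y)
model = lgb.train(
    params, lgb_train, callbacks=[lgb.early_stopping(stopping_rounds=30)]
)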
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 2612
[LightGBM] [Info] Number of data points in the train set: 94792, number of used features: 15
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click [here](https://aka.ms/vscodeJupyterKernelCrash) for more info. View Jupyter [log](command:jupyter.viewOutput) for further details.
The same happens when using the sklearn API:
from lightgbm import LGBMRegressor

params = {
    "metric": "rmse",
    "verbosity": 2,
    "device": "gpu",
    "boosting_type": "gbdt",
}
model = LGBMRegressor(**params)
model.fit(X, y)
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 2612
[LightGBM] [Info] Number of data points in the train set: 94792, number of used features: 15
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click [here](https://aka.ms/vscodeJupyterKernelCrash) for more info. View Jupyter [log](command:jupyter.viewOutput) for further details.
As before, replacing "device": "gpu" with "device": "cpu" makes it work properly.
params = {
    "metric": "rmse",
    "verbosity": 2,
    "device": "cpu",
    "boosting_type": "gbdt",
}
model = LGBMRegressor(**params)
model.fit(X, y)
[LightGBM] [Debug] Dataset::GetMultiBinFromAllFeatures: sparse rate 0.000043
[LightGBM] [Debug] init for col-wise cost 0.000014 seconds, init for row-wise cost 0.025067 seconds
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.104751 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2612
[LightGBM] [Info] Number of data points in the train set: 94792, number of used features: 15
[LightGBM] [Info] Start training from score 9.707233
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 6
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 7
......
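The notebook only reports that the kernel crashed, which hides whatever the native library printed before dying. One way to get more detail, sketched below under the assumption that the training code can be run as a standalone script, is to execute it in a separate Python process with faulthandler enabled and inspect the return code and stderr. The helper name `run_isolated` is hypothetical, and the crashing snippet is a stand-in that calls os.abort() rather than LightGBM.

```python
import subprocess
import sys
from typing import List


def run_isolated(code: str, timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter so a native crash cannot kill the caller."""
    # "-X faulthandler" makes Python dump a traceback on fatal signals.
    cmd: List[str] = [sys.executable, "-X", "faulthandler", "-c", code]
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)


if __name__ == "__main__":
    # Stand-in for the training snippet; os.abort() simulates a native crash.
    crashing = "import os; os.abort()"
    result = run_isolated(crashing)
    print("return code:", result.returncode)
    print(result.stderr)
```

To use it for this report, the string would be replaced by code that loads the data and calls lgb.train with "device": "gpu". On Linux, a negative return code means the child process was terminated by the signal with that number.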
Thanks for reporting this. If you are using a single NVIDIA GPU for training, could you please try with our new CUDA version instead of the legacy GPU version (with -DUSE_CUDA=ON instead of -DUSE_GPU=ON)? It should be faster. https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html#id20
@shiyu1994 CMake failed when using -DUSE_CUDA=ON instead of -DUSE_GPU=ON:
$ cmake -DUSE_CUDA=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ -DCMAKE_C_COMPILER=/public/home/zj/mambaforge/envs/kaggle/bin/x86_64-conda-linux-gnu-cc ..
-- The C compiler identification is GNU 12.1.0
-- The CXX compiler identification is GNU 4.8.5
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Check for working C compiler: /public/home/zj/mambaforge/envs/kaggle/bin/x86_64-conda-linux-gnu-cc
-- Check for working C compiler: /public/home/zj/mambaforge/envs/kaggle/bin/x86_64-conda-linux-gnu-cc - broken
CMake Error at /public/home/zj/tools/cmake-3.28.0-rc5-linux-x86_64/share/cmake-3.28/Modules/CMakeTestCCompiler.cmake:67 (message):
The C compiler
"/public/home/zj/mambaforge/envs/kaggle/bin/x86_64-conda-linux-gnu-cc"
is not able to compile a simple test program.
It fails with the following output:
Change Dir: '/public/home/zj/tools/LightGBM/build/CMakeFiles/CMakeScratch/TryCompile-ukHP6p'
Run Build Command(s): /public/home/zj/tools/cmake-3.28.0-rc5-linux-x86_64/bin/cmake -E env VERBOSE=1 /usr/bin/gmake -f Makefile cmTC_5ca6c/fast
/usr/bin/gmake -f CMakeFiles/cmTC_5ca6c.dir/build.make CMakeFiles/cmTC_5ca6c.dir/build
gmake[1]: Entering directory `/public/home/zj/tools/LightGBM/build/CMakeFiles/CMakeScratch/TryCompile-ukHP6p'
Building C object CMakeFiles/cmTC_5ca6c.dir/testCCompiler.c.o
/public/home/zj/mambaforge/envs/kaggle/bin/x86_64-conda-linux-gnu-cc -march -o CMakeFiles/cmTC_5ca6c.dir/testCCompiler.c.o -c /public/home/zj/tools/LightGBM/build/CMakeFiles/CMakeScratch/TryCompile-ukHP6p/testCCompiler.c
x86_64-conda-linux-gnu-cc: error: unrecognized command-line option '-march'
gmake[1]: *** [CMakeFiles/cmTC_5ca6c.dir/testCCompiler.c.o] Error 1
gmake[1]: Leaving directory `/public/home/zj/tools/LightGBM/build/CMakeFiles/CMakeScratch/TryCompile-ukHP6p'
gmake: *** [cmTC_5ca6c/fast] Error 2
CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
CMakeLists.txt:32 (project)
-- Configuring incomplete, errors occurred!
I tried downgrading gcc, but this didn't help:
$ cmake -DUSE_CUDA=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ -DCMAKE_C_COMPILER=/public/home/zj/mambaforge/envs/kaggle/bin/x86_64-conda-linux-gnu-cc ..
-- The C compiler identification is GNU 8.5.0
-- The CXX compiler identification is GNU 4.8.5
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Check for working C compiler: /public/home/zj/mambaforge/envs/kaggle/bin/x86_64-conda-linux-gnu-cc
-- Check for working C compiler: /public/home/zj/mambaforge/envs/kaggle/bin/x86_64-conda-linux-gnu-cc - broken
CMake Error at /public/home/zj/tools/cmake-3.28.0-rc5-linux-x86_64/share/cmake-3.28/Modules/CMakeTestCCompiler.cmake:67 (message):
The C compiler
"/public/home/zj/mambaforge/envs/kaggle/bin/x86_64-conda-linux-gnu-cc"
is not able to compile a simple test program.
It fails with the following output:
Change Dir: '/public/home/zj/tools/LightGBM/build/CMakeFiles/CMakeScratch/TryCompile-w5DwZ5'
Run Build Command(s): /public/home/zj/tools/cmake-3.28.0-rc5-linux-x86_64/bin/cmake -E env VERBOSE=1 /usr/bin/gmake -f Makefile cmTC_88666/fast
/usr/bin/gmake -f CMakeFiles/cmTC_88666.dir/build.make CMakeFiles/cmTC_88666.dir/build
gmake[1]: Entering directory `/public/home/zj/tools/LightGBM/build/CMakeFiles/CMakeScratch/TryCompile-w5DwZ5'
Building C object CMakeFiles/cmTC_88666.dir/testCCompiler.c.o
/public/home/zj/mambaforge/envs/kaggle/bin/x86_64-conda-linux-gnu-cc -march -o CMakeFiles/cmTC_88666.dir/testCCompiler.c.o -c /public/home/zj/tools/LightGBM/build/CMakeFiles/CMakeScratch/TryCompile-w5DwZ5/testCCompiler.c
x86_64-conda-linux-gnu-cc: error: unrecognized command line option '-march'; did you mean '-march='?
gmake[1]: *** [CMakeFiles/cmTC_88666.dir/testCCompiler.c.o] Error 1
gmake[1]: Leaving directory `/public/home/zj/tools/LightGBM/build/CMakeFiles/CMakeScratch/TryCompile-w5DwZ5'
gmake: *** [cmTC_88666/fast] Error 2
CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
CMakeLists.txt:32 (project)
-- Configuring incomplete, errors occurred!
Is it because my gcc version is still wrong, or should I modify some files?
I removed the -march flag from CMakeCache.txt and installed the CUDA version.
Replacing "device": "gpu" with "device": "cuda" makes LightGBM work well on the GPU, and training is significantly faster.
Thanks for the advice.
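In the failing configure logs above, the test compile command passes -march with no value, and the fix was to delete that flag from CMakeCache.txt. A minimal sketch for locating such entries is shown below. The function name is hypothetical, and the sample cache text is invented for illustration since the real cache contents are not shown in this thread.

```python
from typing import Dict, List


def bare_march_entries(cache_text: str) -> Dict[str, List[str]]:
    """Return cache entries whose value contains a bare `-march` token."""
    hits: Dict[str, List[str]] = {}
    for line in cache_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "#" and "//" comment lines in CMakeCache.txt.
        if not line or line.startswith(("#", "//")):
            continue
        key_type, sep, value = line.partition("=")
        if not sep:
            continue
        tokens = value.split()
        # "-march=native" is a different token and is not flagged.
        if "-march" in tokens:
            hits[key_type.split(":", 1)[0]] = tokens
    return hits


# Hypothetical cache content, for illustration only.
sample = """\
// Flags used by the C compiler during all build types.
CMAKE_C_FLAGS:STRING=-march -O2
CMAKE_CXX_FLAGS:STRING=-march=native
"""
print(bare_march_entries(sample))
```

Because CMake seeds CMAKE_C_FLAGS from the CFLAGS environment variable on the first configure, it may also be worth checking whether CFLAGS is set in the shell before configuring in a fresh build directory.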
Description
Kernel crash occurs in Jupyter Notebook when running LightGBM with GPU support enabled on a small dataset (~5MB). This issue arises on a remote Linux server, not on a local setup.
Reproducible example
The following is related code:
Output:
Jupyter notebook log does not have very valuable information:
The kernel crash happens specifically when the 'device': 'gpu' parameter is set in the LightGBM configuration. Disabling GPU support allows the code to run correctly.
Environment info
LightGBM version:
I followed the documentation to install LightGBM with GPU Support:
The issue seems related specifically to GPU utilization. Attempts to adjust the gpu_device_id and gpu_platform_id settings did not resolve the problem. Is there a recommended approach to debug or fix this, or might there have been a misstep in the GPU installation or compilation process?
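For the gpu_platform_id and gpu_device_id attempts mentioned above, a systematic sweep can be sketched as follows. The helper name and the candidate ID ranges are assumptions; the IDs actually tried are not listed in the report.

```python
from itertools import product
from typing import Any, Dict, Iterable, List


def device_id_grid(
    base: Dict[str, Any],
    platform_ids: Iterable[int],
    device_ids: Iterable[int],
) -> List[Dict[str, Any]]:
    """Build one parameter dict per (gpu_platform_id, gpu_device_id) pair."""
    return [
        {**base, "gpu_platform_id": p, "gpu_device_id": d}
        for p, d in product(platform_ids, device_ids)
    ]


base = {"device": "gpu", "metric": "rmse", "verbosity": 2}
# Hypothetical ranges; adjust to the platforms and devices present on the machine.
for candidate in device_id_grid(base, platform_ids=range(2), device_ids=range(2)):
    print(candidate)
```

Trying each candidate in its own process keeps one crashing combination from ending the whole sweep.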