Closed jmakov closed 5 months ago
@jmakov Is it possible to see more of the error message? For example, why does the compilation of cuda_best_split_finder.cu fail?
@shiyu1994 there seems to be only 1 type of error:
/tmp/lib/LightGBM/include/LightGBM/utils/../../../external_libs/fmt/include/fmt/format-inl.h(85): here
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with '...':
435 | function(_Functor&& __f)
|
^
/usr/include/c++/11/bits/std_function.h:435:145: note: '_ArgTypes'
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with '...':
530 | operator=(_Functor&& __f)
|
^
/usr/include/c++/11/bits/std_function.h:530:146: note: '_ArgTypes'
whole log: build_fail.txt
This is kinda a blocker for me. Would be great to have some more insight into what can be done about it.
I think I've been having similar problems when trying to install v4.0. Builds were failing until I switched gcc (and g++ for good measure) to version 10 for compiling.
Found the solution in this reference: https://github.com/NVIDIA/nccl/issues/650
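The GCC 10 workaround above can be sketched as follows. This is only a sketch: the helper names gcc_major and pick_host_cxx are hypothetical, the "GCC 11 and newer" threshold is an assumption taken from this thread rather than a documented limit, and the commented-out wiring assumes a Debian/Ubuntu image where the g++-10 package is available. CXX and CMAKE_CUDA_HOST_COMPILER are standard CMake inputs.

```shell
#!/bin/sh
# Hypothetical helper: print the major version of a string such as "11.3.0" or "11".
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# Hypothetical helper: print the C++ compiler to hand to nvcc.
# Assumption (from this thread): GCC 11 and newer hit the std_function.h
# "parameter packs not expanded" error, while GCC 10 does not.
pick_host_cxx() {
    if [ "$(gcc_major "$1")" -ge 11 ]; then
        echo "g++-10"
    else
        echo "g++"
    fi
}

# Example wiring, left commented out because it installs packages and starts a build:
#   apt-get install -y gcc-10 g++-10
#   host_cxx=$(pick_host_cxx "$(g++ -dumpversion)")
#   CXX="$host_cxx" cmake -B build -S . -DUSE_CUDA=1 -DCMAKE_CUDA_HOST_COMPILER="$host_cxx"
```

Passing the compiler per build keeps the system default gcc untouched, which may be preferable to switching the default with update-alternatives.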
Sorry for the long delay in response. I believe recent changes in LightGBM have fixed this.
I was able to build latest LightGBM (https://github.com/microsoft/LightGBM/commit/14435485bd7e9c4d72ab43bd269c33fc4230212f) in the latest stable rapidsai/base image.
(rapidsai/rapidsai-core images were removed as part of https://github.com/rapidsai/docker/issues/539)
docker run \
--rm \
--user root \
-it rapidsai/base:24.04-cuda12.0-py3.10 \
bash
mkdir /tmp/lib
cd /tmp/lib
# install build tools (rapidsai/core doesn't ship these)
apt-get update
apt-get install -y \
build-essential \
cmake \
git
# build LightGBM
git clone --recursive https://github.com/microsoft/LightGBM
cd ./LightGBM
cmake -B build -S . -DUSE_CUDA=1
cmake --build build --target _lightgbm -j2
sh build-python.sh install --precompile
That built successfully for me.
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!
Wondering if it's possible to enforce architecture somehow. Trying to reproduce your commands on NVIDIA RTX 6000 Ada (SM 8.9) & CUDA Version: 12.4, Ubuntu 20.04.6 LTS leads to
$ "/usr/bin"/c++ -D__CUDA_ARCH__=300 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -DCUDACC -DNVCC -DCUDACC_VER_MAJOR=10 -DCUDACC_VER_MINOR=1 -DCUDACC_VER_BUILD=243 -include "cuda_runtime.h" -m64 "CMakeCUDACompilerId.cu" > "tmp/CMakeCUDACompilerId.cpp1.ii"
$ cicc --c++14 --gnu_version=80400 --allow_managed -arch compute_30 -m64 -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name "CMakeCUDACompilerId.fatbin.c" -tused -nvvmir-library "/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc" --gen_module_id_file --module_id_file_name "tmp/CMakeCUDACompilerId.module_id" --orig_src_file_name "CMakeCUDACompilerId.cu" --gen_c_file_name "tmp/CMakeCUDACompilerId.cudafe1.c" --stub_file_name "tmp/CMakeCUDACompilerId.cudafe1.stub.c" --gen_device_file_name "tmp/CMakeCUDACompilerId.cudafe1.gpu" "tmp/CMakeCUDACompilerId.cpp1.ii" -o "tmp/CMakeCUDACompilerId.ptx"
$ ptxas -arch=sm_30 -m64 "tmp/CMakeCUDACompilerId.ptx" -o "tmp/CMakeCUDACompilerId.sm_30.cubin"
ptxas fatal : Value 'sm_30' is not defined for option 'gpu-name'
Never mind. I had to remove nvidia-cuda-toolkit (which I had installed 'cause it allowed the OpenCL version of LightGBM to work, only to find out it's buggy on big datasets and overall an abandoned branch).
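The log above reports -DCUDACC_VER_MAJOR=10 and sm_30, which suggests CMake picked up the nvcc 10.1 from the distro's nvidia-cuda-toolkit package rather than the CUDA 12.4 toolkit. A quick way to spot that kind of mismatch is to check which nvcc is first on PATH and what release it reports. The sketch below is illustrative only: the helper name nvcc_release is hypothetical, and the /usr/local/cuda/bin/nvcc path in the comments is an assumption about where the intended toolkit lives.

```shell
#!/bin/sh
# Hypothetical helper: read `nvcc --version` output on stdin and print the
# release number, e.g. "10.1" for a line like
# "Cuda compilation tools, release 10.1, V10.1.243".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Example wiring, left commented out because it needs a CUDA toolkit installed:
#   command -v nvcc                      # which nvcc does PATH resolve to?
#   nvcc --version | nvcc_release        # which release is it?
#   # If it is the wrong one, point CMake at the intended compiler.
#   # The path below is an assumption; adjust it to the actual install.
#   cmake -B build -S . -DUSE_CUDA=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
```

CMAKE_CUDA_COMPILER is a standard CMake variable; removing the stale package, as described above, solves the same problem at the source.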
Currently stuck at
found pre-compiled lib_lightgbm.so
--- building sdist ---
build-python.sh: 347: python: not found
Why is it so hard to get lightgbm working with GPU? Catboost & Xgboost teams somehow managed to solve it with single "pip install" command ;-)
build-python.sh: 347: python: not found
You have to have Python installed and a python executable available on PATH to build LightGBM's Python package.
I strongly suspect that you aren't using the exact example I provided in https://github.com/microsoft/LightGBM/issues/5785#issuecomment-2073988733, but you haven't described your setup here so it's not possible to help much more.
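For anyone hitting the same "python: not found" message, a small check like the one below can show whether the shell can see an interpreter under the name python at all. It is a sketch, not part of build-python.sh: the helper name find_python is hypothetical, and the python-is-python3 suggestion in the comments assumes a Debian/Ubuntu system where only python3 is installed.

```shell
#!/bin/sh
# Hypothetical helper: print the first Python executable name found on PATH,
# preferring "python" (the name the build script calls) over "python3".
# Returns non-zero if neither is available.
find_python() {
    for candidate in python python3; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Example wiring, left commented out because it changes the system:
#   if [ "$(find_python)" = "python3" ]; then
#       # Only python3 exists. On Debian/Ubuntu, one option (an assumption
#       # about the setup) is the package that provides a `python` command:
#       apt-get install -y python-is-python3
#   fi
```

If the helper prints python3, the interpreter is installed but not under the name the script expects; if it prints nothing, Python is missing from PATH entirely.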
Why is it so hard to get lightgbm working with GPU? Catboost & Xgboost teams somehow managed to solve it with single "pip install" command
We're doing the best we can with a much smaller amount of maintainer availability. Those projects both have multiple maintainers being paid to work on them full-time... LightGBM does not.
You're welcome to come contribute here any time.
Yeah, I know. Thanks a lot for your hard work, guys. I hope easier access to GPU training is on the roadmap. I'm not experienced in that myself, otherwise I would contribute for sure.
Description
https://github.com/microsoft/LightGBM/issues/5089 is marked as resolved, but the error still occurs when trying to build in a RAPIDS Docker container:
Reproducible example
Environment info
LightGBM version or commit hash:
Command(s) you used to install LightGBM
Build in docker
FROM rapidsai/rapidsai-core:23.02-cuda11.8-runtime-ubuntu22.04-py3.10
GCC 11.3
Additional Comments