microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

Build fails for `-DUSE_CUDA=1` #5785

Closed jmakov closed 5 months ago

jmakov commented 1 year ago

Description

https://github.com/microsoft/LightGBM/issues/5089 is marked as resolved, but the same error still occurs when building in a RAPIDS Docker container:

#0 153.8 /usr/include/c++/11/bits/std_function.h:435:145: note:         '_ArgTypes'
#0 153.8 /usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with '...':
#0 153.8   530 |         operator=(_Functor&& __f)
#0 153.8       |                                                                                                                                                  ^ 
#0 153.8 /usr/include/c++/11/bits/std_function.h:530:146: note:         '_ArgTypes'
#0 154.6 make[2]: *** [CMakeFiles/lightgbm_objs.dir/build.make:734: CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_best_split_finder.cu.o] Error 1
#0 154.6 make[1]: *** [CMakeFiles/Makefile2:257: CMakeFiles/lightgbm_objs.dir/all] Error 2
#0 154.6 make: *** [Makefile:136: all] Error 2

Reproducible example

Environment info

LightGBM version or commit hash:

Command(s) you used to install LightGBM

mkdir /tmp/lib && cd /tmp/lib  \
    && git clone --recursive https://github.com/microsoft/LightGBM \
    && mkdir /tmp/lib/LightGBM/build && cd /tmp/lib/LightGBM/build \
    && cmake -DUSE_CUDA=1 .. && make -j \
    && pip uninstall -y lightgbm \
    && cd ../python-package/ && python setup.py install --precompile

Built in Docker from the rapidsai/rapidsai-core:23.02-cuda11.8-runtime-ubuntu22.04-py3.10 image, with GCC 11.3.
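The "LightGBM version or commit hash" field above was left empty. A small helper can collect the toolchain details that matter for this kind of nvcc/GCC error. This is a minimal sketch: the function name report_toolchain is hypothetical, and the default checkout path /tmp/lib/LightGBM is an assumption taken from the commands above.

```sh
#!/bin/sh
# Sketch: print toolchain versions relevant to a LightGBM CUDA build.
# report_toolchain is a hypothetical helper, not part of LightGBM.
report_toolchain() {
  local repo="${1:-/tmp/lib/LightGBM}" tool
  for tool in gcc g++ nvcc cmake; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s: not found\n' "$tool"
    elif [ "$tool" = nvcc ]; then
      # nvcc reports its version on a line like "Cuda compilation tools, release 12.0, ..."
      printf '%s: %s\n' "$tool" "$("$tool" --version | grep -m1 'release')"
    else
      printf '%s: %s\n' "$tool" "$("$tool" --version | head -n 1)"
    fi
  done
  # Commit hash of the LightGBM checkout, if one exists at that path.
  if [ -d "$repo/.git" ]; then
    printf 'LightGBM commit: %s\n' "$(git -C "$repo" rev-parse HEAD)"
  fi
}
```

Running report_toolchain inside the container prints one line per tool (or "not found" for anything missing from PATH), plus the commit hash when the checkout exists.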

Additional Comments

shiyu1994 commented 1 year ago

@jmakov Could you share more of the error output? For example, why did the compilation of cuda_best_split_finder.cu fail?

jmakov commented 1 year ago

@shiyu1994 There seems to be only one type of error:

/tmp/lib/LightGBM/include/LightGBM/utils/../../../external_libs/fmt/include/fmt/format-inl.h(85): here

/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with '...':
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^
/usr/include/c++/11/bits/std_function.h:435:145: note:         '_ArgTypes'
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with '...':
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^
/usr/include/c++/11/bits/std_function.h:530:146: note:         '_ArgTypes'

whole log: build_fail.txt
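With a parallel make -j build, output from concurrent jobs interleaves, and the final make[...] Error lines say little about the cause. A minimal sketch for pulling the first compiler error, plus the lines leading up to it (where nvcc prints the "here" instantiation context), out of a saved log. first_error is a hypothetical helper, and it relies on grep's -m and -B options as provided by GNU grep.

```sh
#!/bin/sh
# Sketch: print the first line containing "error:" from a build log,
# with up to N lines of preceding context (default 5).
# grep -n prefixes the matching line with "LINE:" and context lines with "LINE-".
# first_error is a hypothetical helper, not part of LightGBM.
first_error() {
  local log="$1" context="${2:-5}"
  grep -n -m1 -B"$context" 'error:' "$log"
}
```

For example, run make -j1 2>&1 | tee build_fail.txt in the build directory (a serial build keeps the output in order), then first_error build_fail.txt 10.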

jmakov commented 1 year ago

This is a blocker for me. It would be great to get more insight into what can be done about it.

domtisdell commented 1 year ago

I think I've been having similar problems when trying to install v4.0. Builds kept failing until I switched gcc (and g++, for good measure) to version 10 for compiling.

I found the solution in this reference: https://github.com/NVIDIA/nccl/issues/650
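A minimal sketch of that workaround, assuming Ubuntu 22.04 (where the gcc-10 and g++-10 packages exist) and assuming LightGBM's CMake build honors CMake's standard compiler variables. CMAKE_CUDA_HOST_COMPILER is the CMake variable that selects the host compiler nvcc uses. The helper name lightgbm_cuda_cmake_args is hypothetical.

```sh
#!/bin/sh
# Sketch of the "build with GCC 10" workaround for a LightGBM CUDA build.
# Assumes gcc-<ver>/g++-<ver> are installed, e.g. on Ubuntu 22.04:
#   apt-get install -y gcc-10 g++-10
# lightgbm_cuda_cmake_args is a hypothetical helper, not part of LightGBM.

# Print CMake cache arguments, one per line, selecting GCC <ver> for C, C++,
# and the host side of CUDA code (CMake hands CMAKE_CUDA_HOST_COMPILER to nvcc).
lightgbm_cuda_cmake_args() {
  local ver="$1"
  if ! command -v "g++-$ver" >/dev/null 2>&1; then
    printf 'warning: g++-%s not found on PATH\n' "$ver" >&2
  fi
  printf '%s\n' \
    "-DUSE_CUDA=1" \
    "-DCMAKE_C_COMPILER=gcc-$ver" \
    "-DCMAKE_CXX_COMPILER=g++-$ver" \
    "-DCMAKE_CUDA_HOST_COMPILER=g++-$ver"
}
```

From the root of a LightGBM checkout, cmake -B build -S . $(lightgbm_cuda_cmake_args 10) would then configure with GCC 10 (the substitution is left unquoted on purpose, so each line becomes one argument). Use a fresh build directory, since CMake generally fixes the compilers on the first configure of a build tree.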

jameslamb commented 6 months ago

Sorry for the long delay in response. I believe recent changes in LightGBM have fixed this.

I was able to build latest LightGBM (https://github.com/microsoft/LightGBM/commit/14435485bd7e9c4d72ab43bd269c33fc4230212f) in the latest stable rapidsai/base image.

(rapidsai/rapidsai-core images were removed as part of https://github.com/rapidsai/docker/issues/539)

docker run \
    --rm \
    --user root \
    -it rapidsai/base:24.04-cuda12.0-py3.10 \
    bash

# run the following inside the container
mkdir /tmp/lib
cd /tmp/lib

# install build tools (rapidsai/core doesn't ship these)
apt-get update
apt-get install -y \
    build-essential \
    cmake \
    git

# build LightGBM
git clone --recursive https://github.com/microsoft/LightGBM

cd ./LightGBM
cmake -B build -S . -DUSE_CUDA=1
cmake --build build --target _lightgbm -j2
sh build-python.sh install --precompile

That built successfully for me.
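Because the rapidsai/base image doesn't ship the build tools (hence the apt-get step) and build-python.sh invokes python, a quick preflight check can fail fast before a long CUDA build starts. A minimal sketch; require_tools, require_lightgbm_build_tools, and the exact tool list are assumptions for illustration, not something LightGBM provides.

```sh
#!/bin/sh
# Sketch: fail fast if any tool the build recipe relies on is missing.
# require_tools and require_lightgbm_build_tools are hypothetical helpers.

# Report each missing command on stderr; return 1 if any is missing.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Tools used by the recipe: git, CMake, make and a C++ compiler
# (from build-essential), nvcc for the CUDA build, and python,
# which build-python.sh invokes.
require_lightgbm_build_tools() {
  require_tools git cmake make c++ nvcc python
}
```

For example: require_lightgbm_build_tools && cmake -B build -S . -DUSE_CUDA=1.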

Full logs:

Configure step:

```text
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The CUDA compiler identification is NVIDIA 12.0.76
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/conda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found CUDA: /opt/conda/targets/sbsa-linux (found suitable version "12.0", minimum required is "11.0")
-- CMAKE_CUDA_FLAGS: -Xcompiler=-fopenmp -Xcompiler=-fPIC -Xcompiler=-Wall -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90,code=compute_90 -O3 -lineinfo
-- ALLFEATS_DEFINES: -DPOWER_FEATURE_WORKGROUPS=12;-DUSE_CONSTANT_BUF=0;-DENABLE_ALL_FEATURES
-- FULLDATA_DEFINES: -DPOWER_FEATURE_WORKGROUPS=12;-DUSE_CONSTANT_BUF=0;-DENABLE_ALL_FEATURES;-DIGNORE_INDICES
-- Performing Test MM_PREFETCH
-- Performing Test MM_PREFETCH - Failed
-- Performing Test MM_MALLOC
-- Performing Test MM_MALLOC - Failed
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/lib/LightGBM/build
```

Build step:

```text
[ 1%] Building CUDA object CMakeFiles/histo_16_64_256_sp.dir/src/treelearner/kernels/histogram_16_64_256.cu.o
[ 2%] Building CUDA object CMakeFiles/histo_16_64_256-fulldata_sp.dir/src/treelearner/kernels/histogram_16_64_256.cu.o
[ 2%] Built target histo_16_64_256-fulldata_sp
[ 2%] Built target histo_16_64_256_sp
[ 4%] Building CXX object CMakeFiles/lightgbm_capi_objs.dir/src/c_api.cpp.o
[ 5%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/boosting.cpp.o
[ 6%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/gbdt.cpp.o
[ 8%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/gbdt_model_text.cpp.o
[ 8%] Built target lightgbm_capi_objs
[ 9%] Building CUDA object CMakeFiles/histo_16_64_256_sp_const.dir/src/treelearner/kernels/histogram_16_64_256.cu.o
[ 10%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/gbdt_prediction.cpp.o
[ 12%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/prediction_early_stop.cpp.o
[ 13%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/sample_strategy.cpp.o
[ 13%] Built target histo_16_64_256_sp_const
[ 14%] Building CUDA object CMakeFiles/histo_16_64_256-fulldata_sp_const.dir/src/treelearner/kernels/histogram_16_64_256.cu.o
[ 16%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/bin.cpp.o
[ 16%] Built target histo_16_64_256-fulldata_sp_const
[ 17%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/config.cpp.o
[ 18%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/config_auto.cpp.o
[ 20%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/dataset.cpp.o
[ 21%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/dataset_loader.cpp.o
[ 22%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/file_io.cpp.o
[ 24%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/json11.cpp.o
[ 25%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/metadata.cpp.o
[ 27%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/parser.cpp.o
[ 28%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/train_share_states.cpp.o
[ 29%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/tree.cpp.o
[ 31%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/metric/dcg_calculator.cpp.o
[ 32%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/metric/metric.cpp.o
[ 33%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/network/linker_topo.cpp.o
[ 35%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/network/linkers_mpi.cpp.o
[ 36%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/network/linkers_socket.cpp.o
[ 37%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/network/network.cpp.o
[ 39%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/objective/objective_function.cpp.o
[ 40%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/data_parallel_tree_learner.cpp.o
[ 41%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/feature_histogram.cpp.o
[ 43%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/feature_parallel_tree_learner.cpp.o
[ 44%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/gpu_tree_learner.cpp.o
[ 45%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/gradient_discretizer.cpp.o
[ 47%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/linear_tree_learner.cpp.o
In file included from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Core:214,
                 from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Dense:1,
                 from /tmp/lib/LightGBM/src/treelearner/linear_tree_learner.cpp:7:
/tmp/lib/LightGBM/external_libs/eigen/Eigen/src/Core/arch/NEON/PacketMath.h: In function 'Packet Eigen::internal::pload(const typename Eigen::internal::unpacket_traits::type*) [with Packet = Eigen::internal::eigen_packet_wrapper; typename Eigen::internal::unpacket_traits::type = signed char]':
/tmp/lib/LightGBM/external_libs/eigen/Eigen/src/Core/arch/NEON/PacketMath.h:1671:9: warning: 'void* memcpy(void*, const void*, size_t)' copying an object of non-trivial type 'Eigen::internal::Packet4c' {aka 'struct Eigen::internal::eigen_packet_wrapper'} from an array of 'const int8_t' {aka 'const signed char'} [-Wclass-memaccess]
 1671 | memcpy(&res, from, sizeof(Packet4c));
      | ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Core:172,
                 from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Dense:1,
                 from /tmp/lib/LightGBM/src/treelearner/linear_tree_learner.cpp:7:
/tmp/lib/LightGBM/external_libs/eigen/Eigen/src/Core/GenericPacketMath.h:159:8: note: 'Eigen::internal::Packet4c' {aka 'struct Eigen::internal::eigen_packet_wrapper'} declared here
  159 | struct eigen_packet_wrapper
      | ^~~~~~~~~~~~~~~~~~~~
In file included from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Core:214,
                 from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Dense:1,
                 from /tmp/lib/LightGBM/src/treelearner/linear_tree_learner.cpp:7:
/tmp/lib/LightGBM/external_libs/eigen/Eigen/src/Core/arch/NEON/PacketMath.h: In function 'Packet Eigen::internal::ploadu(const typename Eigen::internal::unpacket_traits::type*) [with Packet = Eigen::internal::eigen_packet_wrapper; typename Eigen::internal::unpacket_traits::type = signed char]':
/tmp/lib/LightGBM/external_libs/eigen/Eigen/src/Core/arch/NEON/PacketMath.h:1716:9: warning: 'void* memcpy(void*, const void*, size_t)' copying an object of non-trivial type 'Eigen::internal::Packet4c' {aka 'struct Eigen::internal::eigen_packet_wrapper'} from an array of 'const int8_t' {aka 'const signed char'} [-Wclass-memaccess]
 1716 | memcpy(&res, from, sizeof(Packet4c));
      | ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Core:172,
                 from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Dense:1,
                 from /tmp/lib/LightGBM/src/treelearner/linear_tree_learner.cpp:7:
/tmp/lib/LightGBM/external_libs/eigen/Eigen/src/Core/GenericPacketMath.h:159:8: note: 'Eigen::internal::Packet4c' {aka 'struct Eigen::internal::eigen_packet_wrapper'} declared here
  159 | struct eigen_packet_wrapper
      | ^~~~~~~~~~~~~~~~~~~~
[ 48%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/serial_tree_learner.cpp.o
[ 50%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/tree_learner.cpp.o
[ 51%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/voting_parallel_tree_learner.cpp.o
[ 52%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/utils/openmp_wrapper.cpp.o
[ 54%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/cuda/cuda_score_updater.cpp.o
[ 55%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/boosting/cuda/cuda_score_updater.cu.o
[ 56%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/cuda/cuda_algorithms.cu.o
[ 58%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/cuda/cuda_utils.cpp.o
[ 59%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/cuda/cuda_column_data.cpp.o
[ 60%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/io/cuda/cuda_column_data.cu.o
[ 62%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/cuda/cuda_metadata.cpp.o
[ 63%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/cuda/cuda_row_data.cpp.o
[ 64%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/cuda/cuda_tree.cpp.o
[ 66%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/io/cuda/cuda_tree.cu.o
[ 67%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/metric/cuda/cuda_binary_metric.cpp.o
[ 68%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/metric/cuda/cuda_pointwise_metric.cpp.o
[ 70%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/metric/cuda/cuda_pointwise_metric.cu.o
[ 71%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/metric/cuda/cuda_regression_metric.cpp.o
[ 72%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_binary_objective.cpp.o
[ 74%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_binary_objective.cu.o
[ 75%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_multiclass_objective.cpp.o
[ 77%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_multiclass_objective.cu.o
[ 78%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_rank_objective.cpp.o
[ 79%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_rank_objective.cu.o
[ 81%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_regression_objective.cpp.o
[ 82%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_regression_objective.cu.o
[ 83%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_best_split_finder.cpp.o
[ 85%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_best_split_finder.cu.o
[ 86%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_data_partition.cpp.o
[ 87%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_data_partition.cu.o
[ 89%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_gradient_discretizer.cu.o
[ 90%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_histogram_constructor.cpp.o
[ 91%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_histogram_constructor.cu.o
[ 93%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_leaf_splits.cpp.o
[ 94%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_leaf_splits.cu.o
[ 95%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_single_gpu_tree_learner.cpp.o
[ 97%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_single_gpu_tree_learner.cu.o
[ 97%] Built target lightgbm_objs
[ 98%] Linking CUDA device code CMakeFiles/_lightgbm.dir/cmake_device_link.o
[100%] Linking CXX shared library ../lib_lightgbm.so
[100%] Built target _lightgb
```

Python build + install logs:

```text
building lightgbm
Collecting build>=0.10.0
  Downloading build-1.2.1-py3-none-any.whl.metadata (4.3 kB)
Requirement already satisfied: packaging>=19.1 in /opt/conda/lib/python3.10/site-packages (from build>=0.10.0) (24.0)
Collecting pyproject_hooks (from build>=0.10.0)
  Downloading pyproject_hooks-1.0.0-py3-none-any.whl.metadata (1.3 kB)
Collecting tomli>=1.1.0 (from build>=0.10.0)
  Downloading tomli-2.0.1-py3-none-any.whl.metadata (8.9 kB)
Downloading build-1.2.1-py3-none-any.whl (21 kB)
Downloading tomli-2.0.1-py3-none-any.whl (12 kB)
Downloading pyproject_hooks-1.0.0-py3-none-any.whl (9.3 kB)
Installing collected packages: tomli, pyproject_hooks, build
Successfully installed build-1.2.1 pyproject_hooks-1.0.0 tomli-2.0.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
found pre-compiled lib_lightgbm.so
--- building sdist ---
* Creating isolated environment: venv+pip...
* Installing packages in isolated environment:
  - setuptools
* Getting build dependencies for sdist...
running egg_info
creating lightgbm.egg-info
writing lightgbm.egg-info/PKG-INFO
writing dependency_links to lightgbm.egg-info/dependency_links.txt
writing requirements to lightgbm.egg-info/requires.txt
writing top-level names to lightgbm.egg-info/top_level.txt
writing manifest file 'lightgbm.egg-info/SOURCES.txt'
reading manifest file 'lightgbm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.dll' under directory 'lightgbm'
warning: no files found matching '*.dylib' under directory 'lightgbm'
adding license file 'LICENSE'
writing manifest file 'lightgbm.egg-info/SOURCES.txt'
* Building sdist...
running sdist
running egg_info
writing lightgbm.egg-info/PKG-INFO
writing dependency_links to lightgbm.egg-info/dependency_links.txt
writing requirements to lightgbm.egg-info/requires.txt
writing top-level names to lightgbm.egg-info/top_level.txt
reading manifest file 'lightgbm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.dll' under directory 'lightgbm'
warning: no files found matching '*.dylib' under directory 'lightgbm'
adding license file 'LICENSE'
writing manifest file 'lightgbm.egg-info/SOURCES.txt'
running check
creating lightgbm-4.3.0.99
creating lightgbm-4.3.0.99/lightgbm
creating lightgbm-4.3.0.99/lightgbm.egg-info
creating lightgbm-4.3.0.99/lightgbm/lib
copying files to lightgbm-4.3.0.99...
copying LICENSE -> lightgbm-4.3.0.99
copying MANIFEST.in -> lightgbm-4.3.0.99
copying README.rst -> lightgbm-4.3.0.99
copying pyproject.toml -> lightgbm-4.3.0.99
copying setup.cfg -> lightgbm-4.3.0.99
copying lightgbm/__init__.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/basic.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/callback.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/compat.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/dask.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/engine.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/libpath.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/plotting.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/py.typed -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/sklearn.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm.egg-info/PKG-INFO -> lightgbm-4.3.0.99/lightgbm.egg-info
copying lightgbm.egg-info/SOURCES.txt -> lightgbm-4.3.0.99/lightgbm.egg-info
copying lightgbm.egg-info/dependency_links.txt -> lightgbm-4.3.0.99/lightgbm.egg-info
copying lightgbm.egg-info/requires.txt -> lightgbm-4.3.0.99/lightgbm.egg-info
copying lightgbm.egg-info/top_level.txt -> lightgbm-4.3.0.99/lightgbm.egg-info
copying lightgbm/lib/lib_lightgbm.so -> lightgbm-4.3.0.99/lightgbm/lib
copying lightgbm.egg-info/SOURCES.txt -> lightgbm-4.3.0.99/lightgbm.egg-info
Writing lightgbm-4.3.0.99/setup.cfg
Creating tar archive
removing 'lightgbm-4.3.0.99' (and everything under it)
Successfully built lightgbm-4.3.0.99.tar.gz
--- installing lightgbm ---
WARNING: Skipping lightgbm as it is not installed.
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Looking in links: .
Processing ./lightgbm-4.3.0.99.tar.gz
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: numpy in /opt/conda/lib/python3.10/site-packages (from lightgbm) (1.26.4)
Requirement already satisfied: scipy in /opt/conda/lib/python3.10/site-packages (from lightgbm) (1.13.0)
Building wheels for collected packages: lightgbm
  Building wheel for lightgbm (pyproject.toml) ... done
  Created wheel for lightgbm: filename=lightgbm-4.3.0.99-py3-none-any.whl size=62203670 sha256=ea5fe085de440887522cfa4a3b9f9ee1b076bc93be325cd1a3f068471d73bdf8
  Stored in directory: /tmp/pip-ephem-wheel-cache-_be0h8ev/wheels/97/06/d4/842e2ab3fea42d639f11ba3250fbe19b540afb7108b58b2cfc
Successfully built lightgbm
Installing collected packages: lightgbm
Successfully installed lightgbm-4.3.0.99
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
cleaning up
```

github-actions[bot] commented 5 months ago

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

fingoldo commented 3 months ago

> I was able to build latest LightGBM (1443548) in the latest stable rapidsai/base image.
> [...]
> That built successfully for me.

I'm wondering whether it's possible to force a specific GPU architecture somehow. Trying to reproduce your commands on an NVIDIA RTX 6000 Ada (SM 8.9) with CUDA 12.4 on Ubuntu 20.04.6 LTS leads to:

$ "/usr/bin"/c++ -D__CUDA_ARCH__=300 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__ -D__CUDACC_VER_MAJOR__=10 -D__CUDACC_VER_MINOR__=1 -D__CUDACC_VER_BUILD__=243 -include "cuda_runtime.h" -m64 "CMakeCUDACompilerId.cu" > "tmp/CMakeCUDACompilerId.cpp1.ii"

$ cicc --c++14 --gnu_version=80400 --allow_managed -arch compute_30 -m64 -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name "CMakeCUDACompilerId.fatbin.c" -tused -nvvmir-library "/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc" --gen_module_id_file --module_id_file_name "tmp/CMakeCUDACompilerId.module_id" --orig_src_file_name "CMakeCUDACompilerId.cu" --gen_c_file_name "tmp/CMakeCUDACompilerId.cudafe1.c" --stub_file_name "tmp/CMakeCUDACompilerId.cudafe1.stub.c" --gen_device_file_name "tmp/CMakeCUDACompilerId.cudafe1.gpu" "tmp/CMakeCUDACompilerId.cpp1.ii" -o "tmp/CMakeCUDACompilerId.ptx"

$ ptxas -arch=sm_30 -m64 "tmp/CMakeCUDACompilerId.ptx" -o "tmp/CMakeCUDACompilerId.sm_30.cubin"

ptxas fatal : Value 'sm_30' is not defined for option 'gpu-name'
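The trace above comes from CMake's CUDA compiler-identification step (CMakeCUDACompilerId.cu). Its compiler-version defines report CUDA 10.1 (major 10, minor 1, build 243) and it reads libdevice from /usr/lib/nvidia-cuda-toolkit, while the reported CUDA version is 12.4. nvcc 10.x defaults to compute_30, and a ptxas that rejects sm_30 looks like it comes from a newer toolkit, which suggests two CUDA toolkits are being mixed. A minimal sketch for listing every nvcc on PATH with its release; nvcc_release and list_nvcc are hypothetical helpers.

```sh
#!/bin/sh
# Sketch: list every nvcc on PATH with its release, to spot mixed toolkits.
# nvcc_release and list_nvcc are hypothetical helpers.

# Read `nvcc --version` output on stdin and print "MAJOR.MINOR", taken from
# a line like "Cuda compilation tools, release 12.4, V12.4.131".
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Walk PATH in order; the first nvcc printed is the one a plain `nvcc` runs.
list_nvcc() {
  local saved_ifs="$IFS" dir candidate
  IFS=:
  set -f  # don't glob-expand PATH entries while splitting on ':'
  for dir in $PATH; do
    candidate="${dir:-.}/nvcc"
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s %s\n' "$candidate" "$("$candidate" --version | nvcc_release)"
    fi
  done
  set +f
  IFS="$saved_ifs"
}
```

If more than one nvcc shows up, either remove the stray toolkit (as done in the next comment) or point CMake at the intended one in a fresh build directory, e.g. cmake -B build -S . -DUSE_CUDA=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.4/bin/nvcc, where that path is an assumption about where CUDA 12.4 is installed.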

fingoldo commented 3 months ago

Never mind. I had to remove nvidia-cuda-toolkit (which I had installed because it got the OpenCL version of LightGBM working, only to find out that version is buggy on big datasets and overall an abandoned branch).

Now I'm stuck at:

found pre-compiled lib_lightgbm.so
--- building sdist ---
build-python.sh: 347: python: not found

Why is it so hard to get lightgbm working with GPU? Catboost & Xgboost teams somehow managed to solve it with a single "pip install" command ;-)

jameslamb commented 3 months ago

> build-python.sh: 347: python: not found

You have to have Python installed and a python executable available on PATH to build LightGBM's Python package.
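On Ubuntu 20.04 (the system described earlier in this thread), Python 3 is normally installed as python3 only, so python can be missing even when Python is present. A minimal sketch of a check to run before build-python.sh; check_python_for_build is a hypothetical helper, and the suggested fixes (activating a virtualenv, which provides python, or installing Ubuntu's python-is-python3 package) are assumptions about the setup.

```sh
#!/bin/sh
# Sketch: check for the `python` command that build-python.sh invokes.
# check_python_for_build is a hypothetical helper.

# Print a diagnosis; return 0 only if `python` resolves on PATH.
check_python_for_build() {
  if command -v python >/dev/null 2>&1; then
    printf 'ok: python is %s\n' "$(command -v python)"
    return 0
  fi
  if command -v python3 >/dev/null 2>&1; then
    # Common on Ubuntu 20.04: only python3 is installed.
    printf 'python3 found at %s, but no python command; ' "$(command -v python3)"
    printf 'activate a virtualenv or install python-is-python3\n'
  else
    printf 'no Python interpreter on PATH\n'
  fi
  return 1
}
```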

I strongly suspect that you aren't using the exact example I provided in https://github.com/microsoft/LightGBM/issues/5785#issuecomment-2073988733, but you haven't described your setup here so it's not possible to help much more.

> Why is it so hard to get lightgbm working with GPU? Catboost & Xgboost teams somehow managed to solve it with a single "pip install" command

We're doing the best we can with a much smaller amount of maintainer availability. Those projects both have multiple maintainers being paid to work on them full-time... LightGBM does not.

You're welcome to come contribute here any time.

fingoldo commented 3 months ago

> We're doing the best we can with a much smaller amount of maintainer availability. Those projects both have multiple maintainers being paid to work on them full-time... LightGBM does not.
>
> You're welcome to come contribute here any time.

Yeah, I know. Thanks a lot for your hard work, guys. I hope easier access to GPU training is on the roadmap. I'm not experienced in that area myself; otherwise I would contribute for sure.