microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

[python-package] [gpu] Unable to Install LightGBM GPU Python Package on Windows #6325

Closed NisuSan closed 7 months ago

NisuSan commented 7 months ago

Issue Description: I encountered difficulties while attempting to install the LightGBM GPU (master branch) Python package on Windows. Despite successfully compiling the GPU version and obtaining the necessary .dll and .exe files in the Release folder, I faced several obstacles during the installation process using the command pip install ./python-package.

Steps to Reproduce:

  1. Compile the LightGBM GPU version (master branch) on Windows.
  2. Check that the Release folder contains the .dll and .exe files.
  3. Run pip install ./python-package from the repository root folder (LightGBM).

Expected Behavior: The Python package installation process should proceed smoothly without any errors.

Actual Behavior: Encountered errors during the installation process:

  1. The installation initially failed to locate the LICENSE file in the 'python-package' folder. Manually creating the LICENSE file resolved this.
  2. It then failed with an error indicating that the 'CMakeLists.txt' file was missing.

Additional Information:

OS: Windows
Compiler: cmake
Python version: 3.11.5
LightGBM version: folder cloned from the master branch using git clone --recursive https://github.com/microsoft/LightGBM

Proposed Solution: Investigate and resolve the issues preventing successful installation of the LightGBM GPU Python package on Windows. This may involve ensuring all required files are present and addressing any potential compatibility issues.

Thank you for your attention to this matter. If further information or logs are required, please let me know.

jameslamb commented 7 months ago

Thanks for using LightGBM and for the thorough write-up!

As explained in the documentation, you cannot simply run pip install ./python-package in this repo. Building the Python package from source is driven by a shell script.

To build the GPU Python package from GitHub sources, do the following:

git clone --recursive  https://github.com/microsoft/LightGBM
cd ./LightGBM
sh build-python.sh install --gpu

But only do that if you need an unreleased version of lightgbm. If you are OK using a released version, install from PyPI... the Windows wheels we distribute have the OpenCL-based GPU support (not CUDA) already compiled in.

pip install lightgbm

For more details, see https://stackoverflow.com/a/77078844/3986677

NisuSan commented 7 months ago

I tried to run

pip install \
    --force-reinstall \
    --no-binary lightgbm \
    --config-settings=cmake.define.USE_CUDA=ON \
    lightgbm

according to https://stackoverflow.com/a/77078844/3986677 and got error "ERROR: Failed building wheel for lightgbm. ERROR: Could not build wheels for lightgbm, which is required to install pyproject.toml-based projects".

After that I tried a simplified version of the command and just ran pip install lightgbm --config-settings=cmake.define.USE_CUDA=ON. The package installed fine, but when I set { 'device_type': 'cuda' } in my script, I got this error: "Trial 0 failed with parameters: {'feature_fraction': 0.6} because of the following error: LightGBMError('CUDA Tree Learner was not enabled in this build.\nPlease recompile with CMake option -DUSE_CUDA=1')"

UPD: I tried installing the package from the local repo using sh build-python.sh install --gpu and it works, but only with { 'device_type': 'gpu' }, not 'cuda'. What exactly is the difference between these two options?

UPD 2: I also tried sh build-python.sh install --cuda, and it failed with "CMake build failed ERROR Backend subprocess exited when trying to invoke build_wheel".

jameslamb commented 7 months ago

got error "ERROR: Failed building wheel for lightgbm. ERROR: Could not build wheels for lightgbm, which is required to install pyproject.toml-based projects".

That error has many possible causes. I strongly suspect that there were more logs than just that printed, which might help us to help you identify the root cause.

Can you please run this again:

pip install \
    --force-reinstall \
    --no-binary lightgbm \
    --config-settings=cmake.define.USE_CUDA=ON \
    lightgbm

And share the full output that's printed?
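For anyone following along, one way to capture the complete build output in a single file is sketched below. This is only an illustration, not something the maintainers prescribe: `run_and_log` and `PIP_CMD` are hypothetical names, and the argument list simply restates the pip command from this thread.

```python
import subprocess
import sys
from pathlib import Path


def run_and_log(cmd, log_path):
    """Run cmd, save its combined stdout/stderr to log_path, return the exit code."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so compiler errors stay in order
        text=True,
        errors="replace",          # do not fail on bytes the locale cannot decode
    )
    Path(log_path).write_text(result.stdout, encoding="utf-8")
    return result.returncode


# The pip command from this thread, written as an argument list (hypothetical name).
PIP_CMD = [
    sys.executable, "-m", "pip", "install",
    "--force-reinstall",
    "--no-binary", "lightgbm",
    "--config-settings=cmake.define.USE_CUDA=ON",
    "lightgbm",
]

# Usage (not executed here because it triggers a full source build):
#   exit_code = run_and_log(PIP_CMD, "lightgbm.log")
```

Attaching the resulting file keeps warnings and errors in the order the compiler printed them, which is usually what is needed to find the first real failure.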


What exactly is the difference between these two options?

See https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html#build-cuda-version for more information.


In case you're new to GitHub... please see https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax for some tips on how to format text here in a way that makes the difference between code, output from code, and your own words clearer.

jameslamb commented 7 months ago

More information: https://github.com/microsoft/LightGBM/issues/6281#issuecomment-1903252918

NisuSan commented 7 months ago

And share the full output that's printed?

Sure, lightgbm.log

NisuSan commented 7 months ago

@jameslamb OK, now it's really interesting, because I tried to use the Docker image from here and got the error too! I created a reproduction repo and described the steps I took. I hope this helps you understand why the problem appears.

jameslamb commented 7 months ago

lightgbm.log

Thank you.

I see compilation errors like this:

"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64" -x cu -rdc=true  -I"C:\Users\Antony\AppData\Local\Temp\pip-install-1rpnm3ee\lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2\external_libs\eigen" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include" -I"C:\Users\Antony\AppData\Local\Temp\pip-install-1rpnm3ee\lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include"     --keep-dir lightgbm_objs\x64\Release  -maxrregcount=0   --machine 64 --compile -cudart static -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90,code=compute_90 -O3 -lineinfo -Xcompiler="/EHsc -openmp -fPIC -Ob2"   -D_WINDOWS -DNDEBUG -DEIGEN_MPL2_ONLY -DEIGEN_DONT_PARALLELIZE -DUSE_SOCKET -DUSE_CUDA -DWIN_HAS_INET_PTON -D"CMAKE_INTDIR=\"Release\"" -D_MBCS -DWIN32 -D_WINDOWS -DNDEBUG -DEIGEN_MPL2_ONLY -DEIGEN_DONT_PARALLELIZE -DUSE_SOCKET -DUSE_CUDA -DWIN_HAS_INET_PTON -D"CMAKE_INTDIR=\"Release\"" -Xcompiler "/EHsc /Wall /nologo /O2 /FS   /MD /GR" -Xcompiler "/Fdlightgbm_objs.dir\Release\lightgbm_objs.pdb" -o lightgbm_objs.dir\Release\/src/treelearner/cuda/cuda_best_split_finder.cu.obj "C:\Users\Antony\AppData\Local\Temp\pip-install-1rpnm3ee\lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2\src\treelearner\cuda\cuda_best_split_finder.cu"

cl : Command line warning D9002: ignoring unknown option '-fPIC' [C:\Users\Antony\AppData\Local\Temp\tmpnc1sf10k\build\lightgbm_objs.vcxproj]

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\crt/host_config.h(104): warning C4668: '__NV_NO_HOST_COMPILER_CHECK' is not defined as a preprocessor macro, replacing with '0' for '#if/#elif' 

C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/include\cuda.h(3180): warning C4668: '__STDC_VERSION__' is not defined as a preprocessor macro, replacing with '0' for '#if/#elif' 

C:/Users/Antony/AppData/Local/Temp/pip-install-1rpnm3ee/lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2/include\LightGBM/utils/common.h(33): warning C4464: relative include path contains '..' [C:\Users\Antony\AppData\Local\Temp\tmpnc1sf10k\build\lightgbm_objs.vcxproj]
C:/Users/Antony/AppData/Local/Temp/pip-install-1rpnm3ee/lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2/include\LightGBM/utils/common.h(34): warning C4464: relative include path contains '..' [C:\Users\Antony\AppData\Local\Temp\tmpnc1sf10k\build\lightgbm_objs.vcxproj]

cl : Command line warning D9002: ignoring unknown option '-fPIC' 

...

C:\Users\Antony\AppData\Local\Temp\pip-install-1rpnm3ee\lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2\src\treelearner\cuda\cuda_best_split_finder.cu(1937): error : identifier "LightGBM::kMinScore" is undefined in device code 

C:\Users\Antony\AppData\Local\Temp\pip-install-1rpnm3ee\lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2\src\treelearner\cuda\cuda_best_split_finder.cu(1966): error : identifier "LightGBM::kMinScore" is undefined in device code 

... dozens more like that ...

Error limit reached.
100 errors detected in the compilation of "C:/Users/Antony/AppData/Local/Temp/pip-install-1rpnm3ee/lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2/src/treelearner/cuda/cuda_best_split_finder.cu".
Compilation terminated.

So just to confirm... that output came from running precisely this command, with no other customizations?

pip install \
    --force-reinstall \
    --no-binary lightgbm \
    --config-settings=cmake.define.USE_CUDA=ON \
    lightgbm
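With a log this long, it can help to collapse the repeated compiler errors into a short summary before reading further. The sketch below assumes error lines look like the two quoted above (`<path>(<line>): error : <message>`); the regular expression and the name `summarize_errors` are illustrative assumptions, not part of LightGBM or nvcc.

```python
import re
from collections import Counter

# Matches lines shaped like the ones quoted above:
#   <path>\cuda_best_split_finder.cu(1937): error : identifier "X" is undefined in device code
ERROR_RE = re.compile(r'^(?P<file>.+)\((?P<line>\d+)\): error\s*: (?P<msg>.+?)\s*$')


def summarize_errors(log_text):
    """Count distinct (file name, message) pairs among compiler error lines."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = ERROR_RE.match(raw.strip())
        if match:
            # Keep only the file name; temporary build directories differ per run.
            name = re.split(r"[\\/]", match.group("file"))[-1]
            counts[(name, match.group("msg"))] += 1
    return counts
```

Warning lines (for example the C4668 and D9002 ones) do not match the pattern, so only hard errors are counted.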

a reproduction repo and described the steps I took

The error message you're reporting there is this:

"[LightGBM] [Fatal] CUDA Tree Learner was not enabled in this build. Please recompile with CMake option -DUSE_CUDA=1"

And you did not compile the library with -DUSE_CUDA=1.

cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ .. 

If you want to use {"device": "cuda"}, you have to compile the library with -DUSE_CUDA=1, exactly as that message says.
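To make the pairing explicit, here is a small lookup that restates what this thread says about each device type and the build option it needs. It is a reading aid under the assumption that only the options mentioned in this thread matter; `BUILD_OPTIONS` and `required_build_option` are hypothetical names, not LightGBM APIs.

```python
# Pairs taken from this thread: each device type needs a library compiled with
# the matching CMake option, and build-python.sh has a flag for each.
BUILD_OPTIONS = {
    "cpu": None,  # default device type, no extra build option needed
    "gpu": {"cmake": "-DUSE_GPU=1", "build_python_flag": "--gpu"},    # OpenCL-based
    "cuda": {"cmake": "-DUSE_CUDA=1", "build_python_flag": "--cuda"},
}


def required_build_option(device_type):
    """Return the build options a given device type depends on (None for cpu)."""
    try:
        return BUILD_OPTIONS[device_type]
    except KeyError:
        raise ValueError(f"unknown device_type: {device_type!r}") from None
```

For example, `required_build_option("cuda")["cmake"]` gives `-DUSE_CUDA=1`, which is the option named in the error message, whereas a library built with `-DUSE_GPU=1` only serves the `gpu` device type.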

NisuSan commented 7 months ago

So just to confirm... that output came from running precisely this command, with no other customizations?

Yes, no customizations.

And you did not compile the library with -DUSE_CUDA=1

Oh, I see now.

NisuSan commented 7 months ago

@jameslamb, I finally compiled the module for CUDA, but now I get this error:

LightGBMError: Check failed: (split_indices_block_size_data_partition) > (0) at /usr/local/src/lightgbm/LightGBM/lightgbm-python/src/treelearner/cuda/cuda_data_partition.cpp, line 280 .

I couldn't find any information about this error by searching online.

jameslamb commented 7 months ago

What specific command(s) did you run or other actions did you take to fix the compilation errors?

NisuSan commented 7 months ago

What specific command(s) did you run or other actions did you take to fix the compilation errors?

I did it for Docker, not Windows.

  1. Replace -DUSE_GPU=1 with -DUSE_CUDA=1
  2. Replace ./build-python.sh install --precompile with ./build-python.sh install --cuda

jameslamb commented 7 months ago

Ok. Well it looks like you've opened another issue for the new error message you're reporting (#6329), and the documentation here does explicitly say that Windows support for the CUDA interface is not currently available:

Note: only Linux is supported, other operating systems are not supported yet.

ref: https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html#build-cuda-version

So as it seems you're not interested in continuing to help with identifying the root cause of these issues on Windows, we'll close this.
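Given the documentation note quoted above, a build script could check the operating system before attempting a CUDA build. The following is a minimal sketch that encodes only that note as it stood when this issue was closed; `cuda_build_supported` is a hypothetical helper, not a LightGBM function.

```python
import platform


def cuda_build_supported(system=None):
    """Return True only on Linux, per the documentation note quoted in this thread."""
    # platform.system() returns names such as "Linux", "Windows", or "Darwin".
    system = system or platform.system()
    return system == "Linux"
```

On Windows this returns False, in which case the OpenCL-based `gpu` device type from the released wheels is the option this thread points to.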