microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

opencl version was incorrectly detected. #902

Closed cjliux closed 6 years ago

cjliux commented 7 years ago

I followed the installation guide in the wiki to build the GPU version of LightGBM. However, the version detected when I entered sudo cmake -USE_MPI=ON -USE-GPU=1 .. is 2.0, while the version actually installed is 1.0. Does anyone know the reason?

huanzhang12 commented 7 years ago

The OpenCL 2.0 library is probably a generic ICD (installable client driver) library and headers provided by the OS, which is automatically detected by CMake. At runtime, the ICD library dispatches actual OpenCL function calls to the vendor specific OpenCL libraries provided by AMD/Nvidia/Intel.

In LightGBM, only OpenCL 1.2 is required, and OpenCL 2.0 should be backwards compatible. In fact, all NVIDIA GPUs support 1.2 only; most AMD GPUs after 2014 theoretically support 2.0, but getting it working on Linux can be tricky. If LightGBM compiles and works fine for you, you don't need to worry about it.
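
To see which vendor implementations the ICD loader described above can dispatch to, one can list the registered ICD files. This is a minimal sketch, assuming a Linux system where the loader follows the common convention of reading `.icd` files from /etc/OpenCL/vendors (the location may differ on other setups). The function name `list_icd_vendors` is made up for illustration and is not part of LightGBM or OpenCL.

```python
import os


def list_icd_vendors(vendor_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the vendor library it names.

    Assumption: each .icd file is a small text file whose first
    non-empty line is the vendor's OpenCL library (name or path).
    """
    vendors = {}
    if not os.path.isdir(vendor_dir):
        return vendors  # no ICD registry at this location
    for entry in sorted(os.listdir(vendor_dir)):
        if not entry.endswith(".icd"):
            continue
        with open(os.path.join(vendor_dir, entry)) as fh:
            lines = [line.strip() for line in fh if line.strip()]
        vendors[entry] = lines[0] if lines else None
    return vendors


if __name__ == "__main__":
    for icd, library in list_icd_vendors().items():
        print(icd, "->", library)
```

An empty result suggests that no vendor driver is registered in that directory, in which case the generic library found by CMake would have nothing to dispatch to.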

cjliux commented 7 years ago

Actually, OpenCL 2.0 isn't present on my machine. CMake raised the error because it could not find the expected version identifier in the libopencl.so file: it had recognised the library as version 2.0 rather than 1.2.

huanzhang12 commented 7 years ago

@cjliux Can you provide more information, including your GPU vendor and CMake log?

cjliux commented 7 years ago

In [2]: import lightgbm

OSError Traceback (most recent call last)

in ()
----> 1 import lightgbm

/opt/LightGBM/python-package/lightgbm/__init__.py in ()
      6 from __future__ import absolute_import
      7
----> 8 from .basic import Booster, Dataset
      9 from .callback import (early_stopping, print_evaluation, record_evaluation,
     10                        reset_parameter)

/opt/LightGBM/python-package/lightgbm/basic.py in ()
     30
     31
---> 32 _LIB = _load_lib()
     33
     34

/opt/LightGBM/python-package/lightgbm/basic.py in _load_lib()
     25     if len(lib_path) == 0:
     26         return None
---> 27     lib = ctypes.cdll.LoadLibrary(lib_path[0])
     28     lib.LGBM_GetLastError.restype = ctypes.c_char_p
     29     return lib

/usr/lib/python2.7/ctypes/__init__.pyc in LoadLibrary(self, name)
    438
    439     def LoadLibrary(self, name):
--> 440         return self._dlltype(name)
    441
    442 cdll = LibraryLoader(CDLL)

/usr/lib/python2.7/ctypes/__init__.pyc in __init__(self, name, mode, handle, use_errno, use_last_error)
    360
    361         if handle is None:
--> 362             self._handle = _dlopen(self._name, mode)
    363         else:
    364             self._handle = handle

OSError: /opt/LightGBM/python-package/lightgbm/../../lib_lightgbm.so: symbol clCreateCommandQueueWithProperties, version OPENCL_2.0 not defined in file libOpenCL.so.1 with link time reference

luozm commented 6 years ago

I have the same problem. Everything goes fine during installation (both the command-line and Python versions), and I can use both CPU and GPU on the command line. But when I try to import it in Python, it raises an error:

>>> import lightgbm
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/lightgbm/__init__.py", line 8, in <module>
    from .basic import Booster, Dataset
  File "/usr/local/lib/python3.5/dist-packages/lightgbm/basic.py", line 32, in <module>
    _LIB = _load_lib()
  File "/usr/local/lib/python3.5/dist-packages/lightgbm/basic.py", line 27, in _load_lib
    lib = ctypes.cdll.LoadLibrary(lib_path[0])
  File "/usr/lib/python3.5/ctypes/__init__.py", line 425, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.5/ctypes/__init__.py", line 347, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/local/lib/python3.5/dist-packages/lightgbm/lib_lightgbm.so: symbol clCreateCommandQueueWithProperties, version OPENCL_2.0 not defined in file libOpenCL.so.1 with link time reference

System Info:

I think it's due to CUDA; maybe it links to the wrong version of OpenCL? Also, I didn't set the variables mentioned below because I don't know which paths I should add. How can I fix this?

"You need to add OpenCL_INCLUDE_DIR to the environmental variable 'PATH' and export BOOST_ROOT before installation."

Thx a lot! Look forward to your reply. @huanzhang12 @wxchan @henry0312 @StrikerRUS

Tony-Y commented 6 years ago

@luozm When I installed the Python package, I modified setup.py as follows:

cmake_cmd = ["cmake", "../compile/"]
if use_gpu:
    cmake_cmd.append("-DUSE_GPU=ON")
    # Point CMake at the OpenCL library that ships with CUDA
    cmake_cmd.append("-DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so")

I also set the environment variable CUDA_ROOT as follows:

export CUDA_ROOT=/usr/local/cuda

After that, I ran python setup.py install --gpu.
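
Before picking a value for OpenCL_LIBRARY, it may help to check which candidate libraries expose the OpenCL 2.0 entry point named in the error. The sketch below is only a rough check: it looks the symbol up by name with ctypes and does not inspect symbol versions such as OPENCL_2.0, so a True result is not a guarantee that the import will succeed. The helper names are illustrative, and the candidate paths are simply the ones mentioned in this thread.

```python
import ctypes

# OpenCL 2.0 entry point named in the error message in this thread.
OPENCL_2_0_SYMBOL = "clCreateCommandQueueWithProperties"


def exports_symbol(lib, symbol):
    """Return True if the loaded ctypes library exposes `symbol`."""
    # ctypes raises AttributeError when the lookup fails,
    # which hasattr() reports as False.
    return hasattr(lib, symbol)


def check_opencl_library(path, symbol=OPENCL_2_0_SYMBOL):
    """Return True/False for `symbol`, or None if `path` cannot be loaded."""
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    return exports_symbol(lib, symbol)


if __name__ == "__main__":
    # Paths taken from this thread; adjust them to your system.
    for candidate in ("/usr/local/cuda/lib64/libOpenCL.so",
                      "/usr/lib/x86_64-linux-gnu/libOpenCL.so"):
        print(candidate, check_opencl_library(candidate))
```

If the library found at build time reports True and the one loaded at run time reports False, that would be consistent with the "not defined in file libOpenCL.so.1" message.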

luozm commented 6 years ago

@Tony-Y Thanks a lot for your help!

Tony-Y commented 6 years ago

@luozm If you use the master branch of CMake, you do not need to modify setup.py. You can find the commit message about the fix at https://github.com/Kitware/CMake/commit/b361990007a4d40f5a0f682455bcea89efd7eecc

Weekend-Warrior commented 5 years ago

Hey guys, I'm having this exact problem installing the R package at the moment. I'm using Laurae's lgbdl package to try to build the GPU version. The build configures successfully:

-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp  
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - found
-- Found OpenCL: /usr/lib/x86_64-linux-gnu/libOpenCL.so (found version "2.0") 
-- OpenCL include directory:/usr/include
-- Boost version: 1.58.0
-- Found the following Boost libraries:
--   filesystem
--   system
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/Rtmp3Qfite/LightGBM/lightgbm_r/src/build

but then reports:

symbol clCreateCommandQueueWithProperties, version OPENCL_2.0 not defined in file libOpenCL.so.1 with link time reference

And then fails. My system is:

MRAN 3.5.1 on Ubuntu 16.04

Help is much appreciated.
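
The log above shows CMake selecting /usr/lib/x86_64-linux-gnu/libOpenCL.so at configure time, while the error names libOpenCL.so.1 at load time. Below is a small sketch for pulling the configure-time path and version out of such a log so they can be compared with the library found at run time. The helper name `found_opencl` and the regular expression are illustrative assumptions based only on the log format shown here.

```python
import re

# Matches lines such as:
#   -- Found OpenCL: /usr/lib/x86_64-linux-gnu/libOpenCL.so (found version "2.0")
_FOUND_OPENCL = re.compile(r'^-- Found OpenCL: (\S+) \(found version "([^"]+)"\)')


def found_opencl(cmake_output):
    """Return (library_path, version) from CMake configure output, or None."""
    for line in cmake_output.splitlines():
        match = _FOUND_OPENCL.match(line.strip())
        if match:
            return match.group(1), match.group(2)
    return None


if __name__ == "__main__":
    sample = '-- Found OpenCL: /usr/lib/x86_64-linux-gnu/libOpenCL.so (found version "2.0") '
    print(found_opencl(sample))
```

If the path reported here differs from the libOpenCL.so.1 that the dynamic loader resolves at run time, the two libraries may not provide the same set of symbols.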