topepo / caret

caret (Classification And Regression Training) R package that contains misc functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

Include LightGBM into caret #572

Closed tobigithub closed 7 years ago

tobigithub commented 7 years ago

LightGBM is a new boosting framework, that is (really) extremely fast, scales to many cores and provides highly accurate predictions for classification and regression type problems. https://github.com/Microsoft/LightGBM/tree/master/R-package

Here are some benchmarks using millions of data points and excellent parallel scaling and memory use. I guess it also outperforms 90% of the other ML packages, because many of them will not even be able to handle so much data or actually finish within a few minutes. https://github.com/Microsoft/LightGBM/wiki/Experiments#comparison-experiment

This is a very exciting package and probably the best thing that happened since xgboost. This is of course the early beta version for R but given the good performance it might be worthwhile.

topepo commented 7 years ago

I was going to kick it around the other day but couldn't get it installed on my Mac due to openmpi header issues. I'll take a look next week when I'm back in town.

ajing commented 7 years ago

Just wondering, is there any progress on adding lightGBM?

Manu4l commented 7 years ago

Any updates?

topepo commented 7 years ago

Not yet. openmp is a pain on OS X and my Linux box will be at 100% utilization for the next week or so.

I'm interested in it so it will happen.

Manu4l commented 7 years ago

Cool. Looking forward to the release then.

tobigithub commented 7 years ago

The simple lightGBM installation example runs fine under Mac OSX El Capitan (10.11). It is parallelized and uses all CPUs on a given socket (like multi-CPU workstations or NUMA nodes). I think that should cover 90% or more of all caret users.

brew install cmake
brew install gcc --without-multilib
git clone --recursive https://github.com/Microsoft/LightGBM ; cd LightGBM
mkdir build ; cd build
cmake -DCMAKE_CXX_COMPILER=g++-6 -DCMAKE_C_COMPILER=gcc-6 .. 
make -j 

The OpenMPI installation also compiles fine with no errors, and the examples run fine out of the box on Mac OSX El Capitan (10.11), but maybe it's not needed if it creates issues during installation or runtime. Most users will not use OpenMPI (I guess), so maybe just ignore OpenMPI for the time being.

brew install openmpi 
brew install cmake
brew install gcc --without-multilib
git clone --recursive https://github.com/Microsoft/LightGBM ; cd LightGBM
mkdir build ; cd build
cmake -DCMAKE_CXX_COMPILER=g++-6 -DCMAKE_C_COMPILER=gcc-6 -DUSE_MPI=ON .. 
make -j 

This is on a fairly clean system with no other complicated systems or libraries installed, so maybe running a clean install or a VM would solve the issues.

Lan131 commented 7 years ago

Any update on this?

topepo commented 7 years ago

I'd previously installed gcc using --without-multilib

$ /usr/local/Cellar/gcc/6.3.0_1/bin/gcc-6 --version
gcc-6 (Homebrew GCC 6.3.0_1 --without-multilib) 6.3.0

and used this:

cmake -DCMAKE_CXX_COMPILER=/usr/local/Cellar/gcc/6.3.0_1/bin/g++-6 -DCMAKE_C_COMPILER=/usr/local/Cellar/gcc/6.3.0_1/bin/gcc-6 .. 

and get

-- The C compiler identification is unknown
-- The CXX compiler identification is unknown
CMake Error at CMakeLists.txt:8 (PROJECT):
  The CMAKE_C_COMPILER:

    gcc-6

  is not a full path and was not found in the PATH.

  Tell CMake where to find the compiler by setting either the environment
  variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
  the compiler, or to the compiler name if it is in the PATH.

CMake Error at CMakeLists.txt:8 (PROJECT):
  The CMAKE_CXX_COMPILER:

    g++-6

  is not a full path and was not found in the PATH.

  Tell CMake where to find the compiler by setting either the environment
  variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.

-- Configuring incomplete, errors occurred!

I edited the file and used

if(APPLE)
    SET(CXX "/usr/local/Cellar/gcc/6.3.0_1/bin/g++-6")
    SET(CC "/usr/local/Cellar/gcc/6.3.0_1/bin/gcc-6")
endif()

and built lightgbm.
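The CMake error quoted above names an alternative to editing the file: give CMake the full path of each compiler, either through the CC and CXX environment variables or through the CMAKE_C_COMPILER and CMAKE_CXX_COMPILER cache entries. A minimal sketch of that idea follows. The helper name resolve_compiler is hypothetical (it is not part of CMake or LightGBM), and gcc-6 / g++-6 are simply the compiler names used in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of CMake or LightGBM): print the absolute
# path of a compiler found on the PATH, or fail if it cannot be found.
resolve_compiler() {
    resolved=$(command -v "$1" 2>/dev/null) || return 1
    case "$resolved" in
        /*) printf '%s\n' "$resolved" ;;  # absolute path: safe to hand to CMake
        *)  return 1 ;;                   # builtin, alias or function: not a real binary
    esac
}

# Usage sketch (run from LightGBM/build; compiler names as in this thread):
#   CC=$(resolve_compiler gcc-6) && CXX=$(resolve_compiler g++-6) &&
#       cmake -DCMAKE_C_COMPILER="$CC" -DCMAKE_CXX_COMPILER="$CXX" ..
```

Failing early when a compiler cannot be resolved gives a clearer message than the "is not a full path and was not found in the PATH" error from the configure step.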

The current issue is

$ cd ..
$ R CMD INSTALL R-package/
* installing to library ‘/Users/max/Library/R/3.3/library’
* installing *source* package ‘lightgbm’ ...
** libs
/usr/local/Cellar/gcc/6.3.0_1/bin/g++-6 -std=c++11 -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -I../..//include -DUSE_SOCKET -Wno-deprecated-declarations -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include   -fopenmp  -std=c++11 -fPIC  -Wall -mtune=core2 -g -O2 -c lightgbm-all.cpp -o lightgbm-all.o
/usr/local/Cellar/gcc/6.3.0_1/bin/g++-6 -std=c++11 -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -I../..//include -DUSE_SOCKET -Wno-deprecated-declarations -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include   -fopenmp  -std=c++11 -fPIC  -Wall -mtune=core2 -g -O2 -c lightgbm_R.cpp -o lightgbm_R.o
/usr/local/Cellar/gcc/6.3.0_1/bin/g++-6 -std=c++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o lightgbm.so ./lightgbm-all.o ./lightgbm_R.o -fopenmp -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
installing to /Users/max/Library/R/3.3/library/lightgbm/libs
** R
** data
** demo
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Error in dyn.load(file, DLLpath = DLLpath, ...) : 
  unable to load shared object '/Users/max/Library/R/3.3/library/lightgbm/libs/lightgbm.so':
  dlopen(/Users/max/Library/R/3.3/library/lightgbm/libs/lightgbm.so, 6): Symbol not found: _GOMP_parallel
  Referenced from: /Users/max/Library/R/3.3/library/lightgbm/libs/lightgbm.so
  Expected in: flat namespace
 in /Users/max/Library/R/3.3/library/lightgbm/libs/lightgbm.so
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/Users/max/Library/R/3.3/library/lightgbm’

@tobigithub Any suggestions?

tobigithub commented 7 years ago

@topepo

Using your example on OSX (El Capitan), I get:

osxs-iMac:LightGBM osx$ cd build/
osxs-iMac:build osx$ cmake -DCMAKE_CXX_COMPILER=/usr/local/Cellar/gcc/6.3.0_1/bin/g++-6 -DCMAKE_C_COMPILER=/usr/local/Cellar/gcc/6.3.0_1/bin/gcc-6 ..
/usr/local/Cellar/open-mpi/2.1.0/lib/libmpi.dylib
/usr/local/Cellar/open-mpi/2.1.0/lib/libmpi.dylib
-- Configuring done
You have changed variables that require your cache to be deleted.
Configure will be re-run and you may have to reset some variables.
The following variables have changed:
CMAKE_C_COMPILER= /usr/local/Cellar/gcc/6.3.0_1/bin/gcc-6
CMAKE_CXX_COMPILER= /usr/local/Cellar/gcc/6.3.0_1/bin/g++-6

-- The C compiler identification is GNU 6.3.0
-- The CXX compiler identification is GNU 6.3.0
-- Checking whether C compiler has -isysroot
-- Checking whether C compiler has -isysroot - yes
-- Checking whether C compiler supports OSX deployment target flag
-- Checking whether C compiler supports OSX deployment target flag - yes
-- Check for working C compiler: /usr/local/bin/gcc-6
-- Check for working C compiler: /usr/local/bin/gcc-6 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Check for working CXX compiler: /usr/local/bin/g++-6
-- Check for working CXX compiler: /usr/local/bin/g++-6 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp  
-- Configuring done
CMake Warning (dev):
  Policy CMP0042 is not set: MACOSX_RPATH is enabled by default.  Run "cmake
  --help-policy CMP0042" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

  MACOSX_RPATH is not specified for the following targets:

   _lightgbm

This warning is for project developers.  Use -Wno-dev to suppress it.

-- Generating done
-- Build files have been written to: /Users/osx/lightgbm/LightGBM/build
osxs-iMac:build osx$ ls
CMakeCache.txt      CMakeFiles      Makefile        cmake_install.cmake

then the make command: make -j

osxs-iMac:LightGBM osx$ pwd
/Users/osx/lightgbm/LightGBM
osxs-iMac:LightGBM osx$ cd build/
osxs-iMac:build osx$ make -j
Scanning dependencies of target _lightgbm
Scanning dependencies of target lightgbm
[  2%] Building CXX object CMakeFiles/lightgbm.dir/src/main.cpp.o
[  4%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/gbdt.cpp.o
[  6%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/bin.cpp.o
[ 10%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/boosting.cpp.o
[ 12%] Building CXX object CMakeFiles/lightgbm.dir/src/application/application.cpp.o
[ 14%] Building CXX object CMakeFiles/_lightgbm.dir/src/application/application.cpp.o
[ 16%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/gbdt.cpp.o
[ 18%] Building CXX object CMakeFiles/_lightgbm.dir/src/c_api.cpp.o
[ 20%] Building CXX object CMakeFiles/lightgbm.dir/src/io/bin.cpp.o
[ 22%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/linkers_socket.cpp.o
[ 24%] Building CXX object CMakeFiles/lightgbm.dir/src/network/linker_topo.cpp.o
[ 26%] Building CXX object CMakeFiles/lightgbm.dir/src/objective/objective_function.cpp.o
[ 28%] Building CXX object CMakeFiles/lightgbm.dir/src/metric/dcg_calculator.cpp.o
[ 32%] Building CXX object CMakeFiles/lightgbm.dir/src/io/metadata.cpp.o
[ 34%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/feature_parallel_tree_learner.cpp.o
[ 34%] Building CXX object CMakeFiles/lightgbm.dir/src/metric/metric.cpp.o
[ 36%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/metadata.cpp.o
[ 42%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/parser.cpp.o
[ 44%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/dataset_loader.cpp.o
[ 46%] Building CXX object CMakeFiles/lightgbm.dir/src/network/linkers_mpi.cpp.o
[ 50%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/boosting.cpp.o
[ 58%] Building CXX object CMakeFiles/lightgbm.dir/src/io/dataset.cpp.o
[ 38%] Building CXX object CMakeFiles/lightgbm.dir/src/io/dataset_loader.cpp.o
[ 48%] Building CXX object CMakeFiles/lightgbm.dir/src/io/config.cpp.o
[ 52%] Building CXX object CMakeFiles/lightgbm.dir/src/io/tree.cpp.o
[ 56%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/dataset.cpp.o
[ 56%] Building CXX object CMakeFiles/lightgbm.dir/src/io/parser.cpp.o
[ 60%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/tree.cpp.o
[ 62%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/data_parallel_tree_learner.cpp.o
[ 64%] Building CXX object CMakeFiles/lightgbm.dir/src/network/linkers_socket.cpp.o
[ 40%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/config.cpp.o
[ 66%] Building CXX object CMakeFiles/_lightgbm.dir/src/metric/dcg_calculator.cpp.o
[ 68%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/gpu_tree_learner.cpp.o
[ 68%] Building CXX object CMakeFiles/_lightgbm.dir/src/metric/metric.cpp.o
[ 70%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/data_parallel_tree_learner.cpp.o
[ 72%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/linkers_mpi.cpp.o
[ 74%] Building CXX object CMakeFiles/lightgbm.dir/src/network/network.cpp.o
[ 76%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/network.cpp.o
[ 78%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/voting_parallel_tree_learner.cpp.o
[ 80%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/tree_learner.cpp.o
[ 82%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/serial_tree_learner.cpp.o
[ 84%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/linker_topo.cpp.o
[ 86%] Building CXX object CMakeFiles/_lightgbm.dir/src/objective/objective_function.cpp.o
[ 90%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/tree_learner.cpp.o
[ 92%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/voting_parallel_tree_learner.cpp.o
[ 92%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/gpu_tree_learner.cpp.o
[ 94%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/feature_parallel_tree_learner.cpp.o
[ 96%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/serial_tree_learner.cpp.o
[ 98%] Linking CXX executable ../lightgbm
[100%] Linking CXX shared library ../lib_lightgbm.so
[100%] Built target lightgbm
[100%] Built target _lightgbm
osxs-iMac:build osx$ 
osxs-iMac:build osx$ 
osxs-iMac:build osx$ ls
CMakeCache.txt      CMakeFiles      Makefile        cmake_install.cmake
osxs-iMac:build osx$ cd ..
osxs-iMac:LightGBM osx$  ls -l
total 5040
-rw-r--r--   1 osx  staff     3964 Apr 12 00:55 CMakeLists.txt
-rw-r--r--   1 osx  staff     1085 Apr 12 00:55 LICENSE
drwxr-xr-x  14 osx  staff      476 Apr 12 00:55 R-package
-rw-r--r--   1 osx  staff     4114 Apr 12 00:55 README.md
drwxr-xr-x   6 osx  staff      204 Apr 13 00:52 build
drwxr-xr-x  19 osx  staff      646 Apr 12 00:55 compute
drwxr-xr-x   4 osx  staff      136 Apr 12 00:55 docker
drwxr-xr-x  14 osx  staff      476 Apr 12 00:55 docs
drwxr-xr-x  11 osx  staff      374 Apr 12 22:30 examples
drwxr-xr-x   3 osx  staff      102 Apr 12 00:55 include
-rwxr-xr-x   1 osx  staff  1324260 Apr 13 01:03 lib_lightgbm.so
-rwxr-xr-x   1 osx  staff  1234144 Apr 13 01:03 lightgbm
drwxr-xr-x   4 osx  staff      136 Apr 12 00:55 pmml
drwxr-xr-x   5 osx  staff      170 Apr 12 00:55 python-package
drwxr-xr-x  11 osx  staff      374 Apr 12 00:55 src
drwxr-xr-x   4 osx  staff      136 Apr 12 00:55 tests
drwxr-xr-x   5 osx  staff      170 Apr 12 00:55 windows
osxs-iMac:LightGBM osx$

Basically, the executable and the shared object file (.so) are generated without problems. However, before that I had executed all the default install commands from above, and I just deleted the executable and the .so as a proof of principle.

Then cd examples/regression and run ./lightgbm config=train.conf; the run finishes in 4 seconds or so. Then install R using Homebrew:

brew tap homebrew/science
brew install r

then start R and install all four required packages

R
install.packages(c("R6","data.table", "magrittr","jsonlite"))

then install the lightGBM R package

osxs-iMac:LightGBM osx$ R CMD INSTALL R-package/
* installing to library ‘/usr/local/lib/R/3.3/site-library’
* installing *source* package ‘lightgbm’ ...
** libs
clang++ -std=c++11 -I/usr/local/Cellar/r/3.3.3_1/R.framework/Resources/include -DNDEBUG -I../..//include -DUSE_SOCKET -Wno-deprecated-declarations -I/usr/local/opt/gettext/include -I/usr/local/opt/readline/include -I/usr/local/include     -std=c++11 -fPIC  -g -O2 -c lightgbm-all.cpp -o lightgbm-all.o
clang++ -std=c++11 -I/usr/local/Cellar/r/3.3.3_1/R.framework/Resources/include -DNDEBUG -I../..//include -DUSE_SOCKET -Wno-deprecated-declarations -I/usr/local/opt/gettext/include -I/usr/local/opt/readline/include -I/usr/local/include     -std=c++11 -fPIC  -g -O2 -c lightgbm_R.cpp -o lightgbm_R.o
clang++ -std=c++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/usr/local/opt/gettext/lib -L/usr/local/opt/readline/lib -L/usr/local/lib -L/usr/local/Cellar/r/3.3.3_1/R.framework/Resources/lib -L/usr/local/opt/gettext/lib -L/usr/local/opt/readline/lib -L/usr/local/lib -o lightgbm.so ./lightgbm-all.o ./lightgbm_R.o -F/usr/local/Cellar/r/3.3.3_1/R.framework/.. -framework R -lintl -Wl,-framework -Wl,CoreFoundation
installing to /usr/local/lib/R/3.3/site-library/lightgbm/libs
** R
** data
** demo
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (lightgbm)
osxs-iMac:LightGBM osx$

Then double-check that lightgbm can be loaded in R: start an R session and load the package. It loads fine. I would suggest a VM or container with a clean install of Mac OSX El Capitan (10.11); that should solve the issues.

> require(lightgbm)
Loading required package: lightgbm
Loading required package: R6
>

topepo commented 7 years ago

I get there fine (see below). Installing the R package from the R-package directory is what fails with Symbol not found: _GOMP_parallel

$ cmake ..
-- The C compiler identification is GNU 6.3.0
-- The CXX compiler identification is GNU 6.3.0
-- Checking whether C compiler has -isysroot
-- Checking whether C compiler has -isysroot - yes
-- Checking whether C compiler supports OSX deployment target flag
-- Checking whether C compiler supports OSX deployment target flag - yes
-- Check for working C compiler: /usr/local/Cellar/gcc/6.3.0_1/bin/gcc-6
-- Check for working C compiler: /usr/local/Cellar/gcc/6.3.0_1/bin/gcc-6 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Check for working CXX compiler: /usr/local/Cellar/gcc/6.3.0_1/bin/g++-6
-- Check for working CXX compiler: /usr/local/Cellar/gcc/6.3.0_1/bin/g++-6 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
CMake Warning (dev) at /usr/local/Cellar/cmake/3.8.0/share/cmake/Modules/CMakeDetermineCompilerABI.cmake:78 (if):
  Policy CMP0054 is not set: Only interpret if() arguments as variables or
  keywords when unquoted.  Run "cmake --help-policy CMP0054" for policy
  details.  Use the cmake_policy command to set the policy and suppress this
  warning.

  Quoted variables like "CXX" will no longer be dereferenced when the policy
  is set to NEW.  Since the policy is not set the OLD behavior will be used.
Call Stack (most recent call first):
  /usr/local/Cellar/cmake/3.8.0/share/cmake/Modules/CMakeTestCXXCompiler.cmake:58 (CMAKE_DETERMINE_COMPILER_ABI)
  CMakeLists.txt:8 (PROJECT)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Found MPI_C: /usr/local/Cellar/open-mpi/2.0.1/lib/libmpi.dylib  
-- Found MPI_CXX: /usr/local/Cellar/open-mpi/2.0.1/lib/libmpi.dylib  
/usr/local/Cellar/open-mpi/2.0.1/lib/libmpi.dylib
/usr/local/Cellar/open-mpi/2.0.1/lib/libmpi.dylib
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
CMake Warning (dev) at /usr/local/Cellar/cmake/3.8.0/share/cmake/Modules/FindOpenMP.cmake:163 (if):
  Policy CMP0054 is not set: Only interpret if() arguments as variables or
  keywords when unquoted.  Run "cmake --help-policy CMP0054" for policy
  details.  Use the cmake_policy command to set the policy and suppress this
  warning.

  Quoted variables like "CXX" will no longer be dereferenced when the policy
  is set to NEW.  Since the policy is not set the OLD behavior will be used.
Call Stack (most recent call first):
  /usr/local/Cellar/cmake/3.8.0/share/cmake/Modules/FindOpenMP.cmake:266 (_OPENMP_GET_SPEC_DATE)
  CMakeLists.txt:28 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) at /usr/local/Cellar/cmake/3.8.0/share/cmake/Modules/FindOpenMP.cmake:166 (elseif):
  Policy CMP0054 is not set: Only interpret if() arguments as variables or
  keywords when unquoted.  Run "cmake --help-policy CMP0054" for policy
  details.  Use the cmake_policy command to set the policy and suppress this
  warning.

  Quoted variables like "CXX" will no longer be dereferenced when the policy
  is set to NEW.  Since the policy is not set the OLD behavior will be used.
Call Stack (most recent call first):
  /usr/local/Cellar/cmake/3.8.0/share/cmake/Modules/FindOpenMP.cmake:266 (_OPENMP_GET_SPEC_DATE)
  CMakeLists.txt:28 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Found OpenMP: -fopenmp  
-- Configuring done
CMake Warning (dev):
  Policy CMP0042 is not set: MACOSX_RPATH is enabled by default.  Run "cmake
  --help-policy CMP0042" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

  MACOSX_RPATH is not specified for the following targets:

   _lightgbm

This warning is for project developers.  Use -Wno-dev to suppress it.

-- Generating done
-- Build files have been written to: /Users/max/tmp/LightGBM/build
$ make -j
Scanning dependencies of target lightgbm
Scanning dependencies of target _lightgbm
[  6%] Building CXX object CMakeFiles/_lightgbm.dir/src/application/application.cpp.o
[ 10%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/boosting.cpp.o
[ 10%] Building CXX object CMakeFiles/lightgbm.dir/src/main.cpp.o
[  2%] Building CXX object CMakeFiles/_lightgbm.dir/src/c_api.cpp.o
[ 10%] Building CXX object CMakeFiles/lightgbm.dir/src/application/application.cpp.o
[ 12%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/gbdt.cpp.o
[ 14%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/boosting.cpp.o
[ 16%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/gbdt.cpp.o
[ 18%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/bin.cpp.o
[ 20%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/config.cpp.o
[ 24%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/dataset.cpp.o
[ 24%] Building CXX object CMakeFiles/lightgbm.dir/src/io/bin.cpp.o
[ 26%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/dataset_loader.cpp.o
[ 32%] Building CXX object CMakeFiles/lightgbm.dir/src/io/config.cpp.o
[ 30%] Building CXX object CMakeFiles/lightgbm.dir/src/io/dataset.cpp.o
[ 32%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/metadata.cpp.o
[ 34%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/parser.cpp.o
[ 36%] Building CXX object CMakeFiles/lightgbm.dir/src/io/metadata.cpp.o
[ 38%] Building CXX object CMakeFiles/lightgbm.dir/src/io/parser.cpp.o
[ 40%] Building CXX object CMakeFiles/lightgbm.dir/src/io/tree.cpp.o
[ 42%] Building CXX object CMakeFiles/lightgbm.dir/src/metric/dcg_calculator.cpp.o
[ 44%] Building CXX object CMakeFiles/_lightgbm.dir/src/io/tree.cpp.o
[ 46%] Building CXX object CMakeFiles/_lightgbm.dir/src/metric/dcg_calculator.cpp.o
[ 48%] Building CXX object CMakeFiles/_lightgbm.dir/src/metric/metric.cpp.o
[ 50%] Building CXX object CMakeFiles/lightgbm.dir/src/metric/metric.cpp.o
[ 52%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/network.cpp.o
[ 54%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/linkers_socket.cpp.o
[ 56%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/linker_topo.cpp.o
[ 58%] Building CXX object CMakeFiles/_lightgbm.dir/src/network/linkers_mpi.cpp.o
[ 60%] Building CXX object CMakeFiles/_lightgbm.dir/src/objective/objective_function.cpp.o
[ 64%] Building CXX object CMakeFiles/lightgbm.dir/src/network/linker_topo.cpp.o
[ 66%] Building CXX object CMakeFiles/lightgbm.dir/src/network/linkers_mpi.cpp.o
[ 66%] Building CXX object CMakeFiles/lightgbm.dir/src/objective/objective_function.cpp.o
[ 70%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/data_parallel_tree_learner.cpp.o
[ 70%] Building CXX object CMakeFiles/lightgbm.dir/src/network/network.cpp.o
[ 74%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/data_parallel_tree_learner.cpp.o
[ 76%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/feature_parallel_tree_learner.cpp.o
[ 76%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/feature_parallel_tree_learner.cpp.o
[ 78%] Building CXX object CMakeFiles/lightgbm.dir/src/network/linkers_socket.cpp.o
[ 80%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/serial_tree_learner.cpp.o
[ 82%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/serial_tree_learner.cpp.o
[ 84%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/tree_learner.cpp.o
[ 86%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/tree_learner.cpp.o
[ 88%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/voting_parallel_tree_learner.cpp.o
[ 90%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/voting_parallel_tree_learner.cpp.o
[ 92%] Building CXX object CMakeFiles/lightgbm.dir/src/io/dataset_loader.cpp.o
[ 94%] Building CXX object CMakeFiles/lightgbm.dir/src/treelearner/gpu_tree_learner.cpp.o
[ 96%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/gpu_tree_learner.cpp.o
[ 98%] Linking CXX executable ../lightgbm
[100%] Linking CXX shared library ../lib_lightgbm.so
[100%] Built target _lightgbm
[100%] Built target lightgbm

tobigithub commented 7 years ago

@topepo I see, there are/were problems with older OSX versions, older Xcode tools, and compiler flags. https://github.com/search?q=_GOMP_parallel&type=Issues&utf8=%E2%9C%93

1) One recommendation is to include -fopenmp into compiler and linker flags https://github.com/szaghi/FoBiS/issues/48#issuecomment-81504745

2) And another one suggested to set compiler flags for SHLIB_OPENMP https://github.com/Microsoft/LightGBM/issues/196#issuecomment-272348186

3) Another one was to update Homebrew and OpenMP and install the latest Xcode version https://github.com/Rdatatable/data.table/issues/1692

It seems to be a mix of issues due to older versions, observed by a number of people. I would still opt to not include OPENMP by default (unless for cluster use), but rather use the native parallelization from the native lightGBM code. I get 100% utilization of all 32 threads under Windows, which is excellent: no memory bottleneck, no disk bottleneck. This kind of utilization is only observed with well-optimized code, like Intel BLAS or maybe benchmarks.

I cannot test which version is slower; MPI probably is, due to communication overhead. There also seem to be some optimization issues under OSX, see here (slower speed): https://github.com/Microsoft/LightGBM/issues/89

For other issues I can only recommend opening a ticket on the lightGBM repo; they really try to solve bugs and help with all issues. https://github.com/Microsoft/LightGBM/issues
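For context on the error itself: _GOMP_parallel is a symbol provided by libgomp, GCC's OpenMP runtime, so "Symbol not found" at load time suggests that the runtime was not resolved when lightgbm.so was loaded. One way to check whether a built library records a dependency on libgomp is to inspect its linked libraries, for example with otool -L on macOS. The sketch below is only an illustration: links_libgomp is a hypothetical helper, and the library path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: read a list of linked libraries on stdin (for example
# the output of `otool -L` on macOS) and succeed if libgomp is among them.
links_libgomp() {
    grep -q 'libgomp'
}

# Usage sketch on macOS (the path to lightgbm.so is a placeholder):
#   if otool -L lightgbm.so | links_libgomp; then
#       echo "libgomp is recorded as a dependency"
#   else
#       echo "libgomp is not recorded; check the -fopenmp linker flag"
#   fi
```

A missing entry would be consistent with the first two recommendations above (passing -fopenmp at both compile and link time), though it does not by itself prove that this is the cause.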

topepo commented 7 years ago

I changed the first four lines of ~/.R/Makevars to

CC=/usr/local/Cellar/gcc/6.3.0_1/bin/gcc-6 -fopenmp
CXX=/usr/local/Cellar/gcc/6.3.0_1/bin/g++-6 -fopenmp
CXX1X=/usr/local/Cellar/gcc/6.3.0_1/bin/g++-6 -fopenmp
SHLIB_CXXLD=/usr/local/Cellar/gcc/6.3.0_1/bin/g++-6 -fopenmp
FC=/usr/local/bin/gfortran-4.8
F77=/usr/local/bin/gfortran-4.8
MAKE=make -j 7

SHLIB_OPENMP_CFLAGS=-fopenmp
SHLIB_OPENMP_CXXFLAGS=-fopenmp
SHLIB_OPENMP_FCFLAGS=-fopenmp
SHLIB_OPENMP_FFLAGS=-fopenmp

LDFLAGS=-L/usr/local/opt/llvm/lib
CPPFLAGS=-I/usr/local/opt/llvm/include

and it did compile but a test resulted in

> library(lightgbm)
Loading required package: R6
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lightgbm_0.1 R6_2.2.0    

loaded via a namespace (and not attached):
[1] magrittr_1.5      data.table_1.10.4
>      library(lightgbm)
>      data(agaricus.train, package = "lightgbm")
>      train <- agaricus.train
>      dtrain <- lgb.Dataset(train$data, label = train$label)
>      params <- list(objective = "regression", metric = "l2")
>      model <- lgb.cv(params,
+                      dtrain,
+                      10,
+                      nfold = 5,
+                      min_data = 1,
+                      learning_rate = 1,
+                      early_stopping_rounds = 10)
Loading required package: Matrix
R(70152,0x70000b4b9000) malloc: *** error for object 0x7fa5d0d30a40: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
R(70152,0x7fffcf97e3c0) malloc: *** error for object 0x7fa5d3b00268: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
R(70152,0x70000b5bf000) malloc: *** error for object 0x7fa5d0c52ce0: double free
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6

I will submit an issue to their repo. Thanks for the help

topepo commented 7 years ago

It looks like using gcc 5 was the answer:

$ /usr/local/Cellar/gcc@5/5.4.0_1/bin/gcc-5 --version
gcc-5 (Homebrew GCC 5.4.0_1 --without-multilib) 5.4.0

I'll start creating a method for train in the next week (OOO for a few days).
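Since the fix reported here came down to the compiler's major version (gcc 5 rather than gcc 6 on this machine), a build script could check the version before compiling. A minimal sketch, assuming the first line of the --version output ends in MAJOR.MINOR.PATCH as in the Homebrew builds quoted in this thread; gcc_major_version is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `gcc --version` output on stdin and print the
# major version. Assumes the first line ends in MAJOR.MINOR.PATCH, as in
# the Homebrew GCC version strings quoted in this thread.
gcc_major_version() {
    sed -n '1s/.* \([0-9][0-9]*\)\.[0-9][0-9]*\.[0-9][0-9]*$/\1/p'
}

# Usage sketch (gcc-5 is the compiler name used in this thread):
#   major=$(gcc-5 --version | gcc_major_version)
#   [ "$major" = 5 ] || echo "warning: untested compiler version: $major" >&2
```

The check only reports what one user observed on macOS Sierra; other compiler versions may well work on other setups.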

nnormandin commented 7 years ago

Looking forward to this, as there currently appears to be no way to keep CV predictions using LightGBM's R package

randomgambit commented 7 years ago

so we hit a wall here?😱

Manu4l commented 7 years ago

Seems so :-(

topepo commented 7 years ago

@randomgambit and @Manu4l, you are welcome to contribute.

One of the main issues is that the package is fairly rough at this point. For example, people are going to want to pass parameters from train to lgb.train. The issue is that some of these are formal arguments to lgb.train and others are lumped into an ambiguous params list that seems to have no documentation and limited checking. For example:

> lightgbm:::lgb.check.params
function (params) 
{
    params
}
<environment: namespace:lightgbm>

The project's options are described here, but there is no guarantee that these work in the R implementation, and it is unclear where to call them or whether there is any error trapping.

I can crank something out but would prefer that it work beyond one or two simple cases.

CaviarOnToast commented 7 years ago

If you had a beta I'd be willing to help test out the peculiarities. Have got no coding skills at all but use lightgbm regularly.

Oh, if it's any help, there's a simple R front end at https://github.com/bwilbertz/RLightGBM (it doesn't have things like varImp working, but the basics of training and prediction work fine).

topepo commented 7 years ago

Added a card to the new models project page instead of using issues.

JustinNeumann commented 6 years ago

Hello @topepo, I'm qualified and would love to work on the LightGBM integration for caret. How can I discuss this with you?

topepo commented 6 years ago

Drop me a line at my email; you don't have one associated with your github account and I'm not on twitter =]

JustinNeumann commented 6 years ago

You got it on Gmail, couldn't find an email here but found one with some research.

eddelbuettel commented 6 years ago

Any progress on this? Would love to test a little ...

agilebean commented 5 years ago

Is there any hope that lightgbm will be available in caret? Would be really great for benchmarking purposes!

crj32 commented 4 years ago

Interested too!