microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

Train with GPU failed on Apple Silicon M1 #6189

Open uyennguyen24 opened 10 months ago

uyennguyen24 commented 10 months ago

Description

Hi, I tried to train on a dataset with the parameter device='gpu' on my MacBook M1, but it failed with this error:

[LightGBM] [Info] Number of positive: 150000, number of negative: 150000
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 510
[LightGBM] [Info] Number of data points in the train set: 300000, number of used features: 2
[LightGBM] [Info] Using GPU Device: Apple M1, Vendor: Apple
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 2 dense feature groups (1.14 MB) transferred to GPU in 0.000923 secs. 0 sparse feature groups
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Fatal] Check failed: (best_split_info.left_count) > (0) at /private/var/folders/dy/fgk92ngj7cx4q_ggnbyfh5xm0000gn/T/pip-install-6fxn6kjn/lightgbm_2f09b1a9e50f4240ae7ac11d5f3de230/src/treelearner/serial_tree_learner.cpp, line 845 .

I don't have any problem when training on the CPU.

Reproducible example

from lightgbm import LGBMClassifier
from sklearn.datasets import make_moons

model = LGBMClassifier(device='gpu')
train, label = make_moons(n_samples=300000, shuffle=True, noise=0.3, random_state=None)
model.fit(train, label)
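
Since the report notes that CPU training works, one possible stopgap is to try the GPU first and fall back to the CPU when training raises. The following is only a minimal sketch, not a LightGBM feature: `fit_with_fallback`, `make_model`, and `demo` are hypothetical names introduced for illustration. It assumes the failed check surfaces in Python as an ordinary exception, and it cannot recover from a hard crash of the interpreter such as a segmentation fault.

```python
from typing import Any, Callable, Sequence, Tuple


def fit_with_fallback(
    make_model: Callable[[str], Any],
    X: Any,
    y: Any,
    devices: Sequence[str] = ("gpu", "cpu"),
) -> Tuple[Any, str]:
    """Try each device in order and return (fitted_model, device_used)."""
    last_error = None
    for device in devices:
        model = make_model(device)  # hypothetical factory: device name -> estimator
        try:
            model.fit(X, y)
        except Exception as exc:  # assumption: the failure is raised as a Python exception
            print(f"training on {device!r} failed: {exc}")
            last_error = exc
            continue
        return model, device
    raise RuntimeError(f"training failed on all devices: {list(devices)}") from last_error


def demo() -> None:
    """Usage with the reproducible example from this issue (not run automatically)."""
    from lightgbm import LGBMClassifier
    from sklearn.datasets import make_moons

    X, y = make_moons(n_samples=300000, shuffle=True, noise=0.3, random_state=0)
    model, device = fit_with_fallback(lambda d: LGBMClassifier(device=d), X, y)
    print(f"trained on {device}")
```

Passing a factory rather than a model keeps the helper independent of LightGBM, so each attempt starts from a fresh, unfitted estimator.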

Environment info

LightGBM version or commit hash: lightgbm-4.1.0

Command(s) you used to install LightGBM

pip install lightgbm --config-settings=cmake.define.USE_GPU=ON

Machine: Apple MacBook Air M1

Additional Comments

Thank you for your help

jameslamb commented 10 months ago

Thanks for your interest in LightGBM, and for the excellent report!

Could you try installing from the latest development version (following this doc) and let us know if that resolves the issue for you?

git clone --recursive git@github.com:microsoft/LightGBM.git
cd ./LightGBM
pip uninstall --yes lightgbm
sh build-python.sh install --gpu

There are several not-yet-released bug fixes which could help you. Sorry for the inconvenience, we will try to get a new release out soon.

If that doesn't work, let me know and we can try some other things. LightGBM isn't currently tested on the GPUs in Apple M1/M2/M3 machines, but I do have an M2 laptop I could use to test some things.

uyennguyen24 commented 10 months ago

Hi James,

Thank you for your quick response.

I uninstalled and reinstalled from the repo with build-python.sh:

print(lightgbm.__version__)
4.1.0.99

But I still have the issue:

[LightGBM] [Info] Number of positive: 150000, number of negative: 150000
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 510
[LightGBM] [Info] Number of data points in the train set: 300000, number of used features: 2
[LightGBM] [Info] Using GPU Device: Apple M1, Vendor: Apple
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 2 dense feature groups (1.14 MB) transferred to GPU in 0.000964 secs. 0 sparse feature groups
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Fatal] Check failed: (best_split_info.right_count) > (0) at /Users/uyennguyen24/LightGBM/lightgbm-python/src/treelearner/serial_tree_learner.cpp, line 856 .

Do you have the same issue on the M2 chip?

jameslamb commented 10 months ago

Thanks for trying that! Very helpful.

I'll try on my M2 some time in the next few days and let you know what I find.

rickypang0219 commented 9 months ago

Hi, may I ask how you installed lightgbm (build from source, or conda/pip)? I used conda to install lightgbm on my M1 MacBook, but I got the following output:

LightGBMError: GPU Tree Learner was not enabled in this build.
Please recompile with CMake option -DUSE_GPU=1

rickypang0219 commented 9 months ago

Hi James, I installed LightGBM on my M1 machine following your suggestion and ran the example code provided by uyennguyen24, but my Jupyter kernel dies immediately. If I run the same code as a .py script, the shell reports zsh: segmentation fault.
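
Because a segmentation fault kills the whole Python process (which is why the Jupyter kernel dies), it can help to run the reproduction in a child interpreter and inspect how it ended. This is a rough sketch using only the standard library. `run_isolated`, `describe`, and `REPRO` are hypothetical names, and `REPRO` simply repeats the example from this issue with a fixed seed.

```python
import subprocess
import sys


def run_isolated(source: str, timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run Python source in a fresh interpreter so a native crash cannot take down the caller."""
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe(result: subprocess.CompletedProcess) -> str:
    """Summarize how the child process ended."""
    if result.returncode == 0:
        return "finished normally"
    if result.returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal number.
        return f"killed by signal {-result.returncode}"
    return f"exited with status {result.returncode}"


# Reproduction from this issue, kept as a string so it only runs inside the child process.
REPRO = """
from lightgbm import LGBMClassifier
from sklearn.datasets import make_moons

train, label = make_moons(n_samples=300000, shuffle=True, noise=0.3, random_state=0)
LGBMClassifier(device='gpu').fit(train, label)
"""
```

Calling `result = run_isolated(REPRO)` followed by `print(describe(result))` and `print(result.stderr)` from a notebook should leave the kernel alive. On macOS and Linux, a segmentation fault would be reported as signal 11 (SIGSEGV), and the LightGBM log lines end up in the captured output, where they can be pasted into a report.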

pedrovelmo commented 8 months ago

Having similar problems (dead kernel). Any solution so far?