microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

LightGBM run error when number of dimensions is more than 90 #93

Closed · defaultRobot closed this issue 7 years ago

defaultRobot commented 7 years ago

Environment: Windows; number of rows: ~63,000 (6.3w). I have found one problem: LightGBM fails to train when the number of dimensions is more than 90. However, when the number of dimensions is less than 70, LightGBM can train and predict normally. I can provide the data if needed.

guolinke commented 7 years ago

This is very strange. Can you provide the data? Thanks.

chivee commented 7 years ago

Hi @anddelu, could that be because memory was completely used up? Could you please paste the log here?

defaultRobot commented 7 years ago

Hi, thanks for responding. Environment: 8 GB memory. I tried reducing the data to less than 5 MB and found that it works, so I thought LightGBM fails because it cannot handle too much data. The following data causes LightGBM to stop. Attached is the data (number of dimensions is more than 90): multiclass.txt

[screenshot of the error attached]

guolinke commented 7 years ago

@chivee I think a dataset of 63000×90 is very small; it cannot be out of memory.

guolinke commented 7 years ago

@anddelu I tried to run `lightgbm.exe data=multiclass.txt valid=multiclass.txt objective=multiclass num_class=5` and it finished successfully. Can you also provide your parameters?

defaultRobot commented 7 years ago

Here are my parameters, based on the examples, in the train.conf file:

```
data=multiclass.train
valid_data=multiclass.test
objective=multiclass
num_class=5
metric=multi_logloss
metric_freq=1
early_stopping=10
num_trees=100
learning_rate=0.05
num_leaves=31
```

I used your method and found it works; however, when I use more data to train the model, it still shows an error: [screenshot err01 attached]

Attached files (training data & train.conf): multiclass.txt, train_conf.txt

guolinke commented 7 years ago

[screenshot of a successful run attached]

I can still run successfully with your new data and config. BTW, the data files referenced in your config do not exist, so I changed both the training data and the validation data to multiclass.txt.

defaultRobot commented 7 years ago

Thanks for your response. BTW: the name of the config file is correct; I just changed its name when uploading because of the supported file formats. Unfortunately it still doesn't work for me, so I thought maybe my lightgbm.exe is the reason.
I have tried downloading the latest LightGBM and building it (VS 2013, x64 Release), but it still doesn't work. I am very confused, so I am uploading my exe to see if you can reproduce the problem.

I reduced the number of dimensions to less than 70 and found it works again:

```
D:\multiclass_classification>lightgbm.exe config=train_conf.txt
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Warning] Ignoring feature Column_38, only has one value
[LightGBM] [Info] Finished loading data in 0.260338 seconds
[LightGBM] [Info] Number of data: 62941, number of features: 69
......
[LightGBM] [Info] Early stopping at iteration 35, the best iteration round is 25
[LightGBM] [Info] 2.871995 seconds elapsed, finished iteration 35
[LightGBM] [Info] Finished training
```

lightgbm.exe.txt

guolinke commented 7 years ago

@anddelu I can still run with your exe... BTW, did you use the training data as validation data? If not, can you provide your validation data as well?

guolinke commented 7 years ago

You can try my exe as well: lightgbm.exe.txt

defaultRobot commented 7 years ago

The validation data is not the training data. Environment: number of dimensions more than 90; number of classes = 5. I used your lightgbm and it still doesn't work with the first data and config file: multiclass_train.txt multiclass_test.txt train_conf.txt

However, when I use the second data set (larger than the first) and its config file, both your lightgbm and mine work well. I am really confused. Is the data the cause, given that the first data set is part of the second? multiclass_train1.txt multiclass_test1.txt train_conf1.txt

guolinke commented 7 years ago

There actually was a bug. It has been fixed: https://github.com/Microsoft/LightGBM/commit/92351659a7f39bed7fe20b67ce4b27f26501cd77

defaultRobot commented 7 years ago

Thanks very much! I rebuilt it after downloading the fixed code, and now LightGBM works well. Thank you!!!

msafi04 commented 6 years ago

I have this issue. LightGBM freezes; the number of features in my dataset is more than 4000. Please help.

guolinke commented 6 years ago

@msafi04 Are you on the latest LightGBM? Can you also provide more information, like the data, hardware environment, and so on?

msafi04 commented 6 years ago

Thanks for the response. My data has shape (4459, 4735), and I am using a MacBook Pro. Is it because I am not using GPUs? Do you want the code? Below is the code I use to train/predict on my dataset.

```python
def pred_lgbm(df, target):
    target = target.astype('int')
    params = {
        "objective": "regression",
        "metric": "rmse",
        "num_leaves": 30,
        "learning_rate": 0.01,
        "bagging_fraction": 0.7,
        "feature_fraction": 0.7,
        "bagging_frequency": 5,
        "bagging_seed": 2018,
        "verbosity": 3
    }
    Xtrain, Xvalid, ytrain, yvalid = train_test_split(df, target, test_size=0.2)
    ltrain = lgbm.Dataset(Xtrain, ytrain)
    lvalid = lgbm.Dataset(Xvalid, yvalid)
    watchlist = [ltrain, lvalid]
    print('Lgbm training..')
    clf = lgbm.train(params, ltrain, num_boost_round=1000, valid_sets=watchlist,
                     early_stopping_rounds=100, verbose_eval=10)
    pred = clf.predict(Xvalid, num_iteration=clf.best_iteration)
    lgbm.plot_importance(clf)
    print('RMLSE: ', rmsle(yvalid, pred))
    return None
```
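
The snippet calls an `rmsle` helper that is not shown; a minimal sketch of the usual root mean squared logarithmic error, assuming that is what the missing helper computes, would be:

```python
import numpy as np

def rmsle(y_true, y_pred):
    # Root mean squared logarithmic error; negative predictions are
    # clipped to 0 so that log1p stays defined.
    y_pred = np.maximum(np.asarray(y_pred), 0)
    return np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))
```
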
msafi04 commented 6 years ago

@guolinke I reduced the dimension to 2000+, but I am still facing the issue. Please help.

guolinke commented 6 years ago

It seems this is not likely to crash. Did you build the package from the latest code?

msafi04 commented 6 years ago

I am not sure which is the latest code. Could you point me to it? Thanks.

guolinke commented 6 years ago

Refer to https://github.com/Microsoft/LightGBM/tree/master/python-package#install-from-github

msafi04 commented 6 years ago

Thanks. Can I install it in Anaconda?

msafi04 commented 6 years ago

I updated to 2.1.1, but it still crashes.

guolinke commented 6 years ago

I just updated it to 2.1.2; can you try it? It would be better if you could provide reproduction code with randomly generated data.
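
A reproduction script with randomly generated data could look like the following sketch (the shapes are assumptions, chosen to match the dataset size reported above):

```python
import numpy as np
import lightgbm as lgbm

# Randomly generated data at roughly the reported shape (4459, 4735).
rng = np.random.RandomState(42)
X = rng.rand(4459, 4735)
y = rng.rand(4459)

params = {"objective": "regression", "metric": "rmse", "num_leaves": 30}
dtrain = lgbm.Dataset(X, y)
booster = lgbm.train(params, dtrain, num_boost_round=50)
print(booster.predict(X[:5]))
```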

msafi04 commented 6 years ago

@guolinke Please check my code below. Thanks.

```python
import pandas as pd
import numpy as np
import lightgbm as lgbm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import VarianceThreshold

def pred_lgbm(df, target):
    target = target.astype('int')
    params = {
        "boosting_type": "gbdt",
        "objective": "regression",
        "metric": "rmse",
        "num_leaves": 30,
        "learning_rate": 0.01,
        "bagging_fraction": 0.7,
        "feature_fraction": 0.7,
        "bagging_frequency": 5,
        "verbosity": 3
    }
    Xtrain, Xvalid, ytrain, yvalid = train_test_split(df, target, test_size=0.2)
    ltrain = lgbm.Dataset(Xtrain, ytrain)
    lvalid = lgbm.Dataset(Xvalid, yvalid)
    watchlist = [ltrain, lvalid]
    print('Lgbm training..')
    clf = lgbm.train(params, ltrain, num_boost_round=1000, valid_sets=watchlist,
                     early_stopping_rounds=100, verbose_eval=10)
    pred = clf.predict(Xvalid, num_iteration=clf.best_iteration)
    lgbm.plot_importance(clf)
    return None

def main():
    input_file = r'train.csv.zip'
    df = pd.read_csv(input_file)
    df.drop('ID', axis=1, inplace=True)
    print(df.shape)
    target = df['target'].copy()
    df.drop('target', axis=1, inplace=True)
    scl = StandardScaler()
    df = scl.fit_transform(df)
    print(df.shape, target.shape)
    print('Scaling done..')
    varThres = VarianceThreshold(threshold=0.5)
    df = varThres.fit_transform(df)
    print('Variance Thershold done..')
    print(df.shape, target.shape)
    pred_lgbm(df, target)

if __name__ == '__main__':
    main()
```

StrikerRUS commented 6 years ago

@msafi04 The code and data seem OK; I just ran your snippet:

```
(4459, 4992)
(4459, 4991) (4459,)
Scaling done..
Variance Thershold done..
(4459, 4735) (4459,)
Lgbm training..
Training until validation scores don't improve for 100 rounds.
[10]    training's rmse: 8.05324e+06    valid_1's rmse: 7.85907e+06
[20]    training's rmse: 7.83593e+06    valid_1's rmse: 7.75118e+06
[30]    training's rmse: 7.6388e+06 valid_1's rmse: 7.65281e+06
[40]    training's rmse: 7.46258e+06    valid_1's rmse: 7.57028e+06
[50]    training's rmse: 7.29981e+06    valid_1's rmse: 7.49248e+06
[60]    training's rmse: 7.15265e+06    valid_1's rmse: 7.4294e+06
[70]    training's rmse: 7.01553e+06    valid_1's rmse: 7.37953e+06
[80]    training's rmse: 6.88825e+06    valid_1's rmse: 7.33082e+06
[90]    training's rmse: 6.77233e+06    valid_1's rmse: 7.28675e+06
[100]   training's rmse: 6.66424e+06    valid_1's rmse: 7.25186e+06
[110]   training's rmse: 6.56176e+06    valid_1's rmse: 7.21713e+06
[120]   training's rmse: 6.46828e+06    valid_1's rmse: 7.18686e+06
[130]   training's rmse: 6.37945e+06    valid_1's rmse: 7.1649e+06
[140]   training's rmse: 6.29867e+06    valid_1's rmse: 7.14595e+06
[150]   training's rmse: 6.22141e+06    valid_1's rmse: 7.12732e+06
[160]   training's rmse: 6.14847e+06    valid_1's rmse: 7.11351e+06
[170]   training's rmse: 6.08012e+06    valid_1's rmse: 7.10631e+06
[180]   training's rmse: 6.01555e+06    valid_1's rmse: 7.09486e+06
[190]   training's rmse: 5.95376e+06    valid_1's rmse: 7.08501e+06
[200]   training's rmse: 5.89536e+06    valid_1's rmse: 7.08337e+06
[210]   training's rmse: 5.83995e+06    valid_1's rmse: 7.07864e+06
[220]   training's rmse: 5.7867e+06 valid_1's rmse: 7.07283e+06
[230]   training's rmse: 5.73427e+06    valid_1's rmse: 7.06722e+06
[240]   training's rmse: 5.68461e+06    valid_1's rmse: 7.06331e+06
[250]   training's rmse: 5.63637e+06    valid_1's rmse: 7.05935e+06
[260]   training's rmse: 5.59054e+06    valid_1's rmse: 7.05584e+06
[270]   training's rmse: 5.54617e+06    valid_1's rmse: 7.04874e+06
[280]   training's rmse: 5.50349e+06    valid_1's rmse: 7.04536e+06
[290]   training's rmse: 5.46137e+06    valid_1's rmse: 7.0422e+06
[300]   training's rmse: 5.41947e+06    valid_1's rmse: 7.03769e+06
[310]   training's rmse: 5.3805e+06 valid_1's rmse: 7.03732e+06
[320]   training's rmse: 5.34281e+06    valid_1's rmse: 7.03467e+06
[330]   training's rmse: 5.30545e+06    valid_1's rmse: 7.0324e+06
[340]   training's rmse: 5.268e+06  valid_1's rmse: 7.0315e+06
[350]   training's rmse: 5.23303e+06    valid_1's rmse: 7.03043e+06
[360]   training's rmse: 5.19829e+06    valid_1's rmse: 7.03139e+06
[370]   training's rmse: 5.1656e+06 valid_1's rmse: 7.03016e+06
[380]   training's rmse: 5.13263e+06    valid_1's rmse: 7.02977e+06
[390]   training's rmse: 5.10139e+06    valid_1's rmse: 7.02994e+06
[400]   training's rmse: 5.0704e+06 valid_1's rmse: 7.02894e+06
[410]   training's rmse: 5.0401e+06 valid_1's rmse: 7.02555e+06
[420]   training's rmse: 5.01039e+06    valid_1's rmse: 7.0228e+06
[430]   training's rmse: 4.98113e+06    valid_1's rmse: 7.02337e+06
[440]   training's rmse: 4.95388e+06    valid_1's rmse: 7.02124e+06
[450]   training's rmse: 4.92627e+06    valid_1's rmse: 7.02215e+06
[460]   training's rmse: 4.89821e+06    valid_1's rmse: 7.0211e+06
[470]   training's rmse: 4.87228e+06    valid_1's rmse: 7.02058e+06
[480]   training's rmse: 4.8454e+06 valid_1's rmse: 7.0215e+06
[490]   training's rmse: 4.82091e+06    valid_1's rmse: 7.02276e+06
[500]   training's rmse: 4.79609e+06    valid_1's rmse: 7.02203e+06
[510]   training's rmse: 4.77164e+06    valid_1's rmse: 7.02348e+06
[520]   training's rmse: 4.74809e+06    valid_1's rmse: 7.02513e+06
[530]   training's rmse: 4.72451e+06    valid_1's rmse: 7.02571e+06
[540]   training's rmse: 4.7003e+06 valid_1's rmse: 7.02951e+06
[550]   training's rmse: 4.67647e+06    valid_1's rmse: 7.03081e+06
[560]   training's rmse: 4.65358e+06    valid_1's rmse: 7.03147e+06
Early stopping, best iteration is:
[469]   training's rmse: 4.87491e+06    valid_1's rmse: 7.02009e+06
```

Not related to the issue, but there is no bagging_frequency parameter in LightGBM, only bagging_freq.
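
For example, the bagging settings would need to be spelled like this for row sampling to take effect (a sketch of just the relevant keys; `bagging_fraction` only applies when `bagging_freq` is greater than 0):

```python
params = {
    "bagging_fraction": 0.7,  # sample 70% of rows
    "bagging_freq": 5,        # re-sample every 5 iterations; "bagging_frequency" is not a recognized name
    "feature_fraction": 0.7,  # sample 70% of columns per tree
}
```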

msafi04 commented 6 years ago

@StrikerRUS Thanks for the response. I ran the code but am facing the same issue; it appears my kernel dies and restarts. Please check the screenshot.

[screenshot attached: screen shot 2018-06-26 at 11 44 00 am]

guolinke commented 6 years ago

Can you try it without Jupyter?

msafi04 commented 6 years ago

@guolinke I got the below error.

```
MacBook-Pro:~ msafi04$ python first.py
(4459, 4992)
(4459, 4991) (4459,)
Scaling done..
Variance Thershold done..
(4459, 4735) (4459,)
Lgbm training..
OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Abort trap: 6
```
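
The error message itself describes an unsafe workaround; a minimal sketch of applying it, assuming the variable is set before any OpenMP-linked library is imported:

```python
import os

# Unsafe, unsupported workaround quoted in the OMP error above: allow duplicate
# OpenMP runtimes to coexist (may crash or silently produce incorrect results).
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

import lightgbm as lgbm  # import only after the variable is set
```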

guolinke commented 6 years ago

It seems your environment has some issues with OpenMP. You can reinstall gcc 8 and try again.

msafi04 commented 6 years ago

I installed gcc@8, but I get the same error as above (OMP: Error).

guolinke commented 6 years ago

@msafi04 Refer to https://github.com/dmlc/xgboost/issues/1715

guolinke commented 6 years ago

@msafi04 Can you try:

brew uninstall libiomp clang-omp gcc
brew install gcc@8

If you have other gcc packages, please uninstall them as well.

msafi04 commented 6 years ago

@guolinke I get this error now after your suggestion.

```
ImportError: dlopen(/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-36m-darwin.so, 2): Library not loaded: @rpath/libgfortran.3.dylib
  Referenced from: /anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-36m-darwin.so
  Reason: image not found
```

msafi04 commented 6 years ago

```
Traceback (most recent call last):
  File "first.py", line 4, in <module>
    from sklearn.model_selection import train_test_split
  File "/anaconda3/lib/python3.6/site-packages/sklearn/__init__.py", line 134, in <module>
    from .base import clone
  File "/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 13, in <module>
    from .utils.fixes import signature
  File "/anaconda3/lib/python3.6/site-packages/sklearn/utils/__init__.py", line 11, in <module>
    from .validation import (as_float_array,
  File "/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 18, in <module>
    from ..utils.fixes import signature
  File "/anaconda3/lib/python3.6/site-packages/sklearn/utils/fixes.py", line 144, in <module>
    from scipy.sparse.linalg import lsqr as sparse_lsqr  # noqa
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/__init__.py", line 117, in <module>
    from .eigen import *
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/__init__.py", line 11, in <module>
    from .arpack import *
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/__init__.py", line 22, in <module>
    from .arpack import *
  File "/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 45, in <module>
    from . import _arpack
ImportError: dlopen(/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-36m-darwin.so, 2): Library not loaded: @rpath/libgfortran.3.dylib
  Referenced from: /anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-36m-darwin.so
  Reason: image not found
```

guolinke commented 6 years ago

It seems you need to reinstall scipy, numpy, and sklearn.

msafi04 commented 6 years ago

Reinstalling didn't help; I'm still facing the same error. Please help me.

guolinke commented 6 years ago

You can try to uninstall Anaconda and then reinstall it.

msafi04 commented 6 years ago

@guolinke It worked!! Thanks for your patience and help.