Environment: Windows; number of rows: 63,000. I have found one problem: LightGBM fails to train when the number of dimensions is more than 90. However, when the number of dimensions is less than 70, LightGBM can train and predict. I can provide the data if needed.
This is very strange. Can you provide the data? Thanks.
Hi @anddelu, could that be because memory was completely used up? Could you please paste the log here?
Hi, thanks for responding. Environment: 8 GB memory. I tried reducing the data to less than 5 MB and found that it works, so I thought LightGBM fails because it cannot handle that much data. The following data caused LightGBM to stop. Attached is the data; the number of dimensions is more than 90: multiclass.txt
The picture: [screenshot attached]
@chivee I think a dataset of 63000×90 is very small; it cannot be out of memory.
@anddelu I tried to run:
lightgbm.exe data=multiclass.txt valid=multiclass.txt objective=multiclass num_class=5
and it finished successfully.
Can you also provide your parameters?
Here are my parameters, based on the examples, in the train.conf file:
data=multiclass.train
valid_data=multiclass.test
objective=multiclass
num_class=5
metric=multi_logloss
metric_freq=1
early_stopping=10
num_trees=100
learning_rate=0.05
num_leaves=31
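(For reference, a config file like this is passed on the command line, as in the run shown later in this thread. Assuming the file is saved as train.conf:

lightgbm.exe config=train.conf)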
I used your method and found it works; however, when I use more data to train the model, it still shows an error:
Attached files: training data & train.conf (multiclass.txt, train_conf.txt)
I can still run successfully with your new data and config. BTW, the data file referenced in your config does not exist, so I changed it to multiclass.txt for both the training data and validation data.
Thanks for your response.
BTW: the name of the config file is correct; I just renamed it when uploading because of the supported file formats.
Unfortunately it doesn't work for me. I thought maybe the lightgbm.exe was the reason, so I downloaded the latest LightGBM and built it (VS 2013, x64 Release). It still doesn't work.
I am very confused, so I am uploading it to see whether you can reproduce the problem.
I reduced the number of dimensions to less than 70 and found it works again.

D:\multiclass_classification>lightgbm.exe config=train_conf.txt
[LightGBM] [Info] Finished loading parameters
[LightGBM] [Warning] Ignoring feature Column_38, only has one value
[LightGBM] [Info] Finished loading data in 0.260338 seconds
[LightGBM] [Info] Number of data: 62941, number of features: 69
......
[LightGBM] [Info] Early stopping at iteration 35, the best iteration round is 25
[LightGBM] [Info] 2.871995 seconds elapsed, finished iteration 35
[LightGBM] [Info] Finished training
@anddelu I can still run with your exe... BTW, did you use the training data as validation data? If not, can you provide your validation data as well?
You can try my exe as well: lightgbm.exe.txt
The validation data is not the training data. Environment: number of dimensions more than 90, number of classes = 5. I used your lightgbm, and it still doesn't work with the first data and config file: multiclass_train.txt, multiclass_test.txt, train_conf.txt
However, when I use the second data set (larger than the first) and config file, both your lightgbm and mine work well. I am really confused. Is the data the cause, given that the first data set is part of the second? multiclass_train1.txt multiclass_test1.txt train_conf1.txt
It actually had one bug. It has been fixed: https://github.com/Microsoft/LightGBM/commit/92351659a7f39bed7fe20b67ce4b27f26501cd77
Thanks very much! I rebuilt it after downloading the fixed code, and now LightGBM works well. Thank you!!!
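(For anyone else hitting this: rebuilding from the latest source follows the standard LightGBM build steps. A minimal sketch on Linux/macOS, assuming cmake and a C++ compiler are installed:

git clone --recursive https://github.com/Microsoft/LightGBM
cd LightGBM
mkdir build && cd build
cmake ..
make -j4

On Windows, the solution can instead be built in Visual Studio as an x64 Release, as the reporter did.)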
I have this issue. LightGBM freezes; the number of features in my dataset is more than 4000. Please help.
@msafi04 Are you on the latest LightGBM? Can you also provide more information, like the data, hardware environment, and so on?
Thanks for the response. My data is of shape (4459, 4735), and I am using a MacBook Pro. Is it because I am not using GPUs? Do you want the code? Below is the code to train/predict on my dataset.
def pred_lgbm(df, target):
    # assumes: import lightgbm as lgbm; from sklearn.model_selection import train_test_split;
    # rmsle is a helper defined elsewhere in the script
    target = target.astype('int')
    params = {
        "objective": "regression",
        "metric": "rmse",
        "num_leaves": 30,
        "learning_rate": 0.01,
        "bagging_fraction": 0.7,
        "feature_fraction": 0.7,
        "bagging_frequency": 5,
        "bagging_seed": 2018,
        "verbosity": 3
    }
    Xtrain, Xvalid, ytrain, yvalid = train_test_split(df, target, test_size=0.2)
    ltrain = lgbm.Dataset(Xtrain, ytrain)
    lvalid = lgbm.Dataset(Xvalid, yvalid)
    watchlist = [ltrain, lvalid]
    print('Lgbm training..')
    clf = lgbm.train(params, ltrain, num_boost_round=1000, valid_sets=watchlist,
                     early_stopping_rounds=100, verbose_eval=10)
    pred = clf.predict(Xvalid, num_iteration=clf.best_iteration)
    print('RMSLE: ', rmsle(yvalid, pred))
    return None
@guolinke I reduced the dimensions to 2000-plus but am still facing the issue. Please help.
It seems this code is unlikely to crash. Did you build the package from the latest code?
I am not sure which is the latest code. Could you point me to it? Thanks.
Thanks. Can I install it in Anaconda?
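(The Python package can be upgraded with pip, which also works inside an Anaconda environment; a minimal sketch:

pip install --upgrade lightgbm

This pulls the latest release from PyPI.)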
I updated to 2.1.1 but it still crashes.
I just updated it to 2.1.2; can you try that? It would be better if you could provide reproduction code with randomly generated data.
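(A reproduction script with randomly generated data, as requested, might look like the sketch below; the shape (4459, 4735) and the parameters are taken from the snippets in this thread, and the random targets are only placeholders:

import numpy as np
import lightgbm as lgbm
from sklearn.model_selection import train_test_split

# random data with the same shape as the reported dataset
rng = np.random.RandomState(2018)
X = rng.rand(4459, 4735)
y = rng.rand(4459)

Xtrain, Xvalid, ytrain, yvalid = train_test_split(X, y, test_size=0.2)
ltrain = lgbm.Dataset(Xtrain, ytrain)
lvalid = lgbm.Dataset(Xvalid, yvalid)

params = {"objective": "regression", "metric": "rmse",
          "num_leaves": 30, "learning_rate": 0.01}
clf = lgbm.train(params, ltrain, num_boost_round=100,
                 valid_sets=[ltrain, lvalid],
                 early_stopping_rounds=100, verbose_eval=10)

If this runs cleanly, the problem is more likely environment-specific than data-specific.)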
@guolinke Please check my code below. Thanks.
import pandas as pd
import numpy as np
import lightgbm as lgbm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import VarianceThreshold

def pred_lgbm(df, target):
    target = target.astype('int')
    params = {
        "boosting_type": "gbdt",
        "objective": "regression",
        "metric": "rmse",
        "num_leaves": 30,
        "learning_rate": 0.01,
        "bagging_fraction": 0.7,
        "feature_fraction": 0.7,
        "bagging_frequency": 5,
        "verbosity": 3
    }
    Xtrain, Xvalid, ytrain, yvalid = train_test_split(df, target, test_size=0.2)
    ltrain = lgbm.Dataset(Xtrain, ytrain)
    lvalid = lgbm.Dataset(Xvalid, yvalid)
    watchlist = [ltrain, lvalid]
    print('Lgbm training..')
    clf = lgbm.train(params, ltrain, num_boost_round=1000, valid_sets=watchlist,
                     early_stopping_rounds=100, verbose_eval=10)
    pred = clf.predict(Xvalid, num_iteration=clf.best_iteration)
    lgbm.plot_importance(clf)
    return None

def main():
    input_file = r'train.csv.zip'
    df = pd.read_csv(input_file)
    df.drop('ID', axis=1, inplace=True)
    print(df.shape)
    target = df['target'].copy()
    df.drop('target', axis=1, inplace=True)
    scl = StandardScaler()
    df = scl.fit_transform(df)
    print(df.shape, target.shape)
    print('Scaling done..')
    varThres = VarianceThreshold(threshold=0.5)
    df = varThres.fit_transform(df)
    print('Variance Threshold done..')
    print(df.shape, target.shape)
    pred_lgbm(df, target)

if __name__ == '__main__':
    main()
@msafi04 Seems the code and data are OK; I just ran your snippet:
(4459, 4992)
(4459, 4991) (4459,)
Scaling done..
Variance Threshold done..
(4459, 4735) (4459,)
Lgbm training..
Training until validation scores don't improve for 100 rounds.
[10] training's rmse: 8.05324e+06 valid_1's rmse: 7.85907e+06
[20] training's rmse: 7.83593e+06 valid_1's rmse: 7.75118e+06
[30] training's rmse: 7.6388e+06 valid_1's rmse: 7.65281e+06
[40] training's rmse: 7.46258e+06 valid_1's rmse: 7.57028e+06
[50] training's rmse: 7.29981e+06 valid_1's rmse: 7.49248e+06
[60] training's rmse: 7.15265e+06 valid_1's rmse: 7.4294e+06
[70] training's rmse: 7.01553e+06 valid_1's rmse: 7.37953e+06
[80] training's rmse: 6.88825e+06 valid_1's rmse: 7.33082e+06
[90] training's rmse: 6.77233e+06 valid_1's rmse: 7.28675e+06
[100] training's rmse: 6.66424e+06 valid_1's rmse: 7.25186e+06
[110] training's rmse: 6.56176e+06 valid_1's rmse: 7.21713e+06
[120] training's rmse: 6.46828e+06 valid_1's rmse: 7.18686e+06
[130] training's rmse: 6.37945e+06 valid_1's rmse: 7.1649e+06
[140] training's rmse: 6.29867e+06 valid_1's rmse: 7.14595e+06
[150] training's rmse: 6.22141e+06 valid_1's rmse: 7.12732e+06
[160] training's rmse: 6.14847e+06 valid_1's rmse: 7.11351e+06
[170] training's rmse: 6.08012e+06 valid_1's rmse: 7.10631e+06
[180] training's rmse: 6.01555e+06 valid_1's rmse: 7.09486e+06
[190] training's rmse: 5.95376e+06 valid_1's rmse: 7.08501e+06
[200] training's rmse: 5.89536e+06 valid_1's rmse: 7.08337e+06
[210] training's rmse: 5.83995e+06 valid_1's rmse: 7.07864e+06
[220] training's rmse: 5.7867e+06 valid_1's rmse: 7.07283e+06
[230] training's rmse: 5.73427e+06 valid_1's rmse: 7.06722e+06
[240] training's rmse: 5.68461e+06 valid_1's rmse: 7.06331e+06
[250] training's rmse: 5.63637e+06 valid_1's rmse: 7.05935e+06
[260] training's rmse: 5.59054e+06 valid_1's rmse: 7.05584e+06
[270] training's rmse: 5.54617e+06 valid_1's rmse: 7.04874e+06
[280] training's rmse: 5.50349e+06 valid_1's rmse: 7.04536e+06
[290] training's rmse: 5.46137e+06 valid_1's rmse: 7.0422e+06
[300] training's rmse: 5.41947e+06 valid_1's rmse: 7.03769e+06
[310] training's rmse: 5.3805e+06 valid_1's rmse: 7.03732e+06
[320] training's rmse: 5.34281e+06 valid_1's rmse: 7.03467e+06
[330] training's rmse: 5.30545e+06 valid_1's rmse: 7.0324e+06
[340] training's rmse: 5.268e+06 valid_1's rmse: 7.0315e+06
[350] training's rmse: 5.23303e+06 valid_1's rmse: 7.03043e+06
[360] training's rmse: 5.19829e+06 valid_1's rmse: 7.03139e+06
[370] training's rmse: 5.1656e+06 valid_1's rmse: 7.03016e+06
[380] training's rmse: 5.13263e+06 valid_1's rmse: 7.02977e+06
[390] training's rmse: 5.10139e+06 valid_1's rmse: 7.02994e+06
[400] training's rmse: 5.0704e+06 valid_1's rmse: 7.02894e+06
[410] training's rmse: 5.0401e+06 valid_1's rmse: 7.02555e+06
[420] training's rmse: 5.01039e+06 valid_1's rmse: 7.0228e+06
[430] training's rmse: 4.98113e+06 valid_1's rmse: 7.02337e+06
[440] training's rmse: 4.95388e+06 valid_1's rmse: 7.02124e+06
[450] training's rmse: 4.92627e+06 valid_1's rmse: 7.02215e+06
[460] training's rmse: 4.89821e+06 valid_1's rmse: 7.0211e+06
[470] training's rmse: 4.87228e+06 valid_1's rmse: 7.02058e+06
[480] training's rmse: 4.8454e+06 valid_1's rmse: 7.0215e+06
[490] training's rmse: 4.82091e+06 valid_1's rmse: 7.02276e+06
[500] training's rmse: 4.79609e+06 valid_1's rmse: 7.02203e+06
[510] training's rmse: 4.77164e+06 valid_1's rmse: 7.02348e+06
[520] training's rmse: 4.74809e+06 valid_1's rmse: 7.02513e+06
[530] training's rmse: 4.72451e+06 valid_1's rmse: 7.02571e+06
[540] training's rmse: 4.7003e+06 valid_1's rmse: 7.02951e+06
[550] training's rmse: 4.67647e+06 valid_1's rmse: 7.03081e+06
[560] training's rmse: 4.65358e+06 valid_1's rmse: 7.03147e+06
Early stopping, best iteration is:
[469] training's rmse: 4.87491e+06 valid_1's rmse: 7.02009e+06
Not related to the issue, but there is no bagging_frequency parameter in LightGBM, only bagging_freq.
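(Applied to the params dict above, a corrected sketch would be:

params = {
    "boosting_type": "gbdt",
    "objective": "regression",
    "metric": "rmse",
    "num_leaves": 30,
    "learning_rate": 0.01,
    "bagging_fraction": 0.7,
    "feature_fraction": 0.7,
    "bagging_freq": 5,  # was "bagging_frequency", which LightGBM does not recognize
    "verbosity": 3
}

Unrecognized keys do not take effect, which is presumably why training still ran without complaint.)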
@StrikerRUS Thanks for the response. I ran the code but am facing the same issue. It appears my kernel dies and restarts. Please check the screenshot.
Can you try it without Jupyter?
@guolinke I got the below error.

MacBook-Pro:~ msafi04$ python first.py
(4459, 4992)
(4459, 4991) (4459,)
Scaling done..
Variance Threshold done..
(4459, 4735) (4459,)
Lgbm training..
OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Abort trap: 6
It seems your environment has some issues with OpenMP. You can reinstall gcc@8 and try again.
I installed gcc@8 but got the same error as above (OMP: Error #15).
@msafi04 refer to https://github.com/dmlc/xgboost/issues/1715
@msafi04 Can you try:
brew uninstall libiomp clang-omp gcc
brew install gcc@8
If you have other gcc packages, please uninstall them as well.
@guolinke I get this error now after your suggestion.
Traceback (most recent call last):
  File "first.py", line 4, in <module>
ImportError: dlopen(/anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-36m-darwin.so, 2): Library not loaded: @rpath/libgfortran.3.dylib
  Referenced from: /anaconda3/lib/python3.6/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-36m-darwin.so
  Reason: image not found
It seems you need to reinstall scipy, numpy, and sklearn.
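(A minimal sketch of that, assuming pip inside the Anaconda environment:

pip install --upgrade --force-reinstall numpy scipy scikit-learn

The missing libgfortran.3.dylib may also be restorable through conda itself, e.g. conda install libgfortran.)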
Reinstalling didn't help; I'm still facing the same error. Please help me.
You can try uninstalling Anaconda, then reinstalling it.
@guolinke It worked!! Thanks for your patience and help.