Closed: gugatr0n1c closed this issue 7 years ago
1,2) It's the same as the C++ version; is nthread in C++ working for you? 3) eval_metric is used in fit()
full API:
```
__init__(num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=10,
         max_bin=255, silent=True, objective="regression", nthread=-1, min_split_gain=0,
         min_child_weight=5, min_child_samples=10, subsample=1, subsample_freq=1,
         colsample_bytree=1, reg_alpha=0, reg_lambda=0, scale_pos_weight=1,
         is_unbalance=False, seed=0)

fit(X, y, eval_set=None, eval_metric=None, early_stopping_rounds=None,
    verbose=True, train_fields=None, valid_fields=None, feature_name=None,
    categorical_feature=None, other_params=None)

predict(data, raw_score=False, num_iteration=0)
```
1,2] Actually I used pyLightGBM, and with that it was working.
3] ok thx, so the usage is different than in xgboost, where eval_metric is NOT used as an input for training but only for monitoring on valid_data.. here it is an input for training, i.e. how to split data into leaves, right?
@gugatr0n1c
3]. I think xgboost also uses eval_metric in fit: https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier.fit
Fix 1. For 2, you should set nthread=4 explicitly; the default value from OpenMP is 8 (it counts hyperthreads).
No, definitely not; in xgboost it is only used for valid data, see https://github.com/dmlc/xgboost/blob/master/doc/parameter.md or https://github.com/dmlc/xgboost/blob/ef4dcce7372dbc03b5066a614727f2a6dfcbd3bc/src/objective/regression_obj.cc
For regression training xgboost has only RMSE and this cannot be changed; there is a new plugin system there, where the user can change the objective, but not via eval_metric.
Anyway, when I tried to modify eval_metric here in fit(), it just changes the log output for valid_data, but splitting is always done with l2.
is it possible to add "training_metric" to change l2 to l1?
hmm, it does not make any sense to use l2 for training and l1 for valid_data monitoring... maybe eval_metric should influence the objective fully... (the drawback is that then only one eval_metric could be used; for some classification problems it is good to monitor auc, recall and logloss)
up to you guys... but thanks for your great work, this library is outperforming xgboost by a wide margin :)
@gugatr0n1c I don't quite understand what you mean.
Do you want to output the training metric during training?
You can add the training data to eval_set:

```python
fit(matrix_train, target_train,
    eval_set=[(matrix_train, target_train), (matrix_test, target_test)],
    eval_metric="l1")
```

And LightGBM will not load train_data twice.
Strange, I changed the code as suggested here to:

```python
params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': 'l1',
    'max_depth': 15,
    'num_leaves': 1000,
    'min_data_in_leaf': 1000,
    'learning_rate': 0.0025,
    'feature_fraction': 0.2,
    'bagging_fraction': 0.83,
    'bagging_freq': 1,
    'verbose': 0,
    'nthread': 4,
}

model = lg.train(
    params,
    train_data=(matrix_train, target_train),
    num_boost_round=2000,
    valid_datas=(matrix_test, target_test),
    early_stopping_rounds=50,
)
```
and now nthread is working correctly, so the previous problem is probably only a 'sklearn' issue. But I believe metric = 'l1' is not working here either (it was working with pyLightGBM).
The goal is to enable training a regression task with the 'l1' metric.
@gugatr0n1c , I just gave it a try, and it outputs the L1 metric. Can you paste your code?
@guolinke there are two different things:
1] One is outputting the error on eval_data with the chosen metric; this is working correctly for me as well.
2] The second is building the tree, where each split would be chosen according to the chosen metric; this, I believe, always uses 'l2' no matter that I set metric = 'l1'. But when I was using pyLightGBM it was working.
Are you sure pyLightGBM can do this? You need to change the objective function to support this.
And the tree split is not according to any metric. It uses the gradients calculated by the objective function.
Can you give an example of how to set pyLightGBM to use L1 for the tree split? I will check it.
ok, again, sorry for the confusion...
Anyway, let me recast this issue as a proposal for a new objective (instead of an accusation that something is not working): MAE regression, where the objective is based on the absolute difference, not on least squares, similarly to MAERegressionOutput in the MxNet deep learning library: https://turi.com/products/create/docs/generated/graphlab.mxnet.symbol.MAERegressionOutput.html#graphlab.mxnet.symbol.MAERegressionOutput
This is a robust solution for regression when the target has many outliers. thx
@gugatr0n1c lightgbm is the same as xgboost in this part. You can see this thread; it includes how to use MAE as an objective.
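For completeness, a sketch of what an MAE-style custom objective looks like. The function name and the surrounding usage are illustrative, not from this thread; with the native API such a function would be passed to lgb.train as a custom objective (the exact argument name has varied across LightGBM versions).

```python
import numpy as np

def mae_objective(preds, train_data):
    """Gradient/Hessian pair for an L1 (MAE) objective.
    The MAE gradient is sign(pred - label); its true second
    derivative is zero, so a constant surrogate Hessian is used
    to keep the leaf-value update well defined."""
    labels = train_data.get_label()
    grad = np.sign(preds - labels)
    hess = np.ones_like(preds)
    return grad, hess

# Hypothetical native-API usage (argument name has varied across versions):
#   booster = lgb.train(params, dtrain, num_boost_round=100, fobj=mae_objective)
```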
@gugatr0n1c I think you misunderstand the parameter is_training_metric. Its full name is is_provide_training_metric, which means whether or not to print the metric on the training data.
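For reference, a sketch of how that flag would sit in a native-API parameter dict; the flag name follows the comment above (full name is_provide_training_metric, alias is_training_metric), and the other values here are illustrative:

```python
params = {
    "objective": "regression",
    "metric": "l1",
    "is_training_metric": True,  # also print the metric on the training data
}
```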
ok, thanks for the explanation, closing
I have the same issue with 1):
nthread is not working in either the vanilla Python API or the scikit-learn API.
It works with the binary + train.conf, however.
my settings:
```python
param = {
    'task': 'train',
    'boost_type': 'gbdt',
    'objective': 'multiclass',
    'num_class': 3,
    'max_bin': 255,
    'learning_rate': 1,
    'num_leaves': 31,
    'verbose': 1,
    'nthread': 1,
}
```
@supdizh I just gave it a try; nthread works. Can you paste the full code, especially how you pass the parameters?
@guolinke never mind, strangely the exact same code works now
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Hi,
1] It seems to me that nthread is not working in the Python interface: no matter what I set, all threads are used.
2] If I have a 4-core CPU with hyperthreading (8 threads), it still uses all 8; the default setting is -1, so it should use 4, or not?
3] Is there a way to change the metric from the default l2 to l1 in Python? Setting metric = 'l1' is not working.
thx
calling this:

```python
model = lg.LGBMRegressor(objective='regression',
                         # metric='l2',   <- commented out
```