I found that line 3958 (where the `json.decoder.JSONDecodeError` occurred) is as follows: `"split_gain":nan,`. I think json can't handle `nan` and raises `json.decoder.JSONDecodeError`.
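For example, Python's json module accepts the non-standard token `NaN` but not the lowercase `nan` that appears here, so the parse fails at exactly this token (a minimal check):

```python
import json

# Python's json parser accepts the non-standard "NaN" token by default,
# but the lowercase "nan" in the dumped model is not a valid JSON token
print(json.loads('{"split_gain": NaN}'))  # -> {'split_gain': nan}
json.loads('{"split_gain": nan}')         # raises json.decoder.JSONDecodeError
```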
And the structure of the 99th tree of the above GBM model is as follows:
```python
{'tree_index': 99,
 'num_leaves': 2,
 'num_cat': 0,
 'shrinkage': 0.1,
 'tree_structure': {'split_index': 0,
                    'split_feature': 0,
                    'split_gain': 'nan',
                    'threshold': 1.0000000180025095e-35,
                    'decision_type': '<=',
                    'default_left': True,
                    'missing_type': 'None',
                    'internal_value': 0,
                    'internal_count': 455,
                    'left_child': {'leaf_index': 0,
                                   'leaf_value': 16899999938695358,
                                   'leaf_count': 454},
                    'right_child': {'leaf_index': 1, 'leaf_value': 0, 'leaf_count': 1}}}
```
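For reference, I believe such a dump is normally obtained with `Booster.dump_model()`; a minimal sketch, assuming `model` is the fitted estimator (in this buggy case it hits the same parse error until the `nan` is fixed):

```python
# minimal sketch of how such a dump is normally obtained, assuming
# `model` is the fitted LGBMClassifier; dump_model() returns the model
# as a dict with one entry per tree under 'tree_info' (here it raises
# the same JSONDecodeError until the bare nan in the dump is fixed)
tree_99 = model.booster_.dump_model()['tree_info'][99]
print(tree_99)
```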
The `leaf_value` of the `left_child` is too large.

So I have two questions: why does `split_gain` have `nan`, and why does `leaf_value` have such a large value?

@AnchorBlues It is caused by your forced split, as these splits create very imbalanced splits (here the split sends 454 of the 455 samples to the left child and only 1 to the right).
@guolinke I understand that it is caused by the imbalanced forced split. Then why does the error not occur when the line `dummy[0] += 0.00001` is removed from the above code snippet? I think this change doesn't change the degree of imbalance of the splits.
@AnchorBlues there is indeed a bug in this case. Thanks very much! Fixed in #1809. @jerryjliu any comments?
@guolinke Thank you for the fix. After merging #1809 and building the sources, the error no longer occurs.

However, another problem occurred. In the case below, a memory error occurred.
```python
import numpy as np
from sklearn.datasets import load_breast_cancer
import lightgbm as lgb

# load data (binary classification)
data = load_breast_cancer()
x_train = data.data
y_train = data.target

# create a new feature that takes a value 0 or 1
np.random.seed(0)
dummy = np.random.randint(0, 2, size=len(x_train))
x_train = np.c_[dummy, x_train]

# create a json file that forces the tree to split on the new feature, with threshold 0.5
s = """
{
    "feature": 0,
    "threshold": 0.5,
    "left": {
    },
    "right": {
    }
}
"""
with open("forced_splits-0.json", mode='w') as f:
    f.write(s)

# model training
model = lgb.LGBMClassifier(random_state=42, forced_splits="forced_splits-0.json",
                           num_leaves=3, n_estimators=10)
model.fit(x_train, y_train)
```
```
[LightGBM] [Fatal] Check failed: tree->num_leaves() <= data_partition_->num_leaves() at /home/anchorbues/packages/LightGBM/src/treelearner/serial_tree_learner.h, line 60 .
```
This code didn't raise a memory error with version 2.1.2 of LightGBM (see https://github.com/Microsoft/LightGBM/issues/1783).
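For completeness, a quick way to check which release is installed; note that `lgb.__version__` shows only the release string, not the specific commit the sources were built from:

```python
import lightgbm as lgb

# shows the installed release string only, not the built commit
print(lgb.__version__)
```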
@AnchorBlues I think you directly used the code in that branch? That branch was based on an out-of-date master branch. I just updated it, and everything should be fine now.
@guolinke Thanks! After building the newest sources including #1809, all the code, including the forced splits, works well.
Hello. The `JSONDecodeError` occurred when I tried to visualize a tree structure using the `lgb.plot_tree` function.

Environment info
Operating System: Ubuntu 14.04.5 LTS, Trusty Tahr
CPU/GPU model: CPU, Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz
C++/Python/R version: Python 3.7.0 (default, Jun 28 2018, 13:15:42) [GCC 7.2.0] :: Anaconda, Inc. on linux
LightGBM version: newest version (merged commits up to commit id 087d30623aef4b505, "Update README.md (#1790)")

source
error message
If the line `dummy[0] += 0.00001` is removed, the above code works and the error does not occur.
What is the cause of the error?
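If I read the code path correctly, `lgb.plot_tree` parses the model dump as JSON internally, so the error should be reproducible without any plotting (a minimal sketch, assuming `model` is the fitted estimator):

```python
# lgb.plot_tree parses the model dump as JSON internally, so this alone
# should raise the same json.decoder.JSONDecodeError whenever the dump
# contains a bare nan (assuming `model` is the fitted estimator)
dumped = model.booster_.dump_model()
```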