Closed by sktin 1 week ago
Thanks for using LightGBM.
I understand the claims you're making, but not how the code you've provided is related to those claims.
Here's a simpler reproducible example in Python:
```python
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10_000, 10))
y = np.ones(X.shape[0])

bst = lgb.train(
    params={"objective": "regression"},
    train_set=lgb.Dataset(X, label=y, init_score=np.full_like(y, fill_value=10.0)),
    num_boost_round=10,
)
bst.predict(X)
# array([0., 0., 0., ..., 0., 0., 0.])
```
This is expected behavior. When you do not provide an `init_score`, LightGBM will take some representative "average" of the target and start boosting from there. For the built-in `regression` objective, that is literally the arithmetic mean of the target.
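For squared-error regression that fact can be sanity-checked with plain NumPy (toy data, not LightGBM code): among all constant predictions, the arithmetic mean is the one that minimizes the loss.

```python
import numpy as np

# Grid-search the best constant prediction for squared error on made-up data;
# it should land on the arithmetic mean of the target.
rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=2.0, size=1_000)

candidates = np.linspace(y.min(), y.max(), 2_001)
losses = ((y[None, :] - candidates[:, None]) ** 2).mean(axis=1)
best = candidates[np.argmin(losses)]

print(best, y.mean())  # the two agree up to the grid spacing
```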
You might find this recent discussion on that interesting: https://github.com/microsoft/LightGBM/pull/6569#discussion_r1791122758
When you DO provide an `init_score`, LightGBM skips that step and instead starts boosting from that `init_score`. However, it intentionally will never predict that `init_score`... it will just use it to evaluate the gain of potential splits, and then use the leaf values from those splits to set the predicted values.
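A toy NumPy sketch of what that means (not LightGBM internals; the data, single split point, and learning rate are invented for illustration): when the target does vary with the features, the boosted leaf corrections converge to `target - init_score`, so a prediction that never adds the init score back comes out offset by exactly that constant.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1_000)
y = 2.0 * (x > 0.5)           # target varies with the feature
init = np.full_like(y, 10.0)  # deliberately bad constant init score

corrections = np.zeros_like(y)
for _ in range(200):
    residual = y - (init + corrections)
    # One "tree": a single stump split at 0.5, leaf value = mean residual per side.
    left = x <= 0.5
    step = np.where(left, residual[left].mean(), residual[~left].mean())
    corrections += 0.3 * step  # learning rate 0.3

# The corrections converge to (y - init), so a predict() that omits the
# init score returns roughly y - 10, not y.
print(corrections[:3])
```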
In the example you've described, it's impossible for LightGBM to make any splits... if every value of the target is identical, then the gain of every potential split is `0.0`. As a result, in this situation you'll get a lot of these warnings in logs:
```text
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
```
And then `predict()` will always just return `0.0`.
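The zero-gain claim is easy to verify with plain NumPy, using the textbook variance-reduction gain as a simplified stand-in for LightGBM's internal split criterion (the data and candidate thresholds here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1_000)
y = np.ones_like(x)  # every target value identical, as in the example above

def split_gain(x, target, threshold):
    """Textbook variance-reduction gain for a single split."""
    left, right = target[x <= threshold], target[x > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    total_sse = ((target - target.mean()) ** 2).sum()
    split_sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
    return total_sse - split_sse

# With a constant target, the residuals (even after subtracting a constant
# init score of 10) are themselves constant, so no split has positive gain:
gains = [split_gain(x, y - 10.0, t) for t in np.linspace(0.05, 0.95, 19)]
print(max(gains))  # 0.0
```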
So it's not that the results "depend highly on the base score" ... it's that LightGBM's behavior differs based on whether or not you provide an `init_score` at all.
All the other variation in the logs you've provided is unrelated noise: the code snippet does not control for randomness in several places, and it compares 3 examples with a constant target to 1 example with a target that varies randomly, independently of the features.
> The result should be independent of a constant `init_score` if the model can fit a constant function perfectly, so that any "bad" choice of `init_score` would eventually be corrected.
This is just not correct. If all values of the target are identical, then the model cannot possibly learn any relationship between the features and the target.
Thank you for explaining the behavior of lightgbm with respect to custom `init_score` when fitting a constant function. I was under the impression that with enough boosting steps, GBDT models could recover gracefully from a bad initial choice.
I double-checked the behavior of xgboost. It seems that it can recover from a bad choice of `base_score` (the initial global bias) but not from a bad choice of `base_margin` (which would be the equivalent of `init_score` in lightgbm). Conceptually they both boost from a certain constant level, but apparently do different things under the hood.
```python
import xgboost as xgb
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10_000, 10))
y = np.ones(X.shape[0])
params = {"objective": "reg:squarederror", "reg_lambda": 0}

bst = xgb.train(
    params,
    xgb.DMatrix(X, y, base_margin=np.full_like(y, fill_value=10.0)),
    num_boost_round=100,
)
print("# base_margin=10")
print(bst.predict(xgb.DMatrix(X)))

bst = xgb.train(
    params | {"base_score": 10.0},
    xgb.DMatrix(X, y),
    num_boost_round=100,
)
print("# base_score=10")
print(bst.predict(xgb.DMatrix(X)))
```
Output:

```text
# base_margin=10
[-8. -8. -8. ... -8. -8. -8.]
# base_score=10
[1.0000001 1.0000001 1.0000001 ... 1.0000001 1.0000001 1.0000001]
```
An output of -8 is no better than 0 (both are wrong), so I see no reason to pick on lightgbm in particular. Unless you have further comments, I would consider the case closed.
> I was under the impression that with enough boosting steps, GBDT models could recover gracefully from bad initial choice.
They can if there is any meaningful relationship between the features and the target.
Thanks for the XGBoost example, yes that's a good illustration of how this is not specific to LightGBM. We can keep this closed.
Description

This investigation started when it was found that the result of regression appeared to depend on the constant value set in `init_score` (called "base score" in xgboost). The result should be independent of a constant `init_score` if the model can fit a constant function perfectly, so that any "bad" choice of `init_score` would eventually be corrected. Apparently, this is not true for lightgbm.

Reproducible example
Output:
Environment info
LightGBM version or commit hash: 4.5.0
Command(s) you used to install LightGBM
Additional Comments