microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

alias warnings can print several times #5332

Open jmoralez opened 2 years ago

jmoralez commented 2 years ago

Description

When there is a parameter alias conflict and verbosity>=0, the warning is printed twice; when verbosity=-1, it is printed once. Moreover, if you train additional times or re-construct the dataset, the number of times the warning appears varies.

Reproducible example

import lightgbm as lgb
import numpy as np

X = np.random.rand(100, 4)
y = np.random.rand(100)
ds = lgb.Dataset(X, y)
params = {
    'num_leaves': 3,
    'verbosity': -1,
    'subsample': 0.8,         # alias of bagging_fraction
    'bagging_fraction': 0.5,  # conflicts with subsample and triggers the warning
    'force_col_wise': True,
}
lgb.train(params, ds, num_boost_round=1)

This prints:

[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5

If we set verbosity=0 then we get two warnings:

[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
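For what it's worth, the duplicates can also be counted programmatically instead of eyeballing stdout. A rough sketch, assuming a LightGBM recent enough to ship lgb.register_logger (3.3+), which routes the native log messages through a standard logging.Logger:

import logging

import lightgbm as lgb
import numpy as np

class CollectHandler(logging.Handler):
    # keeps every log message so the alias warnings can be counted afterwards
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())

logger = logging.getLogger('lgb_alias_warnings')
logger.setLevel(logging.INFO)
handler = CollectHandler()
logger.addHandler(handler)
lgb.register_logger(logger)  # redirect LightGBM's output through this logger

X = np.random.rand(100, 4)
y = np.random.rand(100)
ds = lgb.Dataset(X, y)
params = {
    'num_leaves': 3,
    'verbosity': 0,
    'subsample': 0.8,
    'bagging_fraction': 0.5,
    'force_col_wise': True,
}
lgb.train(params, ds, num_boost_round=1)

alias_warnings = [m for m in handler.messages if 'bagging_fraction is set' in m]
print(len(alias_warnings))  # one would expect 1 here, but 2 show up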

If we train twice, the second time we get three warnings:

import lightgbm as lgb
import numpy as np

X = np.random.rand(100, 4)
y = np.random.rand(100)
ds = lgb.Dataset(X, y)
params = {
    'num_leaves': 3,
    'verbosity': 0,
    'subsample': 0.8,
    'bagging_fraction': 0.5,
    'force_col_wise': True,
}
print('First train')
_ = lgb.train(params, ds, num_boost_round=1)
print('Second train')
_ = lgb.train(params, ds, num_boost_round=1)

This prints:

First train
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
Second train
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5

If we then switch to verbosity=-1 after a couple of trainings, the number of warnings decreases:

import lightgbm as lgb
import numpy as np

X = np.random.rand(100, 4)
y = np.random.rand(100)
ds = lgb.Dataset(X, y)
params = {
    'num_leaves': 3,
    'verbosity': 0,
    'subsample': 0.8,
    'bagging_fraction': 0.5,
    'force_col_wise': True,
}
print('1st train')
_ = lgb.train(params, ds, num_boost_round=1)
print('2nd train')
_ = lgb.train(params, ds, num_boost_round=1)
params['verbosity'] = -1
print('3rd train')
_ = lgb.train(params, ds, num_boost_round=1)
print('4th train')
_ = lgb.train(params, ds, num_boost_round=1)
print('5th train')
_ = lgb.train(params, ds, num_boost_round=1)

This prints:

1st train
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
2nd train
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
3rd train
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
4th train
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
5th train
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5

And if we re-create the dataset we can actually make the warning disappear:

import lightgbm as lgb
import numpy as np

X = np.random.rand(100, 4)
y = np.random.rand(100)
ds = lgb.Dataset(X, y)
params = {
    'num_leaves': 3,
    'verbosity': 0,
    'subsample': 0.8,
    'bagging_fraction': 0.5,
    'force_col_wise': True,
}
print('1st train')
_ = lgb.train(params, ds, num_boost_round=1)
print('2nd train')
_ = lgb.train(params, ds, num_boost_round=1)
params['verbosity'] = -1
print('3rd train')
_ = lgb.train(params, ds, num_boost_round=1)
print('4th train')
_ = lgb.train(params, ds, num_boost_round=1)
print('5th train')
_ = lgb.train(params, ds, num_boost_round=1)
print('Train after re-constructing ds')
ds = lgb.Dataset(X, y)
_ = lgb.train(params, ds, num_boost_round=1)

This prints:

1st train
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
2nd train
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
3rd train
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
4th train
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
5th train
[LightGBM] [Warning] bagging_fraction is set=0.5, subsample=0.8 will be ignored. Current value: bagging_fraction=0.5
Train after re-constructing ds

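Until the duplication itself is sorted out, one user-side workaround (not a fix for the logging) is to avoid the alias conflict altogether by passing only one spelling of the parameter; as the warning text shows, bagging_fraction wins over subsample anyway:

import lightgbm as lgb
import numpy as np

X = np.random.rand(100, 4)
y = np.random.rand(100)
ds = lgb.Dataset(X, y)
params = {
    'num_leaves': 3,
    'verbosity': 0,
    'bagging_fraction': 0.5,  # keep a single spelling; drop the 'subsample' alias
    'force_col_wise': True,
}
_ = lgb.train(params, ds, num_boost_round=1)  # no alias warning is emitted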
Environment info

LightGBM version or commit hash: df14e6077e07259163c9200d2c570a022e2625cd

Operating System: Ubuntu 20.04

CPU/GPU model: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz

C++ compiler version: g++ 10.3.0

CMake version: 3.23.2

Python version: 3.8.13

Command(s) you used to install LightGBM

git clone --recursive https://github.com/microsoft/LightGBM.git
cd LightGBM
mkdir build && cd build
cmake .. && make -j4
cd .. && pip install -e .
shiyu1994 commented 2 years ago

@jmoralez Thanks for investigating this. I believe this needs to be fixed. I can help investigate the source of the redundancy.