microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

Early Stopping does not work in LGBMClassifier #6244

Open ohbtorres opened 10 months ago

ohbtorres commented 10 months ago

Description

When using the scikit-learn API, the lightgbm.early_stopping callback does not work: training continues after the early-stopping condition is met.

Reproducible example

import pandas as pd
import numpy as np
import lightgbm as lgbm

# Random train set
x12 = pd.DataFrame({"x1": np.random.randint(-10, 10, 10000),
                    "x2": np.random.randint(-10, 10, 10000)})
y12 = pd.Series(np.random.randint(0, 2, 10000))
# Random validation set
x12_valid = pd.DataFrame({"x1": np.random.randint(-10, 10, 1000),
                          "x2": np.random.randint(-10, 10, 1000)})
y12_valid = pd.Series(np.random.randint(0, 2, 1000))
# Training the model using a validation set
lgbm.LGBMClassifier(boosting_type='dart',
                    learning_rate=0.001,
                    n_estimators=5000,
                    objective='binary')\
    .fit(x12, y12,
         eval_set=(x12_valid, y12_valid),
         eval_metric="logloss",
         callbacks=[lgbm.log_evaluation(period=100),
                    lgbm.early_stopping(stopping_rounds=100)])

The output (originally attached as a screenshot) shows binary_logloss starting at 0.693169 and ending at 0.697759, with no early stopping.
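With stopping_rounds=100, the callback is expected to end training once the validation metric has gone 100 consecutive rounds without improving on its best value. The rule can be sketched in plain Python as below. This is a simplified model assuming a single lower-is-better metric and no min_delta; it is not LightGBM's callback code, and early_stop_round is a hypothetical helper.

```python
from typing import Optional, Sequence


def early_stop_round(losses: Sequence[float], stopping_rounds: int) -> Optional[int]:
    """Hypothetical helper: return the 0-based index of the best round if
    training would stop early, or None if patience never runs out.
    Assumes one metric where lower is better and no min_delta."""
    best_loss = float("inf")
    best_iter = -1
    for i, loss in enumerate(losses):
        if loss < best_loss:
            # Strict improvement: reset the patience window.
            best_loss, best_iter = loss, i
        elif i - best_iter >= stopping_rounds:
            # stopping_rounds rounds have passed without improvement.
            return best_iter
    return None


print(early_stop_round([0.70, 0.69, 0.695, 0.696, 0.697], stopping_rounds=3))
```

In dart mode, LightGBM's callback disables itself instead of applying a rule like this, which is why training in the report ran to completion.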

Environment info

LightGBM version or commit hash: 4.1.0

Command(s) you used to install LightGBM

pip install lightgbm

pandas==2.0.3 numpy==1.24.4

jmoralez commented 10 months ago

Hey @ohbtorres, thanks for using LightGBM. Early stopping isn't supported in dart mode, so you should be getting the warning raised here: https://github.com/microsoft/LightGBM/blob/0a9a6bbf6d96cb01c3fdc7ace6b13da828857c82/python-package/lightgbm/callback.py#L325. Is it not showing?

ohbtorres commented 10 months ago

Hi @jmoralez, thank you for your answer. You are right: after changing the boosting_type, the callback works. Thanks a lot! Is there any information about this here? I am not getting this warning message.

jmoralez commented 10 months ago

> Is there any information about this here?

I think we could add it here.

> I am not getting this warning message

Did you disable Python's warnings? Running your example, I get the following:

[LightGBM] [Info] Number of positive: 5074, number of negative: 4926
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000097 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 40
[LightGBM] [Info] Number of data points in the train set: 10000, number of used features: 2
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.507400 -> initscore=0.029602
[LightGBM] [Info] Start training from score 0.029602
/hdd/github/LightGBM/python-package/lightgbm/callback.py:325: UserWarning: Early stopping is not available in dart mode
  _log_warning('Early stopping is not available in dart mode')
ohbtorres commented 10 months ago

I didn't. I don't know what could be happening. Anyway, I understand the cause now, and early stopping is working fine. Thanks @jmoralez!

jmoralez commented 10 months ago

Would you like to add that dart isn't supported in the early stopping callback docstring?