microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

Early stopping: overfit prevention #4996

Open mayer79 opened 2 years ago

mayer79 commented 2 years ago

Currently, "early stopping" monitors validation loss and stops after some unsucessful rounds. This is often used together with gridsearchCV to select a best model. Sometimes, the best performing model shows quite some overfit and one might prefer a model with slightly worse performance but less overfit, depending on the situation.

To actively control overfitting, I would love to see a modification of early stopping. It would stop the booster if, after a couple of rounds, the validation score is more than "overfit_tolerance" worse than the training score.

It could be used, e.g., like this:

callbacks=[lgb.early_stopping(20, overfit_tolerance=1.1)]

This would stop the boosting process if, after 20 rounds, either the validation performance stopped improving or the ratio of validation to training performance exceeded 1.1.
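
For illustration, a minimal sketch of how such a check could already be wired up today as a custom callback in the Python package, assuming the training Dataset is also passed to valid_sets so that both scores appear in the evaluation results. The overfit_guard factory and its arguments are hypothetical (not part of the LightGBM API), the sketch leans on the internal lightgbm.callback.EarlyStopException rather than a public mechanism, and it only applies to loss-like metrics where lower is better:

```python
import lightgbm as lgb
from lightgbm.callback import EarlyStopException  # internal class, used here for illustration only


def overfit_guard(overfit_tolerance, train_name="training", valid_name="valid_0"):
    """Hypothetical callback: stop when the valid/train metric ratio exceeds overfit_tolerance.

    Assumes the training Dataset is also included in valid_sets so that its
    scores appear in env.evaluation_result_list. Only sensible for metrics
    where lower is better (e.g. l2, binary_logloss).
    """
    def _callback(env):
        # evaluation_result_list entries: (dataset_name, metric_name, value, is_higher_better)
        results = {(name, metric): (value, higher_better)
                   for name, metric, value, higher_better in env.evaluation_result_list}
        for (name, metric), (valid_value, higher_better) in results.items():
            if name != valid_name or higher_better:
                continue  # only check loss-like metrics on the validation set
            train_value, _ = results.get((train_name, metric), (None, None))
            if train_value and valid_value / train_value > overfit_tolerance:
                raise EarlyStopException(env.iteration, env.evaluation_result_list)

    _callback.order = 25  # run next to the built-in early stopping callback
    return _callback


# Usage sketch (params, dtrain, dvalid defined elsewhere):
# booster = lgb.train(
#     params, dtrain,
#     valid_sets=[dtrain, dvalid], valid_names=["training", "valid_0"],
#     callbacks=[lgb.early_stopping(20), overfit_guard(overfit_tolerance=1.1)],
# )
```

A built-in overfit_tolerance argument on lgb.early_stopping() would essentially bake this check into the existing callback, with proper handling of higher-is-better metrics and best-iteration bookkeeping.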

StrikerRUS commented 2 years ago

@mayer79 Thanks a lot for your feature request!

Is this the same as what was recently implemented for the Python package in #4580?

mayer79 commented 2 years ago

@StrikerRUS: It is not the same, but it is indeed not completely unrelated. Both ideas fight overfitting. #4580 seems easier to implement, but trickier to apply in practice (a good value for min_delta heavily depends on the choice of metric, the learning rate, and other regularization parameters).
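
For comparison, a minimal usage sketch of the min_delta option from #4580 as it exists in recent versions of the Python package; the data, parameters, and the 1e-3 threshold are purely illustrative:

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0] + rng.normal(scale=0.5, size=1000)

dtrain = lgb.Dataset(X[:800], label=y[:800])
dvalid = lgb.Dataset(X[800:], label=y[800:], reference=dtrain)

# Stop once the validation l2 has not improved by at least min_delta
# for 20 consecutive rounds.
booster = lgb.train(
    {"objective": "regression", "metric": "l2", "learning_rate": 0.05},
    dtrain,
    num_boost_round=500,
    valid_sets=[dvalid],
    callbacks=[lgb.early_stopping(stopping_rounds=20, min_delta=1e-3)],
)
```

Whether 1e-3 is a sensible min_delta here depends entirely on the metric's scale and the learning rate, which is exactly what makes it trickier to tune than a relative overfit tolerance.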

The idea in this post uses the ratio of validation to training performance to decide whether overfitting is getting too strong. I tried to draw it in my favourite data science tool, "Excel" ;).

[Image: Excel sketch of training vs. validation performance curves illustrating the overfit tolerance ratio]

As a user, I can simply wish: "I don't want to have more than x% overfit on my chosen metric(s)".

We don't need to start with this one, but maybe a logical order could look like this:

  1. Make the cb.xyz() functions user-visible in the R package and switch to a nice callback interface as in Python.
  2. Add #4580 in R, using the early stopping callback.
  3. Add an attribute train_score to the R6 Booster object (for lgb.train(), lgb.cv(), and lightgbm()); see the sketch after this list for how record_evaluation already exposes this in the Python package.
  4. Implement the overfit prevention idea of this thread in both R and Python.
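
As a reference point for items 3 and 4, the Python package can already record per-iteration training scores via lgb.record_evaluation() when the training set is included in valid_sets. The following minimal sketch (illustrative data and parameters, hypothetical variable names) computes the validation-to-training ratio after training, i.e. the quantity an overfit_tolerance would act on during training:

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0] + rng.normal(scale=0.5, size=1000)

dtrain = lgb.Dataset(X[:800], label=y[:800])
dvalid = lgb.Dataset(X[800:], label=y[800:], reference=dtrain)

history = {}  # filled by record_evaluation: {"training": {"l2": [...]}, "valid_0": {"l2": [...]}}
booster = lgb.train(
    {"objective": "regression", "metric": "l2", "learning_rate": 0.05},
    dtrain,
    num_boost_round=200,
    valid_sets=[dtrain, dvalid],
    valid_names=["training", "valid_0"],
    callbacks=[lgb.record_evaluation(history)],
)

# Per-iteration ratio of validation to training l2
ratios = np.array(history["valid_0"]["l2"]) / np.array(history["training"]["l2"])
print("valid/train l2 ratio at the last iteration:", ratios[-1])
```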

I will of course help with the changes, but I am not sure whether bullet point 1 is on the roadmap.

StrikerRUS commented 2 years ago

@mayer79 Ah, I got the difference now, thanks for the detailed explanation with an example!

This idea looks good to me. Is there something similar in other Python/R packages we can check as a reference?

> I am not sure whether bullet point 1 is on the roadmap.

It is: #2479. I guess we can start from this point.

jameslamb commented 2 years ago

I'd welcome a contribution for #2479 and would be happy to review it, @mayer79 , if you'd like to attempt it.

mayer79 commented 2 years ago

Sounds like a plan!