mlr-org / mlr3learners

Recommended learners for mlr3
https://mlr3learners.mlr-org.com
GNU Lesser General Public License v3.0

xgboost early stopping #257

Closed · sebffischer closed this issue 1 week ago

sebffischer commented 1 year ago

Because I just reviewed a PR in extralearners (https://github.com/mlr-org/mlr3extralearners/pull/251), which added early stopping to the survival xgboost learner, I have two questions regarding the early stopping implementation:

https://github.com/mlr-org/mlr3learners/blob/cb8162a4c85ef7183b6141d1519ba27c7cba5428/R/LearnerClassifXgboost.R#L232

What happens here if the user provides a watchlist? Should we even allow specifying a watchlist now that we properly support early stopping? Things can go wrong, e.g. in GraphLearners.

https://github.com/mlr-org/mlr3learners/blob/cb8162a4c85ef7183b6141d1519ba27c7cba5428/R/LearnerClassifXgboost.R#L86

What happens if we set the early stopping set to "train"? Looking at the code (https://github.com/mlr-org/mlr3learners/blob/cb8162a4c85ef7183b6141d1519ba27c7cba5428/R/LearnerClassifXgboost.R#L235), it does not seem to be covered. In the extralearners PR I suggested disabling the "train" option, as nobody would use it anyway.

In summary, my suggestions are:

(1) disable "train" as an allowed level for early_stopping_set, and (2) disable the watchlist hyperparameter.

What do you think @be-marc ?

be-marc commented 1 year ago

> What happens if we set the early stopping set to "train"?

The train set is added in line 232. XGBoost uses the last set added to the watchlist as the early stopping set. If you use early_stopping_set = "test", both the train and test sets are added to the watchlist, so you can plot the learning curves for both.

Yes, we could remove "train". I kept it because it was the default for a few years.
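The "last set wins" rule described above can be reproduced with xgboost directly (a minimal sketch on the bundled agaricus toy data; parameter values are illustrative):

```r
library(xgboost)

# Illustrative train/test split using the toy dataset shipped with xgboost
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest  <- xgb.DMatrix(agaricus.test$data,  label = agaricus.test$label)

# The LAST entry in the watchlist drives early stopping, so listing "test"
# last corresponds to early_stopping_set = "test"; "train" is still in the
# watchlist, so metrics for both sets are recorded every round.
bst <- xgb.train(
  params = list(objective = "binary:logistic", eval_metric = "error"),
  data = dtrain,
  nrounds = 100,
  watchlist = list(train = dtrain, test = dtest),
  early_stopping_rounds = 5,
  verbose = 0
)

# The evaluation log contains train and test metrics, e.g. for plotting
head(bst$evaluation_log)
```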

> What happens here if the user provides a watchlist?

(1) early_stopping_set = "none": the user can perform early stopping on an arbitrary set (provided via watchlist).
(2) early_stopping_set != "none": the user can record performance on an arbitrary set.
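Both cases boil down to where an extra set lands in the watchlist. A hedged sketch in plain xgboost (splitting the agaricus test data in two for illustration; the set names "extra" and "stop" are made up here):

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# Split the test data into a "recorded only" set and an early-stopping set,
# mimicking a user-supplied watchlist entry alongside the internal sets
idx <- seq_len(nrow(agaricus.test$data)) %% 2 == 0
dextra <- xgb.DMatrix(agaricus.test$data[idx, ],  label = agaricus.test$label[idx])
dstop  <- xgb.DMatrix(agaricus.test$data[!idx, ], label = agaricus.test$label[!idx])

# All sets are recorded in the evaluation log; only the last one ("stop")
# drives early stopping
bst <- xgb.train(
  params = list(objective = "binary:logistic", eval_metric = "error"),
  data = dtrain,
  nrounds = 100,
  watchlist = list(train = dtrain, extra = dextra, stop = dstop),
  early_stopping_rounds = 5,
  verbose = 0
)
```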

sebffischer commented 1 year ago

Thanks for the clarification. Maybe we should document that the last set is used for early stopping, as otherwise the code is not easy to understand.

There is still a problem if the user provides a watchlist and the learner is wrapped in a GraphLearner with preprocessing steps: the preprocessing will not be applied to the watchlist, so we would do early stopping on out-of-distribution data, which is probably not what the user wants. This is similar, but not identical, to the existing problem that the preprocessing steps are currently not applied to the test set. Maybe we should disable the watchlist parameter for that reason?
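The pitfall can be seen in a minimal pipeline (a sketch; task and pipeline choices are illustrative). Any watchlist the user built from raw data would bypass the scaling step:

```r
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

# Preprocessing (here: scaling) is applied only to data flowing through the
# pipeline. A user-supplied watchlist DMatrix built from the raw task would
# NOT pass through po("scale"), so early stopping would be evaluated on
# unscaled, out-of-distribution data.
graph_learner <- as_learner(po("scale") %>>% lrn("classif.xgboost", nrounds = 50))
graph_learner$train(tsk("sonar"))
```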

be-marc commented 1 year ago

I would not disable the watchlist, but rather also document the pipelines issue. I will document both. Thanks!

sebffischer commented 1 week ago

This is properly implemented now.