microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

Cross validation early stopping #5683

Open segatrade opened 1 year ago

segatrade commented 1 year ago

Currently, cross-validation early stopping happens based on the mean across folds. But it seems more correct to use the minimum (worst) result from all folds at each iteration, if we want to choose num_iterations based on best_iteration and then train a model on the complete dataset after CV.

@Laurae2 also seems to discuss this here: https://sites.google.com/site/lauraeppx/xgboost/cross-validation

For example, suppose a 3-fold CV gives these per-fold accuracies: iteration 35: 0.9, 0.9, 0 (mean = 0.6); iteration 29: 0.59, 0.58, 0.57 (mean = 0.58). Iteration 29 seems the better choice of num_iterations for training a model on the complete set, even though the mean at iteration 35 is higher.
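
To make the comparison concrete, a quick numeric check of that example (plain NumPy, nothing LightGBM-specific):

```python
import numpy as np

# Per-fold accuracies from the example above (higher is better).
iter_35 = np.array([0.9, 0.9, 0.0])
iter_29 = np.array([0.59, 0.58, 0.57])

print(iter_35.mean(), iter_29.mean())  # 0.6 vs 0.58 -> the mean prefers iteration 35
print(iter_35.min(), iter_29.min())    # 0.0 vs 0.57 -> the worst fold prefers iteration 29
```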

Is there any way to change lgbm.cv from mean to min mode? Or do I have to write my own CV with the usual lgbm.train calls? Also, if I write my own: does lgbm.cv have performance benefits over calling lgbm.train several times that I could take advantage of? Does it load the data once or several times?
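
In case it helps, this is a minimal sketch of what I mean by "my own cv": manual K-fold with lgbm.train, recording each fold's evaluation history and then picking the iteration by the worst fold instead of the mean. The data, parameters, and fold setup are just placeholders; it assumes binary classification with binary_logloss (lower is better).

```python
import numpy as np
import lightgbm as lgbm
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold

# Toy data and parameters, only as placeholders.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
params = {"objective": "binary", "metric": "binary_logloss", "verbosity": -1}
num_boost_round = 200

fold_histories = []  # per-fold list of per-iteration validation losses
for train_idx, valid_idx in KFold(n_splits=3, shuffle=True, random_state=42).split(X):
    train_set = lgbm.Dataset(X[train_idx], label=y[train_idx])
    valid_set = lgbm.Dataset(X[valid_idx], label=y[valid_idx], reference=train_set)
    evals = {}
    lgbm.train(
        params,
        train_set,
        num_boost_round=num_boost_round,
        valid_sets=[valid_set],
        valid_names=["valid"],
        callbacks=[lgbm.record_evaluation(evals)],
    )
    fold_histories.append(evals["valid"]["binary_logloss"])

scores = np.array(fold_histories)  # shape: (n_folds, num_boost_round)
# Mean-based choice (what lgbm.cv's early stopping uses) vs. worst-fold choice.
best_by_mean = int(np.argmin(scores.mean(axis=0))) + 1
best_by_worst = int(np.argmin(scores.max(axis=0))) + 1
print(f"best iteration by fold mean:  {best_by_mean}")
print(f"best iteration by worst fold: {best_by_worst}")
```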

3zhang commented 1 year ago

I don't think so. It's possible to have a fold whose error decreases monotonically but always stays higher than the other folds', while the other folds reach their minimums in early rounds. In that case, choosing the worst error will always set the best iteration to the total number of iterations.
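
For example (synthetic error curves, just to illustrate the failure mode):

```python
import numpy as np

iters = np.arange(1, 101)
# Fold 1: error keeps decreasing but always stays above the other folds.
fold1 = 0.5 + 1.0 / iters
# Folds 2 and 3: reach their minimum around rounds 20-25, then get worse.
fold2 = 0.3 + 0.002 * np.abs(iters - 20)
fold3 = 0.3 + 0.002 * np.abs(iters - 25)

errors = np.vstack([fold1, fold2, fold3])
print(np.argmin(errors.mean(axis=0)) + 1)  # mean picks an early round (25 here)
print(np.argmin(errors.max(axis=0)) + 1)   # worst fold picks the very last round (100)
```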