Open segatrade opened 1 year ago
I don't think so. It's possible that one fold's error is monotonically decreasing but still higher than the other folds', while the other folds reach their minimums in early rounds. Then choosing the worst error would always set the best iteration to the total number of iterations.
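The two aggregation rules being debated (mean across folds vs. worst fold) can be compared with a small sketch. This is plain Python, not LightGBM API; `pick_best_iteration` and the `fold_scores` table are illustrative names, with the toy accuracies taken from the example in this issue:

```python
# Sketch (not LightGBM API): pick a "best iteration" from per-fold CV
# metrics, aggregating either by the mean (what lgbm.cv tracks) or by
# the worst fold (what this issue proposes).

def pick_best_iteration(fold_scores, aggregate):
    """fold_scores: {iteration: [accuracy per fold]}, higher is better.
    Returns the iteration whose aggregated score is highest."""
    return max(fold_scores, key=lambda it: aggregate(fold_scores[it]))

# Toy numbers from this issue: 3 folds, two candidate iterations.
fold_scores = {
    29: [0.59, 0.58, 0.57],   # mean 0.58, worst fold 0.57
    35: [0.90, 0.90, 0.00],   # mean 0.60, worst fold 0.00
}

mean = lambda xs: sum(xs) / len(xs)
print(pick_best_iteration(fold_scores, mean))  # mean-based choice: 35
print(pick_best_iteration(fold_scores, min))   # worst-fold choice: 29
```

The worst-fold rule trades average performance for robustness: it refuses an iteration where any single fold collapses, which is exactly the disagreement in this thread.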
Currently, cross-validation early stopping happens based on the `mean` of the fold metrics. But it seems more correct to use the minimum (worst) value across all folds at each iteration, if we want to choose `num_iterations` based on `best_iteration` when training a model on the complete dataset after CV. @Laurae2 also seems to discuss this here: https://sites.google.com/site/lauraeppx/xgboost/cross-validation
For example, if in a 3-fold CV we get the following accuracies:

- iteration 35: 0.9, 0.9, 0.0 (mean = 0.6)
- iteration 29: 0.59, 0.58, 0.57 (mean = 0.58)

then iteration 29 seems the better choice of `num_iterations` for training a model on the complete set, even though the mean at iteration 35 is higher.

Is there any way to switch `lgbm.cv` from mean-based to min-based mode? Or is my only option a custom CV loop with the usual `lgbm.train` calls? Also, if I write my own, does `lgbm.cv` have performance benefits over calling `lgbm.train` several times that I would lose? Does it load the data once, or once per fold?
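On the "my own CV" option: a hand-rolled loop mostly needs the index bookkeeping below; the per-fold training call (e.g. `lgbm.train` on a slice of the data) is left out, and `kfold_indices` is an illustrative helper, not LightGBM API:

```python
# Sketch of the index bookkeeping for a hand-rolled (non-shuffled) CV;
# pure Python, no LightGBM required.

def kfold_indices(n, k):
    """Yield (train_idx, valid_idx) for k roughly equal contiguous folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, valid
        start += size

# Example: for tr, va in kfold_indices(n_rows, 3): train one fold on tr,
# evaluate on va, then aggregate best iterations however you like
# (mean, min, ...).
```

On the data-loading question: if I recall correctly, the Python `lgbm.cv` builds its per-fold datasets by subsetting a single loaded `Dataset`, so the data is loaded once; a manual loop can presumably do the same by constructing one `Dataset` and taking per-fold subsets, rather than reloading per fold.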