Closed: pplonski closed this issue 3 years ago.
We adjust the validation type based on the number of cells in the data:
cells = rows * cols
Pseudocode to adjust validation:
if cells > 100e6:
    use a single train/test split
elif cells > 50e6:
    use 5-fold CV
else:
    use 10-fold CV
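The size-based rule above can be written as a small Python function. This is only a sketch: the function name, the threshold constants, and the returned dict format are hypothetical choices for illustration, not the project's actual code.

```python
# Thresholds from the size-based rule (constant names are hypothetical).
SPLIT_THRESHOLD = 100e6      # more cells than this: single train/test split
FIVE_FOLD_THRESHOLD = 50e6   # more cells than this: 5-fold CV


def choose_validation_by_cells(rows: int, cols: int) -> dict:
    """Pick a validation strategy from dataset size, where cells = rows * cols.

    The returned dict layout is a hypothetical format for illustration.
    """
    cells = rows * cols
    if cells > SPLIT_THRESHOLD:
        # Very large data: one split keeps training time manageable.
        return {"validation_type": "split"}
    elif cells > FIVE_FOLD_THRESHOLD:
        # Medium data: fewer folds than the default.
        return {"validation_type": "kfold", "k_folds": 5}
    else:
        # Small data: 10 folds give a more stable estimate.
        return {"validation_type": "kfold", "k_folds": 10}


if __name__ == "__main__":
    print(choose_validation_by_cells(1_000_000, 20))  # 20e6 cells
    print(choose_validation_by_cells(2_000_000, 30))  # 60e6 cells
    print(choose_validation_by_cells(5_000_000, 25))  # 125e6 cells
```

Note that the comparisons are strict, so a dataset with exactly 50e6 cells still gets 10-fold CV.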
I've changed how the validation is set. It is now based on how long a Decision Tree takes to train on a 0.9/0.1 train/test split of the data. If mode=Compete, we first train a Decision Tree. We then assume that each other model will take about 5x as long to train as the Decision Tree, and that we want to train at least 10 models. From total_train_limit and these assumptions, we compute a rough number of folds. If 5 < folds < 15, we use 5-fold CV; if folds > 15, we use 10-fold CV. Otherwise, we continue with a 0.9/0.1 train/test split.
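A rough Python sketch of the time-based rule follows. It reflects our reading of "compute the rough number of folds": we assume that training 10 models with k folds costs about 10 * k * 5 * dt_time seconds, and we solve that for k. All function and constant names are hypothetical, and the mode=Compete check is left out. Because the description does not say what happens when folds is exactly 15, the sketch assumes 10-fold CV in that case. To stay stdlib-only, time_fit accepts any callable. In practice you would pass a closure that fits a Decision Tree (for example scikit-learn's DecisionTreeClassifier) on 90% of the rows.

```python
import time
from typing import Callable

# Assumptions taken from the description (constant names are hypothetical).
MODEL_TIME_FACTOR = 5  # other models train in ~5x the Decision Tree's time
MIN_MODELS = 10        # we want to train at least 10 models


def time_fit(fit: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by one call to fit()."""
    start = time.perf_counter()
    fit()
    return time.perf_counter() - start


def estimate_folds(dt_time: float, total_train_limit: float) -> float:
    """Rough number of folds that fits in the time budget.

    Assumes MIN_MODELS models with k folds cost about
    MIN_MODELS * k * MODEL_TIME_FACTOR * dt_time seconds, solved for k.
    """
    if dt_time <= 0:
        raise ValueError("dt_time must be positive")
    return total_train_limit / (MIN_MODELS * MODEL_TIME_FACTOR * dt_time)


def choose_validation_by_time(dt_time: float, total_train_limit: float) -> dict:
    """Map the rough fold estimate to a validation strategy (hypothetical dict format)."""
    folds = estimate_folds(dt_time, total_train_limit)
    if 5 < folds < 15:
        return {"validation_type": "kfold", "k_folds": 5}
    if folds >= 15:  # assumption: exactly 15 is treated like "> 15"
        return {"validation_type": "kfold", "k_folds": 10}
    # Not enough budget for CV: keep the 0.9/0.1 train/test split.
    return {"validation_type": "split", "train_ratio": 0.9}
```

For example, with a one-hour budget (total_train_limit=3600) and a Decision Tree that trains in 10 seconds, the estimate is 3600 / (10 * 5 * 10) = 7.2 folds, so this sketch picks 5-fold CV.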
Adjust cross-validation type based on the dataset