default values for cv and holdout

By default, flaml chooses the resampling strategy automatically based on the data size and the time budget. To enforce a particular strategy, set eval_method to "holdout" for holdout or "cv" for cross-validation.

For holdout, you can also set:

split_ratio: the fraction of the training data held out for validation, 0.1 by default.
X_val, y_val: a separate validation dataset. When they are passed, the validation metrics are computed against this dataset. When they are not passed, a validation dataset is split from the training data and held out from training during the model search. After the model search, flaml retrains the model with the best configuration on the full training data. You can set retrain_full to False to skip the final retraining, or to "budget" to ask flaml to do its best to retrain within the time budget.

For cross-validation, you can also set n_splits, the number of folds. The default is 5.
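The options above could be combined as in the following minimal sketch. The parameter names (eval_method, split_ratio, retrain_full, n_splits, X_val, y_val) come from the text; the task, the time budget, the chosen values, and the run_automl helper are illustrative assumptions, not part of flaml.

```python
# Keyword arguments described above; all values are illustrative.
holdout_settings = {
    "task": "classification",
    "time_budget": 60,         # seconds; assumed budget for the example
    "eval_method": "holdout",
    "split_ratio": 0.2,        # hold out 20% of the training data for validation
    "retrain_full": "budget",  # retrain on the full data only if the budget allows
}

cv_settings = {
    "task": "classification",
    "time_budget": 60,
    "eval_method": "cv",
    "n_splits": 10,            # 10 folds instead of the default 5
}


def run_automl(X_train, y_train, settings, X_val=None, y_val=None):
    """Hypothetical helper: fit a flaml AutoML instance with the given settings."""
    # Imported lazily so the settings can be inspected without flaml installed.
    from flaml import AutoML

    fit_kwargs = dict(settings)
    if X_val is not None and y_val is not None:
        # An explicit validation set is used instead of an internal split.
        fit_kwargs.update(X_val=X_val, y_val=y_val)

    automl = AutoML()
    automl.fit(X_train=X_train, y_train=y_train, **fit_kwargs)
    return automl
```

Calling run_automl(X_train, y_train, holdout_settings) would then search with a 20% holdout set, while passing cv_settings would use 10-fold cross-validation.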

Data split method

By default, flaml uses the following method to split the data:

stratified split for classification;
uniform split for regression;
time-based split for time series forecasting;
group-based split for learning to rank.

For classification, the split method can be changed to a uniform split by setting split_type="uniform". For both classification and regression, a time-based split can be enforced by setting split_type="time", provided the data are sorted by timestamp.
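To make the difference between a uniform and a time-based holdout split concrete, here is a conceptual sketch in plain Python. It is not flaml's implementation: the holdout_split function and its seed argument are hypothetical, and it only illustrates the idea that a time-based split validates on the most recent records while a uniform split samples them at random.

```python
import random


def holdout_split(rows, split_ratio=0.1, split_type="uniform", seed=0):
    """Conceptual holdout split; returns (train_rows, val_rows)."""
    n_val = max(1, int(len(rows) * split_ratio))
    if split_type == "time":
        # Rows are assumed sorted by timestamp: validate on the most recent ones.
        return rows[:-n_val], rows[-n_val:]
    if split_type == "uniform":
        # Shuffle a copy, then cut off the validation part.
        shuffled = rows[:]
        random.Random(seed).shuffle(shuffled)
        return shuffled[:-n_val], shuffled[-n_val:]
    raise ValueError(f"unsupported split_type: {split_type!r}")


rows = list(range(20))  # stand-in for 20 records sorted by timestamp
train, val = holdout_split(rows, split_ratio=0.2, split_type="time")
print(val)  # the last 4 records: [16, 17, 18, 19]
```

With flaml itself none of this needs to be written by hand; per the text above, passing split_type="time" or split_type="uniform" selects the corresponding behavior.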

microsoft / FLAML

default values for cv and holdout #302
