Closed hadisotudeh closed 2 years ago
By default, flaml decides the resampling strategy automatically according to the data size and the time budget. If you would like to enforce a certain resampling strategy, you can set eval_method to "holdout" or "cv" for holdout or cross-validation, respectively.
For holdout, you can also set:
- split_ratio: the fraction of the data used for validation, 0.1 by default.
- X_val, y_val: a separate validation dataset. When they are passed, the validation metrics are computed against this validation dataset. If they are not passed, a validation dataset is split from the training data and held out from training during the model search.

After the model search, flaml retrains the model with the best configuration on the full training data. You can set retrain_full to False to skip the final retraining, or to "budget" to ask flaml to do its best to retrain within the time budget.

For cross-validation, you can also set n_splits, the number of folds. By default it is 5.
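The options above can be collected into keyword arguments for the fit call. The following is a minimal sketch, assuming the parameter names quoted in this thread (eval_method, split_ratio, n_splits, retrain_full) are passed to AutoML.fit; the helper names resampling_settings and run_automl are hypothetical and not part of flaml.

```python
# Sketch only: builds keyword arguments for flaml's AutoML.fit.
# `resampling_settings` and `run_automl` are hypothetical helpers, not flaml APIs.

def resampling_settings(eval_method="auto", split_ratio=0.1, n_splits=5, retrain_full=True):
    """Return fit() keyword arguments for the chosen resampling strategy."""
    if eval_method not in ("auto", "holdout", "cv"):
        raise ValueError(f"unknown eval_method: {eval_method!r}")
    settings = {"eval_method": eval_method, "retrain_full": retrain_full}
    if eval_method == "holdout":
        # Fraction of the training data held out for validation.
        settings["split_ratio"] = split_ratio
    elif eval_method == "cv":
        # Number of cross-validation folds.
        settings["n_splits"] = n_splits
    return settings


# Holdout with 20% validation data and no final retraining.
holdout_settings = resampling_settings("holdout", split_ratio=0.2, retrain_full=False)

# 10-fold cross-validation, retraining only as far as the time budget allows.
cv_settings = resampling_settings("cv", n_splits=10, retrain_full="budget")


def run_automl(X_train, y_train, settings, time_budget=60):
    """Run a flaml search with the given settings (requires flaml to be installed)."""
    from flaml import AutoML  # imported lazily so the sketch loads without flaml

    automl = AutoML()
    automl.fit(X_train, y_train, task="classification",
               time_budget=time_budget, **settings)
    return automl
```

To use your own validation data instead of a split, you would pass X_val and y_val to the fit call alongside eval_method="holdout"; in that case split_ratio is not needed.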
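By default, flaml uses the following method to split the data:
- stratified split for classification;
- uniform split for regression;
- time-based split for time series forecasting;
- group-based split for learning to rank.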
The data split method for classification can be changed to a uniform split by setting split_type="uniform". For both classification and regression, a time-based split can be enforced, if the data are sorted by timestamps, by setting split_type="time".
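To make the difference between the two split types concrete, here is a small illustration in plain Python. It is not flaml's implementation, which this thread does not describe; the function holdout_indices and its seed argument are hypothetical. It assumes a uniform split shuffles the rows before holding out the validation portion, while a time-based split keeps the input order and holds out the most recent rows.

```python
import random


def holdout_indices(n_rows, split_ratio=0.1, split_type="uniform", seed=0):
    """Return (train_idx, val_idx) for a holdout split.

    Illustration only: a hypothetical helper, not flaml's code.
    """
    n_val = max(1, int(round(n_rows * split_ratio)))
    indices = list(range(n_rows))
    if split_type == "uniform":
        # Shuffle so validation rows are drawn uniformly at random.
        random.Random(seed).shuffle(indices)
    elif split_type != "time":
        raise ValueError(f"unsupported split_type: {split_type!r}")
    # For "time", rows stay in input order, so the last rows form the validation set.
    return indices[:-n_val], indices[-n_val:]


train_idx, val_idx = holdout_indices(10, split_ratio=0.2, split_type="time")
print(train_idx, val_idx)
```

With split_type="time" the validation rows are always the trailing ones, which is why the data must already be sorted by timestamp. In flaml itself the choice is made by passing split_type to the fit call, as described above.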
Hi,
The documentation says one can set "eval_method": # validation method can be chosen from ['auto', 'holdout', 'cv']
I think it is not clear what "auto" means.
In addition, if one chooses "holdout" or "cv", what are the default values for "split_ratio" and "n_splits"?
It is also not clear how the cross-validation is done: is the data split randomly or based on the input data order?