microsoft / FLAML

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
https://microsoft.github.io/FLAML/

default values for cv and holdout #302

Closed. hadisotudeh closed 2 years ago

hadisotudeh commented 2 years ago

Hi,

The documentation says one can set "eval_method", with the comment: validation method can be chosen from ['auto', 'holdout', 'cv'].

I think it is not clear what "auto" means.

In addition, if one chooses "holdout" or "cv", what are the default values for "split_ratio" and "n_splits"?

It is also not clear how the cross-validation is done: is the data split randomly or based on the input data order?

sonichi commented 2 years ago

By default, flaml decides the resampling automatically according to the data size and the time budget. If you would like to enforce a certain resampling strategy, you can set eval_method to be "holdout" or "cv" for holdout or cross-validation.

For holdout, you can also set:

For cross-validation, you can also set n_splits, the number of folds. By default it is 5.
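As a rough illustration of the options discussed so far, here is a minimal sketch of how they might be passed to AutoML.fit. It assumes flaml is installed; the task, the time_budget and the split_ratio value of 0.2 are placeholders chosen for the example, not defaults stated in this thread, and run_automl is a hypothetical helper name.

```python
# Keyword arguments for AutoML.fit, grouped by resampling strategy.
# "task", "time_budget" and the split_ratio value are illustrative
# placeholders, not defaults confirmed in this thread.
holdout_settings = {
    "task": "classification",
    "time_budget": 10,        # seconds
    "eval_method": "holdout",
    "split_ratio": 0.2,       # fraction of data held out for validation
}

cv_settings = {
    "task": "classification",
    "time_budget": 10,        # seconds
    "eval_method": "cv",
    "n_splits": 5,            # number of folds (the default per the reply above)
}


def run_automl(X_train, y_train, settings):
    """Hypothetical helper: fit a FLAML AutoML instance with the given settings."""
    from flaml import AutoML  # imported here so the dicts above work without flaml

    automl = AutoML()
    automl.fit(X_train=X_train, y_train=y_train, **settings)
    return automl
```

Leaving eval_method unset (or setting it to "auto") lets flaml pick the strategy itself, as described above.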

Data split method

By default, flaml uses the following method to split the data:

The data split method for classification can be changed to a uniform split by setting split_type="uniform". For both classification and regression, a time-based split can be enforced by setting split_type="time", provided the data are sorted by timestamp.
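To make the difference between a uniform and a time-based holdout split concrete, here is a small standard-library sketch. It is not flaml's implementation: holdout_indices is a hypothetical function, and the assumption that a time-based split takes the last rows as validation data is ours, made only for illustration.

```python
import random


def holdout_indices(n_rows, split_ratio, split_type="uniform", seed=0):
    """Return (train_indices, validation_indices) for a holdout split.

    Illustrative only: "uniform" shuffles the rows before splitting, while
    "time" keeps the input order and validates on the last rows, which
    assumes the data are already sorted by timestamp.
    """
    n_val = max(1, int(round(n_rows * split_ratio)))
    indices = list(range(n_rows))
    if split_type == "uniform":
        random.Random(seed).shuffle(indices)
    elif split_type != "time":
        raise ValueError(f"unsupported split_type: {split_type!r}")
    return indices[:-n_val], indices[-n_val:]


# Time-based split: order is preserved, so the last 20% of rows are held out.
train_idx, val_idx = holdout_indices(10, 0.2, split_type="time")
print(train_idx)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(val_idx)    # [8, 9]
```

With split_type="uniform" the same call returns a shuffled partition, so the result no longer depends on the input data order.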