ray-project / tune-sklearn

A drop-in replacement for Scikit-Learn's GridSearchCV / RandomizedSearchCV -- but with cutting-edge hyperparameter tuning techniques.
https://docs.ray.io/en/master/tune/api_docs/sklearn.html
Apache License 2.0

[Feature Request] Add default model parameters to tuning trials #73

Closed. rohan-gt closed this issue 3 years ago.

rohan-gt commented 4 years ago

While testing on multiple datasets I've observed that if n_iter is low, the tuned model performs worse on both the train and test sets than the model using default parameters. Is there a way to ensure that a model's default parameters are tried as the first trial, so optimization starts from there? If there's no improvement, there's no need to run all the trials.

richardliaw commented 4 years ago

Do you see the same behavior on RandomizedSearchCV?

rohan-gt commented 4 years ago

No. Only on Bayesian and friends

richardliaw commented 4 years ago

@rohan-gt we can prioritize this in the upcoming week. thanks!

rohan-gt commented 3 years ago

One possible solution: if the user sets 10 trials, the first one could use the algorithm's default parameters and the remaining 9 could be sampled randomly, progressing from there.
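Roughly something like this (purely illustrative; `evaluate` and `sample` stand in for whatever scoring and sampling the search backend actually provides, they are not tune-sklearn APIs):

```python
from sklearn.base import clone

def search_with_defaults_first(estimator, param_distributions, evaluate, sample, n_trials=10):
    """Evaluate the estimator's default parameters first, then random samples.

    `evaluate` scores a parameter dict (e.g. via cross-validation) and
    `sample` draws a random configuration from `param_distributions`;
    both are placeholders for whatever the search backend provides.
    """
    # Trial 1: the untouched defaults of the incoming estimator.
    best_params = clone(estimator).get_params()
    best_score = evaluate(best_params)

    # Trials 2..n: random configurations, keeping the best seen so far.
    for _ in range(n_trials - 1):
        params = sample(param_distributions)
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```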

Yard1 commented 3 years ago

Or the defaults could just be run first, always - that way we guarantee the tuned model will not be worse than the one that went in.

richardliaw commented 3 years ago

What you want to do here is to implement something like Scikit-Optimize, where you can specify x0 and y0 upfront:

https://scikit-optimize.github.io/stable/modules/generated/skopt.optimizer.gbrt_minimize.html#skopt.optimizer.gbrt_minimize
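For reference, a minimal skopt sketch of that idea (the objective and search space here are made up for illustration): passing x0 makes the default configuration the first evaluated point, and y0 could additionally supply an already-known score so the point is not re-evaluated.

```python
from skopt import gbrt_minimize
from skopt.space import Integer, Real

# Toy objective standing in for a cross-validation loss over (max_depth, learning_rate).
def objective(params):
    max_depth, learning_rate = params
    return (max_depth - 6) ** 2 + (learning_rate - 0.1) ** 2

space = [Integer(1, 15, name="max_depth"),
         Real(0.01, 0.3, name="learning_rate")]

# x0 seeds the search with the "default" configuration; since y0 is omitted,
# skopt evaluates that point first and conditions the surrogate model on it.
result = gbrt_minimize(objective, space, x0=[[6, 0.1]], n_calls=20, random_state=0)
print(result.x, result.fun)
```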

Yard1 commented 3 years ago

This probably needs to be handled outside the main optimization loop, as it would be hard to just drop the current model's params into the distributions (though it wouldn't be an issue with random/grid search).
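For example, outside tune-sklearn one can already seed a Ray Tune searcher with the estimator's defaults via `points_to_evaluate`. A rough sketch, assuming scikit-optimize is installed and using a toy dataset and trainable for illustration (this is plain Ray Tune, not tune-sklearn's API, and only works when the defaults actually fall inside the search space):

```python
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from ray import tune
from ray.tune.suggest.skopt import SkOptSearch

X, y = load_breast_cancer(return_X_y=True)
base = RandomForestClassifier(random_state=0)

param_space = {
    "n_estimators": tune.randint(50, 500),
    "min_samples_split": tune.randint(2, 10),
}

# Pull the estimator's defaults for exactly the keys being tuned, so the
# very first trial reproduces the untuned model.
defaults = {k: v for k, v in clone(base).get_params().items() if k in param_space}

def trainable(config):
    model = clone(base).set_params(**config)
    tune.report(mean_accuracy=cross_val_score(model, X, y, cv=3).mean())

analysis = tune.run(
    trainable,
    config=param_space,
    num_samples=10,
    search_alg=SkOptSearch(metric="mean_accuracy", mode="max",
                           points_to_evaluate=[defaults]),
)
print(analysis.get_best_config(metric="mean_accuracy", mode="max"))
```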