ray-project / tune-sklearn

A drop-in replacement for Scikit-Learn's GridSearchCV / RandomizedSearchCV -- but with cutting-edge hyperparameter tuning techniques.
https://docs.ray.io/en/master/tune/api_docs/sklearn.html
Apache License 2.0

The hyperopt search algorithm fails with a custom_grid. #213

Closed sungreong closed 3 years ago

sungreong commented 3 years ago

pycaret version : 2.3.1

I want to use a variety of tune methods when using hyperopt in pycaret, but only tune.choice works; the others do not. I hope you can check and improve this!

from pycaret.datasets import get_data
boston = get_data('boston')
from pycaret.regression import *
exp_name = setup(data = boston,  target = 'medv',silent=True,n_jobs =20)
catboost_model = create_model('catboost')
from ray import tune
import ray
catboost_param_dists = {
    'iterations': tune.choice([500,100,300]),
    # 'reg_lambda': tune.uniform(1, 100),
    # 'bagging_temperature': tune.uniform(0, 100),
    'colsample_bylevel': tune.uniform(0.5, 1.0),
    'random_strength': tune.choice([0,0.1,0.2,1,10]), # tune.uniform(0, 100),
    # 'learning_rate': tune.uniform(1e-3, 1e-1),
    'max_depth' : tune.choice([5,6,7,8,9])
}
tuned_top1 = tune_model(catboost_model,
                        optimize="R2",
                        search_library="tune-sklearn",
                        search_algorithm="hyperopt",
                        choose_better = True ,
                        custom_grid = catboost_param_dists ,
                        early_stopping = "asha",
                        early_stopping_max_iters = 10,
                        return_tuner = False , 
                        n_iter=100)

So I checked your code:

class CategoricalDistribution(Distribution):
    """
    Categorical distribution.

    Parameters
    ----------
    values: list or other iterable
        Possible values.

    Warnings
    --------
    - `None` is not supported  as a value for ConfigSpace.
    """

    def __init__(self, values):
        self.values = list(values)

    def get_skopt(self):
        import skopt.space

        return skopt.space.Categorical(
            [x if isinstance(x, Hashable) else None for x in self.values],
            transform="identity",
        )

    def get_optuna(self):
        import optuna

        return optuna.distributions.CategoricalDistribution(self.values)

    def get_hyperopt(self, label):
        from hyperopt import hp

        return hp.choice(label, self.values)

    def get_CS(self, label):
        import ConfigSpace.hyperparameters as CSH

        return CSH.CategoricalHyperparameter(
            name=label, choices=[x for x in self.values if isinstance(x, Hashable)]
        )

    def get_tune(self):
        from ray import tune

        return tune.choice(self.values)

    def __repr__(self):
        return f"CategoricalDistribution(values={self.values})"

In its current state, pycaret only supports "tune.choice".

And it seems a parameter only works if a fixed set of values is given.

## CategoricalDistribution
def __init__(self, values):
        self.values = list(values)

So, whichever search algorithm I use, this only works by passing a fixed set of values.
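This can be reproduced without pycaret at all: `CategoricalDistribution.__init__` calls `list(values)`, but a `ray.tune` sampler such as `tune.uniform(0.5, 1.0)` is a `Float` object, not an iterable. A minimal sketch using a stand-in class (so `ray` is not required to run it):

```python
# Hypothetical stand-in for ray.tune's Float sampler: it stores bounds
# but is not iterable, just like the real tune.uniform(...) return value.
class Float:
    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper

dist = Float(0.5, 1.0)   # conceptually, tune.uniform(0.5, 1.0)

try:
    values = list(dist)  # what CategoricalDistribution.__init__ does
except TypeError as e:
    print(e)             # 'Float' object is not iterable
```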

Error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-4e6cccbea2cd> in <module>
     11 }
     12 ray.init(num_cpus=20, num_gpus=2)
---> 13 tuned_top1 = tune_model(catboost_model,
     14                         optimize="R2",
     15                         search_library="tune-sklearn",

/opt/conda/envs/sidep/lib/python3.8/site-packages/pycaret/regression.py in tune_model(estimator, fold, round, n_iter, custom_grid, optimize, custom_scorer, search_library, search_algorithm, early_stopping, early_stopping_max_iters, choose_better, fit_kwargs, groups, return_tuner, verbose, tuner_verbose, **kwargs)
   1082     """
   1083 
-> 1084     return pycaret.internal.tabular.tune_model_supervised(
   1085         estimator=estimator,
   1086         fold=fold,

/opt/conda/envs/sidep/lib/python3.8/site-packages/pycaret/internal/tabular.py in tune_model_supervised(estimator, fold, round, n_iter, custom_grid, optimize, custom_scorer, search_library, search_algorithm, early_stopping, early_stopping_max_iters, choose_better, fit_kwargs, groups, return_tuner, verbose, tuner_verbose, display, **kwargs)
   4141             )
   4142         ):
-> 4143             param_grid = {
   4144                 k: CategoricalDistribution(v) if not isinstance(v, Distribution) else v
   4145                 for k, v in param_grid.items()

/opt/conda/envs/sidep/lib/python3.8/site-packages/pycaret/internal/tabular.py in <dictcomp>(.0)
   4142         ):
   4143             param_grid = {
-> 4144                 k: CategoricalDistribution(v) if not isinstance(v, Distribution) else v
   4145                 for k, v in param_grid.items()
   4146             }

/opt/conda/envs/sidep/lib/python3.8/site-packages/pycaret/internal/distributions.py in __init__(self, values)
    272 
    273     def __init__(self, values):
--> 274         self.values = list(values)
    275 
    276     def get_skopt(self):

TypeError: 'Float' object is not iterable

Do you have a development plan to address this problem?

Thanks :)

Yard1 commented 3 years ago

Hey @sungreong, this is a PyCaret issue, not a tune-sklearn one. When using PyCaret, you should use PyCaret distributions (from pycaret.distributions import *) instead of tune ones.
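For context: as the traceback shows, PyCaret wraps any `custom_grid` value that is not already a `Distribution` in a `CategoricalDistribution`, while `Distribution` instances pass through untouched. That is why tune samplers break but PyCaret distributions work. A rough sketch of that wrapping logic, with simplified stand-in classes mirroring the names in `pycaret.internal.distributions`:

```python
# Simplified stand-ins for the PyCaret distribution classes; the dict
# comprehension below paraphrases the one visible in the traceback.
class Distribution:
    pass

class CategoricalDistribution(Distribution):
    def __init__(self, values):
        self.values = list(values)  # fails for non-iterable tune samplers

class UniformDistribution(Distribution):
    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper

param_grid = {
    "max_depth": [5, 6, 7],                              # plain list -> wrapped
    "colsample_bylevel": UniformDistribution(0.5, 1.0),  # passes through as-is
}
param_grid = {
    k: CategoricalDistribution(v) if not isinstance(v, Distribution) else v
    for k, v in param_grid.items()
}
print(type(param_grid["max_depth"]).__name__)          # CategoricalDistribution
print(type(param_grid["colsample_bylevel"]).__name__)  # UniformDistribution
```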

sungreong commented 3 years ago

Thank you for your reply. I know this is not a tune-sklearn issue but a PyCaret issue. However, I followed the PyCaret docs, which say the following:

custom_grid: dictionary, default = None To define custom search space for hyperparameters, pass a dictionary with parameter name and values to be iterated. Custom grids must be in a format supported by the defined search_library.

https://pycaret.readthedocs.io/en/latest/api/regression.html

So could you give me some sample code?

Yard1 commented 3 years ago

You will want to do something like this:

from pycaret.datasets import get_data
boston = get_data('boston')
from pycaret.regression import *
from pycaret.distributions import UniformDistribution, CategoricalDistribution

exp_name = setup(data = boston,  target = 'medv',silent=True,n_jobs =20)
catboost_model = create_model('catboost')
catboost_param_dists = {
    'iterations': CategoricalDistribution([500,100,300]),
    # 'reg_lambda': UniformDistribution(1, 100),
    # 'bagging_temperature': UniformDistribution(0, 100),
    'colsample_bylevel': UniformDistribution(0.5, 1.0),
    'random_strength': CategoricalDistribution([0,0.1,0.2,1,10]), # UniformDistribution(0, 100),
    # 'learning_rate': UniformDistribution(1e-3, 1e-1),
    'max_depth' : CategoricalDistribution([5,6,7,8,9])
}
tuned_top1 = tune_model(catboost_model,
                        optimize="R2",
                        search_library="tune-sklearn",
                        search_algorithm="hyperopt",
                        choose_better = True ,
                        custom_grid = catboost_param_dists ,
                        early_stopping = "asha",
                        early_stopping_max_iters = 10,
                        return_tuner = False , 
                        n_iter=100)

sungreong commented 3 years ago

Thanks, it works!