sherpa-ai / sherpa

Hyperparameter optimization that enables researchers to experiment, visualize, and scale quickly.
http://parameter-sherpa.readthedocs.io/
GNU General Public License v3.0

Error when using bayesian optimization #98

Closed: joeforan76 closed this issue 4 years ago

joeforan76 commented 4 years ago

When using the Bayesian optimization algorithm, I get the following error:

ValueError: `f0` passed has more than 1 dimension.

An extract from the output is:

Creating new model for trial 6...

INFO:GP:initializing Y
INFO:GP:initializing inference method
INFO:GP:adding kernel and likelihood as parameters

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-bd36a65e8a36> in <module>
----> 1 run_study()

<ipython-input-8-768b946e96b2> in run_study()
     15                         dashboard_port=8877)
     16     steps = 3
---> 17     for trial in study:
     18         print("-"*100)
     19         print(f"Trial:\t{trial.id}\nSteps:\t{steps}\nParameters:{trial.parameters}\n")

~/host/src/sherpa/sherpa/core.py in __next__(self)
    377         Allows to write `for trial in study:`.
    378         """
--> 379         t = self.get_suggestion()
    380         if isinstance(t, Trial):
    381             return t

~/host/src/sherpa/sherpa/core.py in get_suggestion(self)
    214 
    215         p = self.algorithm.get_suggestion(self.parameters, self.results,
--> 216                                           self.lower_is_better)
    217         if isinstance(p, dict):
    218             self.num_trials += 1

~/host/src/sherpa/sherpa/algorithms/bayesian_optimization.py in get_suggestion(self, parameters, results, lower_is_better)
    109 
    110             domain = self._initialize_domain(parameters)
--> 111             batch = self._generate_bayesopt_batch(X, y, lower_is_better, domain)
    112 
    113             batch_list_of_dicts = self._reverse_to_sherpa_format(batch,

~/host/src/sherpa/sherpa/algorithms/bayesian_optimization.py in _generate_bayesopt_batch(self, X, y, lower_is_better, domain)
    142                                                               exact_feval=False,
    143                                                               model_type=self.model_type)
--> 144         return bo_step.suggest_next_locations()
    145 
    146     def get_best_pred(self, parameters, results, lower_is_better):

/usr/local/python/lib/python3.6/site-packages/GPyOpt/core/bo.py in suggest_next_locations(self, context, pending_X, ignored_X)
     67         self._update_model(self.normalization_type)
     68 
---> 69         suggested_locations = self._compute_next_evaluations(pending_zipped_X = pending_X, ignored_zipped_X = ignored_X)
     70 
     71         return suggested_locations

/usr/local/python/lib/python3.6/site-packages/GPyOpt/core/bo.py in _compute_next_evaluations(self, pending_zipped_X, ignored_zipped_X)
    234 
    235         ### We zip the value in case there are categorical variables
--> 236         return self.space.zip_inputs(self.evaluator.compute_batch(duplicate_manager=duplicate_manager, context_manager= self.acquisition.optimizer.context_manager))
    237 
    238     def _update_model(self, normalization_type='stats'):

/usr/local/python/lib/python3.6/site-packages/GPyOpt/core/evaluators/batch_local_penalization.py in compute_batch(self, duplicate_manager, context_manager)
     35         if self.batch_size >1:
     36             # ---------- Approximate the constants of the the method
---> 37             L = estimate_L(self.acquisition.model.model,self.acquisition.space.get_bounds())
     38             Min = self.acquisition.model.model.Y.min()
     39 

/usr/local/python/lib/python3.6/site-packages/GPyOpt/core/evaluators/batch_local_penalization.py in estimate_L(model, bounds, storehistory)
     64     pred_samples = df(samples,model,0)
     65     x0 = samples[np.argmin(pred_samples)]
---> 66     res = scipy.optimize.minimize(df,x0, method='L-BFGS-B',bounds=bounds, args = (model,x0), options = {'maxiter': 200})
     67     minusL = res.fun[0][0]
     68     L = -minusL

/usr/local/python/lib/python3.6/site-packages/scipy/optimize/_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    616     elif meth == 'l-bfgs-b':
    617         return _minimize_lbfgsb(fun, x0, args, jac, bounds,
--> 618                                 callback=callback, **options)
    619     elif meth == 'tnc':
    620         return _minimize_tnc(fun, x0, args, jac, bounds, callback=callback,

/usr/local/python/lib/python3.6/site-packages/scipy/optimize/lbfgsb.py in _minimize_lbfgsb(fun, x0, args, jac, bounds, disp, maxcor, ftol, gtol, eps, maxfun, maxiter, iprint, callback, maxls, finite_diff_rel_step, **unknown_options)
    306     sf = _prepare_scalar_function(fun, x0, jac=jac, args=args, epsilon=eps,
    307                                   bounds=new_bounds,
--> 308                                   finite_diff_rel_step=finite_diff_rel_step)
    309 
    310     func_and_grad = sf.fun_and_grad

/usr/local/python/lib/python3.6/site-packages/scipy/optimize/optimize.py in _prepare_scalar_function(fun, x0, jac, args, bounds, epsilon, finite_diff_rel_step, hess)
    260     # calculation reduces overall function evaluations.
    261     sf = ScalarFunction(fun, x0, args, grad, hess,
--> 262                         finite_diff_rel_step, bounds, epsilon=epsilon)
    263 
    264     return sf

/usr/local/python/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in __init__(self, fun, x0, args, grad, hess, finite_diff_rel_step, finite_diff_bounds, epsilon)
     93 
     94         self._update_grad_impl = update_grad
---> 95         self._update_grad()
     96 
     97         # Hessian Evaluation

/usr/local/python/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in _update_grad(self)
    169     def _update_grad(self):
    170         if not self.g_updated:
--> 171             self._update_grad_impl()
    172             self.g_updated = True
    173 

/usr/local/python/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in update_grad()
     90                 self.ngev += 1
     91                 self.g = approx_derivative(fun_wrapped, self.x, f0=self.f,
---> 92                                            **finite_diff_options)
     93 
     94         self._update_grad_impl = update_grad

/usr/local/python/lib/python3.6/site-packages/scipy/optimize/_numdiff.py in approx_derivative(fun, x0, method, rel_step, abs_step, f0, bounds, sparsity, as_linear_operator, args, kwargs)
    386         f0 = np.atleast_1d(f0)
    387         if f0.ndim > 1:
--> 388             raise ValueError("`f0` passed has more than 1 dimension.")
    389 
    390     if np.any((x0 < lb) | (x0 > ub)):

ValueError: `f0` passed has more than 1 dimension.

The relevant portion of my code is as follows:

def run_study():
    parameters = [
        sherpa.Continuous('learning_rate', [1e-6, 1e-1], 'log'),
        sherpa.Choice('activation', ['relu', 'tanh', 'logistic']),
        sherpa.Continuous('alpha', [1e-7, 0.1], 'log'),
        sherpa.Discrete('no_layers', [1, 6]),
        sherpa.Continuous('fan_out', [2.0, 4.0])
    ]

    algorithm = sherpa.algorithms.GPyOpt(max_num_trials=150)
    study = sherpa.Study(parameters=parameters,
                         algorithm=algorithm,
                         lower_is_better=False,
                         dashboard_port=8877)
    steps = 3
    for trial in study:
        print("-"*100)
        print(f"Trial:\t{trial.id}\nSteps:\t{steps}\nParameters:{trial.parameters}\n")
        print(f"Creating new model for trial {trial.id}...\n")

        # Get hyperparameters
        lr = trial.parameters['learning_rate']
        act = trial.parameters['activation']
        alpha = trial.parameters['alpha']
        nl = trial.parameters['no_layers']
        f_out = trial.parameters['fan_out']

        hidden_layers = get_hidden_layers(nl, f_out)
        res = run_trial(data, steps, lr, 1000, alpha, act, hidden_layers)
        study.add_observation(trial=trial, iteration=1, objective=res)
        study.finalize(trial=trial)

Sherpa commit: 4ad600c71f76a06993464e5078188f552111b851
Python version: 3.6.3
scipy version: 1.5.0
GPyOpt version: 1.2.6

Is this a bug or am I doing something wrong?

TheVidAllMayThe commented 4 years ago

I'm having the same issue running Bayesian optimization on an xgboost regression. I've used parameter-sherpa for this kind of problem in the past and it worked fine.

TheVidAllMayThe commented 4 years ago

It seems to work fine with scipy 1.4.1, so I'm guessing there's a new compatibility issue with scipy 1.5.0+, since they made a few changes to the "optimize" module.
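
To illustrate my guess with a minimal sketch (not from the original report): GPyOpt's local penalization helper hands scipy an objective that returns a (1, 1) array rather than a scalar, and scipy 1.5.0's new finite-difference path rejects that shape. Something along these lines should fail the same way on scipy 1.5.0 but run on 1.4.1:

import numpy as np
import scipy.optimize

def objective(x):
    # Mimics GPyOpt's estimate_L/df helper, which returns a (1, 1) array
    # instead of a plain scalar.
    return np.atleast_2d(np.sum(x ** 2))

x0 = np.array([1.0, -1.0])

# On scipy 1.4.1 this completes; on scipy 1.5.0 the gradient is approximated
# via approx_derivative, which checks f0.ndim and raises
# "ValueError: `f0` passed has more than 1 dimension."
res = scipy.optimize.minimize(objective, x0, method='L-BFGS-B',
                              options={'maxiter': 200})
print(res)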

belerico commented 4 years ago

Yeah, same problem here. I've downgraded to scipy 1.4.1.
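
For anyone else in the same situation: assuming a pip-managed environment, the downgrade is just something like pip install "scipy==1.4.1".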

sherpa-ai commented 4 years ago

Hey @joeforan76, @TheVidAllMayThe, and @belerico, thanks for raising this. The build is currently failing because of it as well. It looks like GPyOpt hasn't been updated to handle the new scipy API, so for now I will pin the scipy version to <=1.4.1 in the sherpa requirements. I've created an issue with GPyOpt to make them aware of this.

sherpa-ai commented 4 years ago

Set 'scipy>=1.0.0,<=1.4.1'. The build is passing again: https://github.com/sherpa-ai/sherpa/pull/99. Closing this for now; let me know if related issues come up again.