scikit-optimize / scikit-optimize

Sequential model-based optimization with a `scipy.optimize` interface
https://scikit-optimize.github.io
BSD 3-Clause "New" or "Revised" License

TypeError: %d format: a number is required #615

Closed pavelkomarov closed 6 years ago

pavelkomarov commented 6 years ago

Here is my example:

    from skopt import Optimizer
    from skopt.utils import dimensions_aslist
    from skopt.space import Integer, Categorical, Real

    NN = {
        'activation': Categorical(['identity', 'logistic', 'tanh', 'relu']),
        'solver': Categorical(['adam', 'sgd', 'lbfgs']),
        'learning_rate': Categorical(['constant', 'invscaling', 'adaptive']),
        'hidden_layer_sizes': Categorical([(100,100)])
    }

    listified_space = dimensions_aslist(NN)
    acq_optimizer_kwargs = {'n_points': 20, 'n_restarts_optimizer': 5, 'n_jobs': 3}
    acq_func_kwargs = {'xi': 0.01, 'kappa': 1.96}

    optimizer = Optimizer(listified_space, base_estimator='gp', n_initial_points=10,
        acq_func='EI', acq_optimizer='auto', random_state=None,
        acq_optimizer_kwargs=acq_optimizer_kwargs, acq_func_kwargs=acq_func_kwargs)

    rand_xs = []
    for n in range(10):
        rand_xs.append(optimizer.ask())

    rand_ys = [1,2,3,4,5,6,7,8,9,10]

    print(rand_xs)
    print(rand_ys)

    optimizer.tell(rand_xs, rand_ys)

Running with `acq_optimizer='lbfgs'` I was seeing `ValueError: The regressor <class 'skopt.learning.gaussian_process.gpr.GaussianProcessRegressor'> should run with acq_optimizer='sampling'`. By tracing the Optimizer's `_check_arguments()` code I was able to figure out that in this Categoricals-only case my base_estimator simply has no gradients, so the lbfgs acquisition optimizer is rejected. Changing to `acq_optimizer='auto'` gets past that problem.
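(A stripped-down sketch of that failing construction, keeping only the arguments that matter here; `listified_space` is the purely-Categorical space from above.)

    # the GP cooked for a Categoricals-only space has no gradients, so this
    # construction is what raised the ValueError quoted above
    Optimizer(listified_space, base_estimator='gp', acq_optimizer='lbfgs')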

But now I see a new `TypeError: %d format: a number is required` thrown deep inside `skopt/learning/gaussian_process/kernels.py`. If I print the transformed space before it gets passed to the Gaussian process, I see that it isn't really transformed at all: it still has strings in it!
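(A small inspection sketch, reusing the `optimizer` and `rand_xs` from the example above.)

    # Space.transform is what the Optimizer applies to points before fitting
    # the regressor; for this purely-Categorical space the result still
    # contains strings, which is what trips the %d formatting in the kernel
    print(optimizer.space.transform(rand_xs))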

Adding a numerical dimension like `'alpha': Real(0.0001, 0.001, prior='log-uniform')` causes the construction and the `.tell()` to succeed, because the transformed space is then purely numerical.
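(Sketch of that workaround against the same dict; the bounds are just the illustrative ones above.)

    # add one numerical dimension alongside the Categoricals
    NN['alpha'] = Real(0.0001, 0.001, prior='log-uniform')
    listified_space = dimensions_aslist(NN)
    # the Optimizer construction and .tell() above now go through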

So the way purely-categorical spaces are transformed should be updated.

Or there is another possibility: does it even make sense to try to do Bayesian optimization on a purely categorical space like this? Say I try setting (A,B), setting (A,C), setting (X,Z), and setting (Y,Z). For the sake of argument say (A,B) does better than (A,C) and (X,Z) does better than (Y,Z). Can we then suppose (X,B) will do better than (Y,C)? Who is to say (Y,C) isn't secretly a super-combination, or that the gains we seem to see from varying the first parameter to X or the second to B aren't unrelated to those individual choices? It seems potentially dangerous to reason this way, so perhaps the intent is that no Bayesian optimization should be possible in purely Categorical spaces. If this is the case, an error should be thrown early to say so. Furthermore, if this is correct, then how are point-values in Categorical dimensions decided during optimization? Are the numerical parameters optimized while Categoricals are selected at random?

It seems the previous paragraph's reasoning should be wrong: if I try many examples with some parameter set to some value and observe a pattern of poor performance, I can update my beliefs to say "this is a bad setting". It shouldn't matter whether a parameter is Categorical or not; Bayesian optimization should be equally powerful in all cases.

betatim commented 6 years ago

I can reproduce this with

In [1]: from skopt.space import Categorical

In [2]: dims = [Categorical(['a', 'b', 'c']), Categorical(['A', 'B', 'C'])]

In [3]: from skopt import Optimizer

In [4]: optimizer = Optimizer(dims, n_initial_points=1, random_state=3)
GaussianProcessRegressor(alpha=1e-10, copy_X_train=True,
             kernel=1**2 * HammingKernel(0, 0, 0, 0, 0, 0),
             n_restarts_optimizer=2, noise='gaussian', normalize_y=True,
             optimizer='fmin_l_bfgs_b', random_state=218175338)

In [5]: optimizer.ask()
Out[5]: ['a', 'C']

In [6]: optimizer.tell(_5, 1.)

With purely categorical spaces and the GP regressor we pick a special kernel (the HammingKernel in the output above). So there is a bug in this special-case handling, which explains why the problem goes away as soon as you add an extra Real dimension.
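If it helps narrow things down, that kernel choice can be seen directly (a sketch assuming `skopt.utils.cook_estimator` is the helper involved; the exact kernel repr may differ across versions):

    from skopt.utils import cook_estimator
    from skopt.space import Categorical

    dims = [Categorical(['a', 'b', 'c']), Categorical(['A', 'B', 'C'])]
    gp = cook_estimator('GP', space=dims)
    print(gp.kernel)  # 1**2 * HammingKernel(...) for an all-Categorical space,
                      # instead of the Matern kernel used for numeric spaces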