Closed pavelkomarov closed 6 years ago
I can reproduce this with:

```python
In [1]: from skopt.space import Categorical

In [2]: dims = [Categorical(['a', 'b', 'c']), Categorical(['A', 'B', 'C'])]

In [3]: from skopt import Optimizer

In [4]: optimizer = Optimizer(dims, n_initial_points=1, random_state=3)
GaussianProcessRegressor(alpha=1e-10, copy_X_train=True,
                         kernel=1**2 * HammingKernel(0, 0, 0, 0, 0, 0),
                         n_restarts_optimizer=2, noise='gaussian', normalize_y=True,
                         optimizer='fmin_l_bfgs_b', random_state=218175338)

In [5]: optimizer.ask()
Out[5]: ['a', 'C']

In [6]: optimizer.tell(_5, 1.)
```
With purely categorical spaces and the GP regressor we pick a special kernel, so there must be a bug in that special-case handling, which explains why the problem disappears as soon as you add an extra Real dimension.
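For intuition, that special kernel measures similarity between categorical points by how many coordinates match, rather than by Euclidean distance. Below is a minimal sketch of an exponentiated-Hamming similarity; it is deliberately simplified (one shared length scale, no gradients) and is not skopt's actual `HammingKernel` implementation:

```python
import numpy as np

def hamming_kernel(x, y, length_scale=1.0):
    """Simplified sketch: similarity decays exponentially with the
    fraction of mismatched categorical coordinates. skopt's real
    HammingKernel has per-dimension length scales and gradients."""
    x = np.asarray(x, dtype=object)
    y = np.asarray(y, dtype=object)
    mismatch = np.mean(x != y)  # fraction of differing coordinates
    return float(np.exp(-mismatch / length_scale))

print(hamming_kernel(['a', 'A'], ['a', 'A']))  # identical -> 1.0
print(hamming_kernel(['a', 'A'], ['a', 'B']))  # half match -> exp(-0.5)
print(hamming_kernel(['a', 'A'], ['c', 'C']))  # no match   -> exp(-1.0)
```

The point is only that such a kernel operates on the raw category labels, so it needs the special-case wiring that appears to be broken here.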
Here is my example:
Running with `acq_optimizer='lbfgs'` I was seeing

```
ValueError: The regressor <class 'skopt.learning.gaussian_process.gpr.GaussianProcessRegressor'> should run with acq_optimizer='sampling'.
```

But by tracing the Optimizer's `_check_arguments()` code I was able to figure out that my `base_estimator` must simply not have gradients in this Categoricals-only case. Changing to `acq_optimizer='auto'` solves that problem. But now I see a new

```
TypeError: %d format: a number is required
```

thrown deep inside `skopt/learning/gaussian_process/kernels.py`. If I print the transformed space before it gets passed to the Gaussian process, I see that it isn't really transformed at all: it still has strings in it! Adding a numerical dimension like

```python
alpha: Real(0.0001, 0.001, prior='log-uniform')
```

causes the construction and the `.tell()` to succeed, because the transformed space is then purely numerical. So the way purely-categorical spaces are transformed should be updated.
Or there is another possibility: does it even make sense to try to do Bayesian optimization on a purely categorical space like this? Say I try setting (A,B), setting (A,C), setting (X,Z), and setting (Y,Z). For the sake of argument, say (A,B) does better than (A,C) and (X,Z) does better than (Y,Z). Can we then suppose (X,B) will do better than (Y,C)? Who is to say (Y,C) isn't a super-combination, or that the gains we seem to see from varying the first parameter to X or the second to B are unrelated? It seems potentially dangerous to reason this way, so perhaps the intent is that no Bayesian optimization should be possible in purely Categorical spaces. If that is the case, an error should be thrown early to say so. Furthermore, if this is correct, how are point values in Categorical dimensions decided during optimization? Are the numerical parameters optimized while Categoricals are selected at random?
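To make that scenario concrete, here is a crude sketch with made-up loss numbers (lower is better). Averaging observed losses per categorical value is not what a GP kernel does; it is just the simplest version of the independence assumption being questioned:

```python
from collections import defaultdict

def mean_loss_per_value(observations):
    """Crude sketch: average observed loss for each
    (dimension index, categorical value) pair."""
    sums = defaultdict(lambda: [0.0, 0])
    for point, loss in observations:
        for dim, value in enumerate(point):
            sums[(dim, value)][0] += loss
            sums[(dim, value)][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}

# Made-up losses for the four settings in the paragraph above.
obs = [(('A', 'B'), 0.2), (('A', 'C'), 0.6),
       (('X', 'Z'), 0.3), (('Y', 'Z'), 0.7)]
beliefs = mean_loss_per_value(obs)
# Under the independence assumption, X beats Y and B beats C, so the
# never-tried (X,B) is predicted to beat (Y,C) -- which may or may not
# hold if the dimensions interact.
print(beliefs[(1, 'B')], beliefs[(1, 'C')])
```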
It seems the previous paragraph must be wrong: if I try many examples with some parameter set to some value and observe a pattern of poor performance, I can update my beliefs to say "this is a bad setting". It shouldn't matter whether a parameter is Categorical or not; Bayesian optimization should be equally powerful in all cases.