IIRC I originally wrote it to use the 68% CI upper bound as the acquisition function. I'm not sure why I chose that, but I assume it was related to the fact that it was easy to implement. I'm not sure about the development that's gone on more recently since others have been maintaining the code.
On Jun 5, 2017, at 6:13 PM, RobertArbon notifications@github.com wrote:
Just tried this on the current development release.
```yaml
strategy:
  name: gp
  params:
    seeds: 5
search_space:
  C:
    min: 0.1
    max: 10
    warp: log
    type: float
  gamma:
    min: 1e-5
    max: 1
    warp: log
    type: float
cv: 5
dataset_loader:
  name: sklearn_dataset
  params:
    method: load_digits
trials:
  uri: sqlite:///osprey-trials.db
random_seed: 42
```

After running for 20 iterations it:
- sometimes crashes (`numpy.linalg.linalg.LinAlgError: not positive definite, even with jitter.`)
- when it doesn't crash, the results do seem to be better than grid search, but they don't converge in the smooth way you'd expect from a Gaussian process optimization:

Grid Search:
```
Best Current Model = 0.969393 +- 0.017132
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
C     10.0
gamma 0.00017782794100389227
```

Gaussian Processes:
```
Best Current Model = 0.972732 +- 0.015372
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
C     7.969454818643936
gamma 0.0007459343285726547
```
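On the `LinAlgError: not positive definite, even with jitter` crash: a common workaround (sketched below with hypothetical names, not Osprey's actual code) is to retry the Cholesky factorisation with geometrically increasing jitter before giving up:

```python
import numpy as np

def safe_cholesky(K, max_tries=5, jitter=1e-8):
    """Attempt a Cholesky factorisation, adding increasing diagonal
    jitter until the matrix is numerically positive definite."""
    for i in range(max_tries):
        try:
            return np.linalg.cholesky(K + jitter * 10**i * np.eye(K.shape[0]))
        except np.linalg.LinAlgError:
            continue
    raise np.linalg.LinAlgError("not positive definite, even with jitter")

# A singular (but PSD) covariance matrix that plain cholesky rejects:
K = np.array([[1.0, 1.0], [1.0, 1.0]])
L = safe_cholesky(K)
```

This kind of failure usually means two sampled hyperparameter points are nearly identical, making the kernel matrix ill-conditioned; more aggressive jitter escalation papers over it at the cost of a slightly noisier model.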
I have some questions regarding your implementation:
1. The acquisition function seems odd - it's not the expected improvement a la Snoek. Why add the mean to the variance?
2. The bounds used to define the maximisation of the acquisition function don't reflect the bounds of the variables - they're all 0 - 1. See my other question.

To investigate these issues I looked at how this code implements the optimization. It lets you use either the SciPy `minimize()` function or a random sampling algorithm to maximize the acquisition function. After having a play I found that `minimize()` did not converge on a correct answer and didn't show increasing/converging score values, while the random sampling method gave a converging answer. Spearmint uses an MCMC algorithm as well.
I've implemented the expected improvement a la Snoek, with a Matern52 kernel and a random sampling maximisation algorithm in Osprey in a new branch. I get the following results.
```
Best Current Model = 0.974958 +- 0.013960
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
C     10.0
gamma 0.0004987205335704034
```
They seem marginally better but maybe it's just this case.
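For reference, the expected improvement a la Snoek et al., paired with a random-sampling maximiser over the unit cube, can be sketched like this (a minimal illustration, not the branch's actual code; the toy surrogate for `mu`/`sigma` is an assumption standing in for a fitted GP):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """Expected improvement for maximisation: EI = sigma * (z*CDF(z) + PDF(z)),
    where z = (mu - f_best) / sigma."""
    sigma = np.maximum(sigma, 1e-12)  # avoid division by zero
    z = (mu - f_best) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))

# Random-sampling maximisation: draw candidates in the unit cube and
# keep the one with the highest acquisition value.
rng = np.random.default_rng(42)
candidates = rng.uniform(size=(10000, 2))
# mu, sigma would come from the GP posterior; here a toy quadratic surrogate:
mu = -np.sum((candidates - 0.5) ** 2, axis=1)
sigma = 0.1 * np.ones(len(candidates))
best = candidates[np.argmax(expected_improvement(mu, sigma, f_best=-0.01))]
```

EI trades off the predicted gain over the incumbent against the posterior uncertainty, which is why it tends to give the steadier convergence described above.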
Thanks for your time.
Rob
> The acquisition function seems odd - it's not the expected improvement a la Snoek. Why add the mean to the variance?
I think the idea of adding the variance to the mean in the acquisition function was to try to search for the largest possible gain in performance and reduce uncertainty in hyperparameter space. We weren't necessarily concerned with smooth convergence, as sampling similar values in variable-space might lead to redundancy in integer-space. There also wasn't necessarily any literature we were following.
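As I read it, "adding the variance to the mean" amounts to an upper-confidence-bound (UCB) style rule: posterior mean plus a multiple of the posterior standard deviation. A minimal sketch (hypothetical code, not what Osprey ships):

```python
import numpy as np

def ucb(mu, sigma, kappa=1.0):
    """UCB-style acquisition: favours points with a high predicted score
    *or* high uncertainty.  With kappa=1 this matches the 68% CI upper
    bound mentioned at the top of the thread."""
    return mu + kappa * sigma

# Two candidates with equal predicted mean: the more uncertain one wins,
# which is the "reduce uncertainty in hyperparameter space" behaviour.
scores = ucb(np.array([0.9, 0.9]), np.array([0.01, 0.1]))
```

Unlike EI, this criterion never goes to zero at well-explored points, which is consistent with the non-smooth convergence observed above.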
> The bounds used to define the maximisation of the acquisition function don't reflect the bounds of the variables - they're all 0 - 1.
Not sure I follow here. The optimization problem solved by `scipy.optimize` should have bounds [0, 1] for all input variables.
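For concreteness, a sketch of how a log-warped variable like `C` or `gamma` might be mapped to the unit interval and back (my reading of the normalisation, not the actual `searchspace` code; function names are illustrative):

```python
import numpy as np

def to_unit(x, lo, hi, warp=None):
    """Map a variable to [0, 1]; log-warped variables are normalised
    in log-space first."""
    if warp == "log":
        x, lo, hi = np.log(x), np.log(lo), np.log(hi)
    return (x - lo) / (hi - lo)

def from_unit(u, lo, hi, warp=None):
    """Inverse map from the unit cube back to the variable's range."""
    if warp == "log":
        lo, hi = np.log(lo), np.log(hi)
        return np.exp(lo + u * (hi - lo))
    return lo + u * (hi - lo)

# e.g. C in [0.1, 10] with a log warp: C = 1.0 sits at the midpoint.
u = to_unit(1.0, 0.1, 10, warp="log")
```

So the optimizer always works in [0, 1] regardless of the variable's declared bounds or warp, which explains the observation in the question.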
> See my other question
It's been a while, but IIRC I was following a tutorial from the GPy website (I don't remember which). The idea behind the `Fixed` kernel was to incorporate the uncertainties in the score inferred from cross-validation into the GP model. It's probably unnecessary, and I'm open to changing this if it improves performance.
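For concreteness, here's a small numpy sketch of what I understand that to achieve: folding the per-point CV-score variances into the kernel diagonal, so noisier scores are trusted less. The kernel and function names are my own illustration, not Osprey's or GPy's API:

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel (a stand-in for the real kernel)."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def posterior_mean(X, y, y_var, Xstar, lengthscale=1.0):
    """GP posterior mean with the per-point CV-score variances added to
    the kernel diagonal (heteroscedastic noise)."""
    K = rbf(X, X, lengthscale) + np.diag(y_var)
    Ks = rbf(Xstar, X, lengthscale)
    return Ks @ np.linalg.solve(K, y)

# Observed mean scores and their CV variances at two hyperparameter points:
X = np.array([[0.0], [1.0]])
y = np.array([0.9, 0.7])
y_var = np.array([1e-6, 1e-6])
mu = posterior_mean(X, y, y_var, np.array([[0.0]]))
```

With near-zero variances the posterior interpolates the observed scores; larger CV variances let the GP smooth over noisy folds instead of chasing them.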
RE: Acquisition function - OK. I got confused because the website said Expected Improvement so I took that literally.
RE: Optimise - OK, I didn't look properly in the `searchspace` class for the normalisation; my apologies (it was late this side of the Atlantic).
RE: Kernels - OK. I'm comparing them now, if there's anything interesting to say I'll make a PR.
Thanks for your time. This code (along with MSMBuilder & OpenMM) is a delight to work with so keep up the good work.
Dev. notes: It would be awesome to add expected improvement and make the kernels more configurable.
> Thanks for your time. This code (along with MSMBuilder & OpenMM) is a delight to work with so keep up the good work.
Glad to hear it! Thank you for being an active contributor to our software!
I'm writing configurable Kernels and Expected Improvement at the minute. Will make a PR in next couple of days.
The Kernels can be added as a list of GPy entry points.
> I'm writing configurable Kernels and Expected Improvement at the minute. Will make a PR in next couple of days!
One thing to note though is that I'm interested in switching to GPFlow for this calculation, as it's more actively maintained.
OK, cool. I'm writing it for my own benefit but it looks like only minor changes would be needed to change to GPFlow as the interfaces look similar.
done in #229