secondmind-labs / trieste

A Bayesian optimization toolbox built on TensorFlow
Apache License 2.0

Initial point samplers #808

Closed uri-granta closed 8 months ago

uri-granta commented 8 months ago

Related issue(s)/PRs:

Summary

Add support for custom samplers for generating the optimization initial point candidates. This solves two problems:

  1. It allows including pre-computed points among the candidates
  2. It allows batching the random sampling to avoid running out of memory for high dimensional problems.
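The second point could be addressed by a sampler that yields fixed-size batches rather than one large tensor. A minimal sketch, using numpy as a stand-in for TensorType; `batched_sampler` and its signature are illustrative, not the actual trieste API:

```python
import numpy as np
from typing import Callable, Iterator


def batched_sampler(
    space_sample: Callable[[int], np.ndarray], batch_size: int = 1000
) -> Callable[[int], Iterator[np.ndarray]]:
    """Draw the requested points in fixed-size batches instead of one huge
    call, so high-dimensional spaces need not materialise every candidate
    at once."""

    def sampler(num: int) -> Iterator[np.ndarray]:
        for start in range(0, num, batch_size):
            # last batch may be smaller than batch_size
            yield space_sample(min(batch_size, num - start))

    return sampler
```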

Fully backwards compatible: yes

Given how often generate_continuous_optimizer is used (often with keyword arguments), I think we should make an extra effort to maintain backwards compatibility here. The current approach is simply to expand num_initial_samples so it can take either an int or a sampler. An alternative would be to add an optional initial_point_sampler parameter, but the downside of that is that it wouldn't be possible to catch cases where people accidentally pass in both num_initial_samples and initial_point_sampler.
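The int-or-sampler dispatch described above could look something like the following sketch. All names here are illustrative (numpy stands in for TensorType), not the actual trieste signature:

```python
import numpy as np
from typing import Callable, Union

# A sampler that produces the initial candidate points itself.
Sampler = Callable[[], np.ndarray]


def initial_points(
    num_initial_samples: Union[int, Sampler],
    space_sample: Callable[[int], np.ndarray],
) -> np.ndarray:
    """Resolve the expanded num_initial_samples argument."""
    if isinstance(num_initial_samples, int):
        # old behaviour: an int means "draw this many random points"
        return space_sample(num_initial_samples)
    # new behaviour: a sampler produces the candidates directly,
    # e.g. pre-computed points mixed with random ones
    return num_initial_samples()
```

Because an int keeps its old meaning, existing callers (positional or keyword) are unaffected, which is the backwards-compatibility argument above.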

PR checklist

hstojic commented 8 months ago

This can work as a short-term solution, but we would want to make this more flexible. We could pass generators of initial samples to the optimizer, and provide a nice function/class for random sample generators. This would then allow us to pass other types of generators when needed, e.g. qp generated by some other optimization process, which would provide better starting points.

uri-granta commented 8 months ago

> this can work as a short term solution, but we would want to make this more flexible - we could pass generators of initial samples to the optimizer, and provide a nice function/class for random sample generators - but this would then allow us to pass other types of generators when needed, e.g. qp generated by some other optimization process which would provide better starting points

Is generators definitely the way to go here? Space.sample doesn't currently support this, and I worry that a generator giving 10,000 samples one at a time would be much less efficient than a single call to sample(10_000).

Another approach would be to provide an additional optional argument initial_sampler: Callable[[int], TensorType] that lets you specify a function that returns a given number of initial sample points. The default behaviour would be equivalent to initial_sampler = space.sample but we could provide a helper function so people could write something like initial_sampler = select_from_samples(generator) if they want to. This would work nicely with split_initial_samples, and (more importantly) not break the current generate_continuous_optimizer API.
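The helper suggested above could be sketched as follows, assuming numpy arrays stand in for TensorType; `select_from_samples` and the `initial_sampler` parameter are part of the proposal, not the current trieste API:

```python
import itertools
import numpy as np
from typing import Callable, Iterator


def select_from_samples(
    generator: Iterator[np.ndarray],
) -> Callable[[int], np.ndarray]:
    """Adapt a generator of sample batches into an initial_sampler:
    keep drawing batches until the requested number of points is reached."""

    def sampler(num: int) -> np.ndarray:
        batches = []
        collected = 0
        while collected < num:
            batch = next(generator)
            batches.append(batch)
            collected += len(batch)
        # trim any excess from the final batch
        return np.concatenate(batches, axis=0)[:num]

    return sampler
```

Usage would then look like `initial_sampler = select_from_samples(generator)` as described, while the default `initial_sampler = space.sample` keeps the single-call efficiency of `sample(10_000)`.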

hstojic commented 8 months ago

> > this can work as a short term solution, but we would want to make this more flexible - we could pass generators of initial samples to the optimizer, and provide a nice function/class for random sample generators - but this would then allow us to pass other types of generators when needed, e.g. qp generated by some other optimization process which would provide better starting points
>
> Is generators definitely the way to go here? Space.sample doesn't currently support this, and I worry that a generator giving 10,000 samples one at a time would be much less efficient than a single call to sample(10_000).
>
> Another approach would be to provide an additional optional argument initial_sampler: Callable[[int], TensorType] that lets you specify a function that returns a given number of initial sample points. The default behaviour would be equivalent to initial_sampler = space.sample but we could provide a helper function so people could write something like initial_sampler = select_from_samples(generator) if they want to. This would work nicely with split_initial_samples, and (more importantly) not break the current generate_continuous_optimizer API.

It doesn't have to be generators; that's just what seemed like it could allow us to pass a variety of initial samples. initial_sampler could do the trick, though it would perhaps need to be a list of samplers, e.g. we would want to have some random samples and some pre-optimised initial points.

@vpicheny any thoughts?
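The combination described above (pre-optimised points plus random fill) could be sketched without needing a list of samplers, by wrapping one sampler around fixed points. All names are hypothetical and numpy stands in for TensorType:

```python
import numpy as np
from typing import Callable


def with_precomputed(
    points: np.ndarray, random_sampler: Callable[[int], np.ndarray]
) -> Callable[[int], np.ndarray]:
    """Sampler that returns the given pre-optimised points first and tops
    up with random samples to reach the requested count."""

    def sampler(num: int) -> np.ndarray:
        if num <= len(points):
            return points[:num]
        # pad the fixed points with random samples
        return np.concatenate(
            [points, random_sampler(num - len(points))], axis=0
        )

    return sampler
```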

uri-granta commented 8 months ago

I've added something that I think is flexible enough but still easy to use. Opinions?