secondmind-labs / trieste

A Bayesian optimization toolbox built on TensorFlow
Apache License 2.0

Data preprocessing / transformations / normalisations #379

Open hstojic opened 2 years ago

hstojic commented 2 years ago

Describe the feature you'd like
Support for data pre-processing / transformations / normalisation. Several common normalisations should be offered by trieste, with the flexibility to define a custom transformation. This could be part of the model interface, where predictions would be automatically transformed back to the untransformed space; alternatively, the dataset_builder in TFOptimizer could potentially be used.

Is your feature request related to a problem? Please describe.
A common transformation is normalisation, which is often employed to deal with numerical issues when training models, to speed up training, or to improve convergence. Some models offer this out of the box, but GPflow doesn't. In regression problems it's easy enough to deal with, as you only have to do it once before training, but in the BayesOpt setting it needs to be done inside the optimization loop.

Describe alternatives you've considered
The current alternative is to use the Ask-Tell interface and handle any transformation manually. This is OK but suboptimal, as it's a common operation when training the models. It's also not trivial to implement, because active sampling shifts the distribution of the data.

Notes

saadhamidml commented 2 years ago

An elegant way that I've seen this handled is to create a mixin class that handles the normalisation and denormalisation. This shouldn't be too difficult in Trieste since the models all have a common interface.

To allow for custom transformations perhaps we could optionally allow callables for normalisation and denormalisation to be passed into the mixin's constructor method (which means that the user would pass these in when constructing the model)?

Dealing with the distribution shift should probably be done within the model.update function. To allow for flexibility here, perhaps we can also allow for a callable to be passed in at model construction?
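Roughly, such a mixin might look like the sketch below. None of this is existing Trieste API: the class name, the callable arguments and the update hook are all hypothetical, and a full implementation would likely distinguish input from output transforms and rescale the predictive variance as well.

```python
# Hypothetical sketch of the mixin idea, not existing Trieste code.
from typing import Callable, Optional

import tensorflow as tf
from trieste.data import Dataset


class NormalizationMixin:
    """Normalises data before delegating to the wrapped model's methods and
    denormalises predictions on the way out."""

    def __init__(
        self,
        *args,
        normalize: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
        denormalize: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
        update_hook: Optional[Callable[["NormalizationMixin", Dataset], None]] = None,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        # default to identity transforms if nothing is supplied
        self._normalize = normalize or (lambda x: x)
        self._denormalize = denormalize or (lambda x: x)
        self._update_hook = update_hook

    def predict(self, query_points: tf.Tensor):
        mean, var = super().predict(self._normalize(query_points))
        # a complete version would also rescale the variance accordingly
        return self._denormalize(mean), var

    def update(self, dataset: Dataset) -> None:
        if self._update_hook is not None:
            # user-supplied handling of distribution shift, e.g. refitting the
            # normalisation constants and adjusting model hyperparameters
            self._update_hook(self, dataset)
        super().update(Dataset(self._normalize(dataset.query_points), dataset.observations))
```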

hstojic commented 2 years ago

> An elegant way that I've seen this handled is to create a mixin class that handles the normalisation and denormalisation. This shouldn't be too difficult in Trieste since the models all have a common interface.

Could you post a link to that perhaps?

> To allow for custom transformations perhaps we could optionally allow callables for normalisation and denormalisation to be passed into the mixin's constructor method (which means that the user would pass these in when constructing the model)?

Might be a good approach.

> Dealing with the distribution shift should probably be done within the model.update function. To allow for flexibility here, perhaps we can also allow for a callable to be passed in at model construction?

The simplest thing for now could be to not do it iteratively, but only at the beginning, and then use the fitted transformation parameters later on to transform new data. We could leave an option to pass a function that would be used in the update call, perhaps offering one way of doing that ourselves.

My thinking was to define a DataTransformer interface, with abstract forward and backward transform methods. This object would be passed when initializing the model and, if passed, the optimize, update and predict methods would use it to transform the data. We would offer a few subclasses, like MinMax and Standard, and users could define their own by subclassing DataTransformer.
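A minimal sketch of that interface, just to make the idea concrete; the class and method names follow the suggestion above and don't exist in Trieste yet.

```python
# Hypothetical DataTransformer interface: fitted once on the initial data,
# then reused to transform data collected later in the optimization loop.
from abc import ABC, abstractmethod

import tensorflow as tf


class DataTransformer(ABC):
    @abstractmethod
    def fit(self, data: tf.Tensor) -> "DataTransformer":
        """Estimate the transformation parameters from data."""
        raise NotImplementedError

    @abstractmethod
    def forward(self, data: tf.Tensor) -> tf.Tensor:
        """Map data into the transformed (e.g. normalised) space."""
        raise NotImplementedError

    @abstractmethod
    def backward(self, data: tf.Tensor) -> tf.Tensor:
        """Map data back into the original space."""
        raise NotImplementedError


class StandardTransformer(DataTransformer):
    """Standardise to zero mean and unit variance."""

    def fit(self, data: tf.Tensor) -> "StandardTransformer":
        self._mean = tf.reduce_mean(data, axis=0)
        self._std = tf.math.reduce_std(data, axis=0)
        return self

    def forward(self, data: tf.Tensor) -> tf.Tensor:
        return (data - self._mean) / self._std

    def backward(self, data: tf.Tensor) -> tf.Tensor:
        return data * self._std + self._mean


class MinMaxTransformer(DataTransformer):
    """Rescale to the unit hypercube [0, 1]^d."""

    def fit(self, data: tf.Tensor) -> "MinMaxTransformer":
        self._min = tf.reduce_min(data, axis=0)
        self._max = tf.reduce_max(data, axis=0)
        return self

    def forward(self, data: tf.Tensor) -> tf.Tensor:
        return (data - self._min) / (self._max - self._min)

    def backward(self, data: tf.Tensor) -> tf.Tensor:
        return data * (self._max - self._min) + self._min
```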

jesnie commented 2 years ago

Could we do it as a model wrapper?

  1. Have a class that implements the Model API.
  2. The class takes another model in its __init__.
  3. When you call predict etc. it transforms the input parameters, then calls its child model, then untransforms the output.
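
Something along these lines, reusing the hypothetical DataTransformer from the sketch above; TransformedModelWrapper is not an existing Trieste class, and the wrapped model is only assumed to expose predict/update/optimize.

```python
# Hypothetical wrapper that implements the same model interface as its child.
import tensorflow as tf
from trieste.data import Dataset


class TransformedModelWrapper:
    def __init__(self, model, query_point_transformer, observation_transformer):
        self._model = model
        self._x_transform = query_point_transformer
        self._y_transform = observation_transformer

    def _transform_dataset(self, dataset: Dataset) -> Dataset:
        return Dataset(
            self._x_transform.forward(dataset.query_points),
            self._y_transform.forward(dataset.observations),
        )

    def predict(self, query_points: tf.Tensor):
        mean, var = self._model.predict(self._x_transform.forward(query_points))
        # map the predictive mean back to the original observation space; the
        # variance would also need rescaling by the transform's scale factor
        return self._y_transform.backward(mean), var

    def update(self, dataset: Dataset) -> None:
        self._model.update(self._transform_dataset(dataset))

    def optimize(self, dataset: Dataset) -> None:
        self._model.optimize(self._transform_dataset(dataset))
```
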
saadhamidml commented 2 years ago

I think a model wrapper would be preferable, but we'd want to support updating the model's hyperparameters depending on how the training data distribution has shifted. For example, for a model with a constant mean function one might want to update the mean function using something like new_transform(inverse_old_transform(mean_function_value)). This can be done by the model wrapper, since it has access to all the child model's attributes. However, how the hyperparameters are updated should be left to the user to define (it will be model-dependent). Should we require the user to subclass the model wrapper, or should we allow them to pass a callable into the model wrapper at construction (which gets executed at the end of the update method)?
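For instance, such a user-supplied callable for a GPflow GPR with a constant mean function might look roughly like this; the helper name is made up, and old_transform/new_transform are assumed to follow the DataTransformer sketch above.

```python
# Hypothetical user-supplied hyperparameter update, e.g. executed at the end
# of the wrapper's update method.
import gpflow
import tensorflow as tf


def update_constant_mean(gpr_model: gpflow.models.GPR, old_transform, new_transform) -> None:
    """Re-express a constant mean function after the observation transform has
    been refitted: new_transform(inverse_old_transform(mean_function_value))."""
    value = tf.convert_to_tensor(gpr_model.mean_function.c)  # gpflow Constant parameter
    untransformed = old_transform.backward(value)
    gpr_model.mean_function.c.assign(new_transform.forward(untransformed))
```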