sherpa-ai / sherpa

Hyperparameter optimization that enables researchers to experiment, visualize, and scale quickly.
http://parameter-sherpa.readthedocs.io/
GNU General Public License v3.0

PBT without MongoDB: Trial parameters are not changed from generation to generation #65

Closed martsalz closed 5 years ago

martsalz commented 5 years ago

Since MongoDB does not work correctly (#64), I have rewritten the mnistcnnpbt example (https://github.com/sherpa-ai/sherpa/tree/master/examples/parallel-examples/mnistcnnpbt) as follows:

(I have significantly reduced the training and test data sets so that training on the CPU runs faster.)

https://github.com/martsalz/sherpa-PBT/blob/master/MNIST%20PBT%20Without%20MongoDB.ipynb
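
For readers who don't open the notebook, the core of the trial loop has roughly this shape. This is a minimal sketch, not the exact notebook code: the parameter ranges, `population_size`, the `create_model` helper, and the reduced MNIST arrays `x_train`/`y_train`/`x_test`/`y_test` are illustrative assumptions, and I assume the PBT algorithm fills `load_from`/`save_to` in `trial.parameters` as in the original mnistcnnpbt example.

```python
import os
import sherpa
from keras import backend as K
from keras.models import load_model

parameters = [sherpa.Continuous('lr', [1e-4, 1e-1], scale='log'),
              sherpa.Ordinal('batch_size', [16, 32, 64])]
# Constructor arguments of PopulationBasedTraining may differ between Sherpa
# versions; population_size=10 matches the 10-trial generations discussed below.
algorithm = sherpa.algorithms.PopulationBasedTraining(population_size=10)
study = sherpa.Study(parameters=parameters,
                     algorithm=algorithm,
                     lower_is_better=False,
                     disable_dashboard=True)

model_dir = './models'
os.makedirs(model_dir, exist_ok=True)

for trial in study:
    # First generation: build a fresh model. Later generations: continue
    # training the saved model of the parent trial chosen by PBT.
    if trial.parameters.get('load_from', '') == '':
        model = create_model(trial.parameters)  # hypothetical model-building helper
    else:
        model = load_model(os.path.join(model_dir, trial.parameters['load_from']))

    # Apply the (possibly perturbed) learning rate proposed for this trial.
    K.set_value(model.optimizer.lr, trial.parameters['lr'])

    model.fit(x_train, y_train,
              batch_size=trial.parameters['batch_size'],
              epochs=1)
    _, accuracy = model.evaluate(x_test, y_test)

    study.add_observation(trial, iteration=1, objective=accuracy)
    model.save(os.path.join(model_dir, trial.parameters['save_to']))
    study.finalize(trial)
```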

I expected the following line to change the learning rate from generation to generation, but this is not the case: K.set_value(model.optimizer.lr, trial.parameters['lr'])

As I understand the code, subsequent generations load the already trained models and modify their parameters (the new, modified parameters are provided by sherpa.algorithms.PopulationBasedTraining). How should the code be changed to modify, for example, the batch_size or the dropout_rate from one generation to the next?
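
As a quick check (not part of the notebook), one could print what the algorithm proposes at the top of the trial loop; if the perturbed values already fail to show up here, the problem is on the Sherpa side rather than in how Keras applies them. The 'generation' and 'load_from' keys are assumed to be populated by PopulationBasedTraining:

```python
for trial in study:
    # Log exactly what PBT proposes for this trial; exploited trials should
    # also name the parent model they load from.
    print("Trial {}: generation={}, load_from={!r}, lr={}, batch_size={}".format(
        trial.id,
        trial.parameters.get('generation'),
        trial.parameters.get('load_from'),
        trial.parameters['lr'],
        trial.parameters['batch_size']))
    ...  # rest of the training code as before
```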

martsalz commented 5 years ago

In the 11th trial, for example, the model from the 6th trial is loaded and modified by the algorithm. However, the learning rate stays the same; this is the case for all trials.

[screenshot of the trial results table]

LarsHH commented 5 years ago

Hi Martin, see my comment on the other issue. We should only expect the last two trials (in a generation of 10) to have changed learning rates. However, you're right that K.set_value(model.optimizer.lr, trial.parameters['lr']) doesn't actually work: Keras happily accepts the call, but internally the learning rate used for training does not change. Changing the learning rate of a compiled model in Keras is non-trivial, and the same goes for dropout. Batch size should be easy: Sherpa will pass a modified batch_size (again, only for the bottom 20% of the population), and as long as Keras reads the training batch size from trial.parameters it should work correctly.
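
To make that concrete, here is a minimal sketch of the two easy cases, assuming `model` is the network loaded from the parent trial and old-style Keras optimizer keywords. Note that recompiling creates a fresh optimizer, so the parent's Adam moment estimates are discarded, which may or may not matter here:

```python
from keras.optimizers import Adam

# Learning rate: instead of relying on K.set_value on the compiled model,
# recompile the loaded model with a fresh optimizer at the perturbed rate.
model.compile(optimizer=Adam(lr=trial.parameters['lr']),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Batch size: simply read it from trial.parameters when calling fit().
model.fit(x_train, y_train,
          batch_size=trial.parameters['batch_size'],
          epochs=1)
```

Dropout remains the hard case: the rate is fixed when the layers are built, so changing it would likely mean rebuilding the architecture with the new rate and copying the trained weights over.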