keras-team / keras-tuner

A Hyperparameter Tuning Library for Keras
https://keras.io/keras_tuner/
Apache License 2.0

How to tune the number of epochs and batch_size? #122

Closed ogreyesp closed 5 years ago

ogreyesp commented 5 years ago

Hi,

How can I tune the number of epochs and the batch size?

The provided examples always assume fixed values for these two hyperparameters.

omalleyt12 commented 5 years ago

@ogreyesp Thanks for the issue!

This comment was updated by @haifeng-jin because it was out of date. The following is the latest recommended way of doing it:

This is barebones code for tuning the batch size. The *args and **kwargs are the ones you pass to tuner.search().

from tensorflow import keras
from tensorflow.keras import layers
import keras_tuner as kt

class MyHyperModel(kt.HyperModel):
    def build(self, hp):
        model = keras.Sequential()
        model.add(layers.Flatten())
        model.add(
            layers.Dense(
                units=hp.Int("units", min_value=32, max_value=512, step=32),
                activation="relu",
            )
        )
        model.add(layers.Dense(10, activation="softmax"))
        model.compile(
            optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"],
        )
        return model

    def fit(self, hp, model, *args, **kwargs):
        return model.fit(
            *args,
            batch_size=hp.Choice("batch_size", [16, 32]),
            **kwargs,
        )

tuner = kt.RandomSearch(
    MyHyperModel(),
    objective="val_accuracy",
    max_trials=3,
    overwrite=True,
    directory="my_dir",
    project_name="tune_hypermodel",
)
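
Everything you pass to tuner.search() is then forwarded to MyHyperModel.fit(); for example (x_train, y_train, x_val, and y_val are placeholders here):

# These arguments end up in MyHyperModel.fit() as *args / **kwargs and are
# forwarded to model.fit() together with the tuned batch_size.
tuner.search(x_train, y_train, epochs=5, validation_data=(x_val, y_val))

# The tuned batch size is recorded with each trial's hyperparameters.
best_hp = tuner.get_best_hyperparameters()[0]
print(best_hp.get("batch_size"))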

For epochs specifically, I'd alternatively recommend using early stopping during training by passing the tf.keras.callbacks.EarlyStopping callback, if it's applicable to your use case. It can be configured to stop training as soon as the validation loss stops improving. You can pass Keras callbacks like this to search:

# Will stop training if the "val_loss" hasn't improved in 3 epochs.
tuner.search(x, y, epochs=30, callbacks=[tf.keras.callbacks.EarlyStopping('val_loss', patience=3)])

For n-fold cross-validation, you can also just do it in HyperModel.fit() and return the result as a dictionary like {"val_accuracy": 0.3}, where the key is the name of the objective (see the sketch below). Please follow this guide for more details.
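
For illustration, here is a minimal sketch of that idea, assuming x and y are NumPy arrays and scikit-learn is available; the class name and the 3-fold split are arbitrary choices, not an official API:

import numpy as np
from sklearn.model_selection import KFold

class CVHyperModel(MyHyperModel):
    def fit(self, hp, model, x, y, **kwargs):
        # Run 3-fold cross-validation and report the mean validation accuracy
        # under the objective name so the tuner can rank the trial.
        val_accuracies = []
        for train_idx, val_idx in KFold(n_splits=3, shuffle=True).split(x):
            fold_model = self.build(hp)  # fresh weights for every fold
            fold_model.fit(
                x[train_idx],
                y[train_idx],
                batch_size=hp.Choice("batch_size", [16, 32]),
                **kwargs,
            )
            _, accuracy = fold_model.evaluate(x[val_idx], y[val_idx], verbose=0)
            val_accuracies.append(accuracy)
        return {"val_accuracy": float(np.mean(val_accuracies))}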

ogreyesp commented 5 years ago

Thanks @omalleyt12.

Your response is very helpful.

ogreyesp commented 5 years ago

This project is very important and useful for me. However, the lack of documentation and tutorials is hampering its use.

For example, how can I determine the best subset of hyperparameters by conducting cross-validation?

omalleyt12 commented 5 years ago

This comment is updated by @haifeng-jin because it was out-of-date. Please use the code snippets above instead.

omalleyt12 commented 5 years ago

Please see pending PR here with a tutorial: https://github.com/keras-team/keras-tuner/pull/136

pickfire commented 4 years ago

Is it possible to do tuning without creating a class?

VincBar commented 4 years ago

Thanks for the explanation on batch size. However, when I retrieve the parameters of the best model via tuner.get_best_hyperparameters()[0] and look at the values through .get_config()["values"], the batch_size is not listed there. How can I retrieve the "batch_size" hyperparameter when doing the search in the way described here?

tolandwehr commented 4 years ago

@omalleyt12 @VincBar Was this issue resolved? I'm using KerasTuner for epochs and batch_size right now, too, and I'm not keen on having invisible results after 10 hours of running.

VincBar commented 4 years ago

@tolandwehr Hey, I don't know if the direct way is solved, but I worked around it by including the batch_size hyperparameter in the hypermodel, saving it to self.batch_size (or, in my case, actually a dictionary with some other stuff), and defining a fit function in my hypermodel that then takes this (and whatever else the fit might need).
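
Roughly like this (a simplified sketch with placeholder names, not my exact code):

import kerastuner
from tensorflow import keras

# Register batch_size inside build() so it is stored with the trial's
# hyperparameters, keep it on the instance, and use it in a fit helper.
class MyHyperModel(kerastuner.HyperModel):
    def build(self, hp):
        self.batch_size = hp.Choice("batch_size", [16, 32, 64])
        model = keras.Sequential([
            keras.layers.Dense(hp.Int("units", 32, 256, step=32), activation="relu"),
            keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    def fit_model(self, model, x, y, **kwargs):
        # The tuner does not call this helper itself; the training code
        # (e.g. a custom run_trial) calls it so the tuned batch size is used.
        return model.fit(x, y, batch_size=self.batch_size, **kwargs)

Because batch_size is registered through hp in build(), it then shows up in tuner.get_best_hyperparameters().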

tolandwehr commented 4 years ago

@VincBar Sounds interesting. Could you share the code, if it's still available ^^'?

tolandwehr commented 4 years ago

@omalleyt12

Other issue: I got a NaN/Inf error after some hours of iterations... which is strange, because I double-checked the dataset with

.isnull().sum().sum()

and there were no NaNs

ValueError                                Traceback (most recent call last)
<ipython-input-666-7713a18234fe> in <module>
----> 1 tuner.search(X_train, y_train, epochs=40, validation_split=0.1, callbacks=[tf.keras.callbacks.EarlyStopping('val_loss', patience=3)])

~\Anaconda3\envs\Tensorflow\lib\site-packages\kerastuner\engine\base_tuner.py in search(self, *fit_args, **fit_kwargs)
    118         self.on_search_begin()
    119         while True:
--> 120             trial = self.oracle.create_trial(self.tuner_id)
    121             if trial.status == trial_module.TrialStatus.STOPPED:
    122                 # Oracle triggered exit.

~\Anaconda3\envs\Tensorflow\lib\site-packages\kerastuner\engine\oracle.py in create_trial(self, tuner_id)
    147             values = None
    148         else:
--> 149             response = self._populate_space(trial_id)
    150             status = response['status']
    151             values = response['values'] if 'values' in response else None

~\Anaconda3\envs\Tensorflow\lib\site-packages\kerastuner\tuners\bayesian.py in _populate_space(self, trial_id)
    101         x, y = self._vectorize_trials()
    102         try:
--> 103             self.gpr.fit(x, y)
    104         except exceptions.ConvergenceWarning:
    105             # If convergence of the GPR fails, create a random trial.

~\Anaconda3\envs\Tensorflow\lib\site-packages\sklearn\gaussian_process\_gpr.py in fit(self, X, y)
    232             optima = [(self._constrained_optimization(obj_func,
    233                                                       self.kernel_.theta,
--> 234                                                       self.kernel_.bounds))]
    235 
    236             # Additional runs are performed from log-uniform chosen initial

~\Anaconda3\envs\Tensorflow\lib\site-packages\sklearn\gaussian_process\_gpr.py in _constrained_optimization(self, obj_func, initial_theta, bounds)
    501             opt_res = scipy.optimize.minimize(
    502                 obj_func, initial_theta, method="L-BFGS-B", jac=True,
--> 503                 bounds=bounds)
    504             _check_optimize_result("lbfgs", opt_res)
    505             theta_opt, func_min = opt_res.x, opt_res.fun

~\AppData\Roaming\Python\Python36\site-packages\scipy\optimize\_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    608     elif meth == 'l-bfgs-b':
    609         return _minimize_lbfgsb(fun, x0, args, jac, bounds,
--> 610                                 callback=callback, **options)
    611     elif meth == 'tnc':
    612         return _minimize_tnc(fun, x0, args, jac, bounds, callback=callback,

~\AppData\Roaming\Python\Python36\site-packages\scipy\optimize\lbfgsb.py in _minimize_lbfgsb(fun, x0, args, jac, bounds, disp, maxcor, ftol, gtol, eps, maxfun, maxiter, iprint, callback, maxls, **unknown_options)
    343             # until the completion of the current minimization iteration.
    344             # Overwrite f and g:
--> 345             f, g = func_and_grad(x)
    346         elif task_str.startswith(b'NEW_X'):
    347             # new iteration

~\AppData\Roaming\Python\Python36\site-packages\scipy\optimize\lbfgsb.py in func_and_grad(x)
    293     else:
    294         def func_and_grad(x):
--> 295             f = fun(x, *args)
    296             g = jac(x, *args)
    297             return f, g

~\AppData\Roaming\Python\Python36\site-packages\scipy\optimize\optimize.py in function_wrapper(*wrapper_args)
    325     def function_wrapper(*wrapper_args):
    326         ncalls[0] += 1
--> 327         return function(*(wrapper_args + args))
    328 
    329     return ncalls, function_wrapper

~\AppData\Roaming\Python\Python36\site-packages\scipy\optimize\optimize.py in __call__(self, x, *args)
     63     def __call__(self, x, *args):
     64         self.x = numpy.asarray(x).copy()
---> 65         fg = self.fun(x, *args)
     66         self.jac = fg[1]
     67         return fg[0]

~\Anaconda3\envs\Tensorflow\lib\site-packages\sklearn\gaussian_process\_gpr.py in obj_func(theta, eval_gradient)
    223                 if eval_gradient:
    224                     lml, grad = self.log_marginal_likelihood(
--> 225                         theta, eval_gradient=True, clone_kernel=False)
    226                     return -lml, -grad
    227                 else:

~\Anaconda3\envs\Tensorflow\lib\site-packages\sklearn\gaussian_process\_gpr.py in log_marginal_likelihood(self, theta, eval_gradient, clone_kernel)
    474             y_train = y_train[:, np.newaxis]
    475 
--> 476         alpha = cho_solve((L, True), y_train)  # Line 3
    477 
    478         # Compute log-likelihood (compare line 7)

~\AppData\Roaming\Python\Python36\site-packages\scipy\linalg\decomp_cholesky.py in cho_solve(c_and_lower, b, overwrite_b, check_finite)
    194     (c, lower) = c_and_lower
    195     if check_finite:
--> 196         b1 = asarray_chkfinite(b)
    197         c = asarray_chkfinite(c)
    198     else:

~\Anaconda3\envs\Tensorflow\lib\site-packages\numpy\lib\function_base.py in asarray_chkfinite(a, dtype, order)
    497     if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
    498         raise ValueError(
--> 499             "array must not contain infs or NaNs")
    500     return a
    501 

ValueError: array must not contain infs or NaNs
saranyaprakash2012 commented 3 years ago

I would like to use the Bayesian optimization tuner to tune the epochs and batch size for a BLSTM model. My data is passed in using a custom data generator, which takes the batch size as input. How do I use the Keras Tuner in this case?

LIKHITA12 commented 3 years ago

@ogreyesp Thanks for the issue!

This can be done by subclassing the Tuner class you are using and overriding run_trial. (Note that Hyperband sets the epochs to train for via its own logic, so if you're using Hyperband you shouldn't tune the epochs). Here's an example with kt.tuners.BayesianOptimization:

class MyTuner(kerastuner.tuners.BayesianOptimization):
  def run_trial(self, trial, *args, **kwargs):
    # You can add additional HyperParameters for preprocessing and custom training loops
    # via overriding `run_trial`
    kwargs['batch_size'] = trial.hyperparameters.Int('batch_size', 32, 256, step=32)
    kwargs['epochs'] = trial.hyperparameters.Int('epochs', 10, 30)
    super(MyTuner, self).run_trial(trial, *args, **kwargs)

# Uses same arguments as the BayesianOptimization Tuner.
tuner = MyTuner(...)
# Don't pass epochs or batch_size here, let the Tuner tune them.
tuner.search(...)

For epochs specifically, I'd alternatively recommend looking at using early stopping during training via passing in the tf.keras.callbacks.EarlyStopping callback if it's applicable to your use case. This can be configured to stop your training as soon as the validation loss stops improving. You can pass Keras callbacks like this to search:

# Will stop training if the "val_loss" hasn't improved in 3 epochs.
tuner.search(x, y, epochs=30, callbacks=[tf.keras.callbacks.EarlyStopping('val_loss', patience=3)])

Hello @ogreyesp, I have implemented this with the Hyperband Keras tuner. I have a doubt: the batch_size is not included in the first trial, only from the second trial onwards. Why is that? Is there any way to include batch_size in the first trial itself? Please let me know.

JoepC commented 3 years ago

I used the following code to optimise the number of epochs and batch size:

class MyTuner(kerastuner.tuners.BayesianOptimization):
  def run_trial(self, trial, *args, **kwargs):
    # You can add additional HyperParameters for preprocessing and custom training loops
    # via overriding `run_trial`
    kwargs['batch_size'] = trial.hyperparameters.Int('batch_size', 32, 256, step=32)
    kwargs['epochs'] = trial.hyperparameters.Int('epochs', 10, 30)
    super(MyTuner, self).run_trial(trial, *args, **kwargs)

Now I want to save the number of epochs and batch size for the best trial that the tuner found.

I tried using the following code suggested by @fredshu, but I could not get it working:

values['batch_size'] = best_trial.batch_size

How is best_trial defined? I use best_model = tuner.get_best_models()[0] to get the best model for making predictions afterwards; if I replace best_trial with best_model, it does not work. I used with redirect_stdout(f): tuner.results_summary() to save the full summary to a text file, but now I only want the number of epochs and the batch size of the best trial.

So how do I save the number of epochs and the batch size of the best trial to separate variables? If possible, I would also like to save the other optimised hyperparameters.
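
I am guessing something along these lines should work, since batch_size and epochs are registered as hyperparameters in run_trial above, but I have not verified it:

# Hyperparameters of the best trial, including the ones added in run_trial.
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]

best_batch_size = best_hp.get("batch_size")
best_epochs = best_hp.get("epochs")

# All tuned values as a plain dict (also covers the other hyperparameters).
all_values = best_hp.values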

sukrit2018 commented 3 years ago

I am new to Keras and TensorFlow. I want to simultaneously explore the number of epochs and the CV for my project. Can you please help me write a custom Tuner?

21kc-caracol commented 3 years ago

@saranyaprakash2012 Did you manage to use a Keras training generator with a Keras Tuner that tunes the batch_size?

Can anyone give a code snippet that does that?

The above example @omalleyt12 gave didn't change the actual batch size that the training generator (ImageDataGenerator) took.

I mean that the Keras Tuner log printed the batch size as if it had been taken into account, but the training log showed that the generator ignored the tuner's batch_size and just used a predefined value...

Example: the actual batch size was 128 on a debug dataset of ~150 samples, so we had 2 batches:

2/2 [==============================]
2/2 [==============================]

but in the hyperparameters of the tuner it showed

Hyperparameter |Value |Best Value So Far
learning_rate  |0.5   |0.5
decay          |0.01  |0.01
momentum       |0     |0
batch_size     |2     |1

(I only had 1 and 2 as the batch size options inside the tuner.)

saranyaprakash2012 commented 3 years ago

Try something like this :

Create a model function:

def create_hypermodel(hp):
    learning_rate = 0.0001
    K.clear_session()
    inputs_pose_gaze = Input(POSE_GAZE_INPUT_SHAPE)
    blstm1_pose_gaze = Bidirectional(LSTM(200, return_sequences=True, recurrent_dropout=0, activation='tanh'))(inputs_pose_gaze)
    max_pooled_poze = GlobalMaxPooling1D()(blstm1_pose_gaze)
    output = Dense(1, activation='sigmoid')(max_pooled_poze)
    model = Model(inputs=[inputs_pose_gaze], outputs=output)
    model.compile(optimizer=optimizers.Adam(learning_rate), loss=losses.BinaryCrossentropy())
    print(model.summary())

    return model

Tuner class:

class MyTuner2(BayesianOptimization):
    def run_trial(self, trial, *args, **kwargs):
        # You can add additional HyperParameters for preprocessing and custom training loops
        # via overriding `run_trial`
        hp = trial.hyperparameters

        kwargs['batch_size'] = hp.Int('batch_size', 4, 16, step=4)
        # kwargs['val_batch_size'] = hp.Int('val_batch_size', 1, 4, step=1)
        kwargs['epochs'] = hp.Int('epochs', 10, 25,step=5)

        train_data_gen= video_batch_generator(train_upsample_files,hp.Int('batch_size',4,16,step=4))
        print(f"batch_size:{hp.Int('batch_size',4,16,step=4)}")
        val_data_gen= video_batch_generator(val_files,hp.Int('val_batch_size', 1, 4, step=1))

        steps_per_epoch= math.floor(len(train_upsample_files)/hp.Int('batch_size',4,16,step=4)) 
        val_steps_per_epoch= math.floor(len(val_files)/VAL_BATCH_SIZE)
        early_stopping = tf.keras.callbacks.EarlyStopping(
        monitor='auc', 
        verbose=1,
        patience=2,
        mode='max',
        restore_best_weights=True)

        model = self.hypermodel.build(hp)
        model.fit(train_data_gen,steps_per_epoch=steps_per_epoch,epochs = hp.Int('epochs', 5, 20,step=5),callbacks=[early_stopping])
        val_metrics = model.evaluate(val_data_gen,steps =val_steps_per_epoch,return_dict=True)
        print(f"Evaluation val_metrics :{val_metrics}")
        self.oracle.update_trial(
          trial.trial_id, {'val_auc': val_metrics['auc']})
        self.save_model(trial.trial_id, model)

# Uses same arguments as the BayesianOptimization Tuner.
tuner = MyTuner2(create_hypermodel,
    objective=Objective("val_auc", direction="max"),
    max_trials=6,
    executions_per_trial=1,
    directory=os.path.normpath('keras_tuning_blstm_video'),
    project_name='kerastuner_bayesian_lstm_video',overwrite=True)

# Don't pass epochs or batch_size here, let the Tuner tune them.

tuner.search_space_summary()

tuner.search()
model_best_model_epoch_batch_size = tuner.get_best_models(num_models=1)
# model_tuned = model_best_model_epoch_batch_size[0]
print(tuner.get_best_hyperparameters()[0].get_config()["values"])
# filepath_best_model ="video_best_batch_model"
# model_best_model_epoch_batch_size.save(filepath_best_model)
21kc-caracol commented 3 years ago

Try something like this : @saranyaprakash2012

Could you clarify which code goes into which block? The indentation is a bit confusing.

Thanks!

haifeng-jin commented 2 years ago

This guide is out of date. Please follow this guide instead.

vaxherra commented 2 years ago

I had some problems with the version below. Namely, I couldn't get it to run with a custom objective.

class MyTuner(kerastuner.tuners.BayesianOptimization):
  def run_trial(self, trial, *args, **kwargs):
    # You can add additional HyperParameters for preprocessing and custom training loops
    # via overriding `run_trial`
    kwargs['batch_size'] = trial.hyperparameters.Int('batch_size', 32, 256, step=32)
    kwargs['epochs'] = trial.hyperparameters.Int('epochs', 10, 30)
    super(MyTuner, self).run_trial(trial, *args, **kwargs)

I added the return statement and that fixed it:

class MyTuner(kerastuner.tuners.BayesianOptimization):
  def run_trial(self, trial, *args, **kwargs):
    # You can add additional HyperParameters for preprocessing and custom training loops
    # via overriding `run_trial`
    kwargs['batch_size'] = trial.hyperparameters.Int('batch_size', 32, 256, step=32)
    kwargs['epochs'] = trial.hyperparameters.Int('epochs', 10, 30)
    return super(MyTuner, self).run_trial(trial, *args, **kwargs)
davidwanner-8451 commented 2 years ago

@ogreyesp Thanks for the issue!

This comment is updated by @haifeng-jin because it was out-of-date. Following is the latest recommended way of doing it:

This is a barebone code for tuning batch size. The *args and **kwargs are the ones you passed from tuner.search().

class MyHyperModel(kt.HyperModel):
    def build(self, hp):
        model = keras.Sequential()
        model.add(layers.Flatten())
        model.add(
            layers.Dense(
                units=hp.Int("units", min_value=32, max_value=512, step=32),
                activation="relu",
            )
        )
        model.add(layers.Dense(10, activation="softmax"))
        model.compile(
            optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"],
        )
        return model

    def fit(self, hp, model, *args, **kwargs):
        return model.fit(
            *args,
            batch_size=hp.Choice("batch_size", [16, 32]),
            **kwargs,
        )

tuner = kt.RandomSearch(
    MyHyperModel(),
    objective="val_accuracy",
    max_trials=3,
    overwrite=True,
    directory="my_dir",
    project_name="tune_hypermodel",
)

For epochs specifically, I'd alternatively recommend looking at using early stopping during training via passing in the tf.keras.callbacks.EarlyStopping callback if it's applicable to your use case. This can be configured to stop your training as soon as the validation loss stops improving. You can pass Keras callbacks like this to search:

# Will stop training if the "val_loss" hasn't improved in 3 epochs.
tuner.search(x, y, epochs=30, callbacks=[tf.keras.callbacks.EarlyStopping('val_loss', patience=3)])

For n-fold cross validation, you can also just do it in HyperModel.fit() and return the result as a dictionary like {"val_accuracy": 0.3}, where the key is the name of the objective. Please follow this guide for more details.

Curious: is this considered the proper approach for tuning batch_size? It looks like this comment was edited in Feb 2022, so my assumption is yes, but I have not seen this approach in the docs (I could be missing it).

haifeng-jin commented 2 years ago

Yes, this is the officially recommended approach. Thanks!

muriloasouza commented 1 year ago

I am also trying to tune the batch_size and could use some help here please:

class MyHyperModel(keras_tuner.HyperModel):
    def build(self, hp):
        model = Sequential(name='Conv1D_Model') 
        model.add(InputLayer((timesteps, input_dim), name='input_layer'))
        for j in range(hp.Int("num_conv_layers", 1, 2)):
            model.add(Conv1D(filters=hp.Int(f'filters_{j}', min_value=32, max_value=256, step=32),
                             kernel_size=hp.Int('kernel_size', min_value=2, max_value=6, step=2),
                             activation='tanh',
                             name=f'{j}_conv_layer'))
            model.add(MaxPooling1D(pool_size=1))
        model.add(Flatten())
        if hp.Boolean("dropout"):
            model.add(Dropout(rate=0.25))

        for k in range(hp.Int("num_layers", 1, 3)):
            model.add(Dense(units=hp.Int(f'units_{k}', min_value=24, max_value=72, step=24),
                            activation='tanh',
                            name=f'{k}_dense'))
        model.add(Dense(units=1,
                        activation='tanh',
                        name='output_layer'))
        model.compile(optimizer='adam',
                      loss='mean_squared_error')
        return model

    def fit(self, hp, model, *args, batch_size=32,
            **kwargs):
        return model.fit(
            *args,
            batch_size=hp.Choice("batch_size", [16, 32, 64]),
            **kwargs,
        )

But in the search space I got this:

Search space summary
Default search space size: 6
num_conv_layers (Int)
{'default': None, 'conditions': [], 'min_value': 1, 'max_value': 2, 'step': 1, 'sampling': None}
filters_0 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 256, 'step': 32, 'sampling': None}
kernel_size (Int)
{'default': None, 'conditions': [], 'min_value': 2, 'max_value': 6, 'step': 2, 'sampling': None}
dropout (Boolean)
{'default': False, 'conditions': []}
num_layers (Int)
{'default': None, 'conditions': [], 'min_value': 1, 'max_value': 3, 'step': 1, 'sampling': None}
units_0 (Int)
{'default': None, 'conditions': [], 'min_value': 24, 'max_value': 72, 'step': 24, 'sampling': None}
None

Here is the first trial:

Search: Running Trial #1
Value             |Best Value So Far |Hyperparameter
1                 |?                 |num_conv_layers
160               |?                 |filters_0
4                 |?                 |kernel_size
False             |?                 |dropout
1                 |?                 |num_layers
72                |?                 |units_0

Shouldn't the batch size appear both in the search space and in the trial report? How do I know which batch size is being used?