keras-team / tf-keras

The TensorFlow-specific implementation of the Keras API, which was the default Keras from 2019 to 2023.
Apache License 2.0

"WARNING:Callback method 'on_train_batch_end' is slow compared to the batch time" when no callback activated and training slowed down #268

Closed · DanielYang59 closed this 1 year ago

DanielYang59 commented 1 year ago

System information.

Describe the problem.

Got the warning "WARNING:tensorflow:Callback method 'on_train_batch_end' is slow compared to the batch time (batch time: 0.1608s vs 'on_train_batch_end' time: 0.2945s). Check your callbacks." even though no callbacks were set.

Training is significantly slowed down and training time varies randomly between trials.

Describe the current behavior. Training is significantly slowed down and training time varies significantly between trials.

Describe the expected behavior. Training speed should be stable, and no significant variance in training time is expected between epochs.
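
For reference, this warning is emitted when Keras times a callback hook and finds it long relative to the batch itself. A minimal sketch (toy model and data, not from this report) that deliberately triggers it:

import time
import tensorflow as tf

# Hypothetical callback: sleeping in on_train_batch_end makes the hook
# slower than the batch, which triggers the warning.
class SlowCallback(tf.keras.callbacks.Callback):
    def on_train_batch_end(self, batch, logs=None):
        time.sleep(0.05)

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

x = tf.random.normal((256, 4))
y = tf.random.normal((256, 1))
model.fit(x, y, batch_size=32, epochs=1, callbacks=[SlowCallback()])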

Contributing.

Standalone code to reproduce the issue.

# Generate dataset (assumes feature, label, total_sample, train_size,
# and batch_size are defined earlier in the script)
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices((feature, label))
dataset = dataset.shuffle(buffer_size=total_sample, reshuffle_each_iteration=False)

# Split into training and validation sets
train_set = dataset.take(train_size)
val_set = dataset.skip(train_size)

# Batch and prefetch both sets
train_set = train_set.batch(batch_size=batch_size)
train_set = train_set.prefetch(tf.data.AUTOTUNE)
val_set = val_set.batch(batch_size)
val_set = val_set.prefetch(tf.data.AUTOTUNE)

# Hyperparameter tuning with KerasTuner
import keras_tuner
from hp_model import hp_model

tuner = keras_tuner.Hyperband(
    hypermodel=hp_model,
    max_epochs=150,
    factor=3,
    overwrite=False,
    objective="val_mean_absolute_error",
    directory="hp_search",
)

tuner.search(
    train_set,
    validation_data=val_set,
    epochs=1000,
    verbose=2,
)

In the "hp_model", a hypermodel with eight hyperparameters is defined (should I search so many parameters at the same time?) like this, the complete source code is enclosed as "hp_model.py":

# Master Layer
hp_master_1st_dense_units = hp.Choice("hp_master_1st_dense_units", [64, 128, 256, 512, 1024])
hp_master_2nd_dense_units = hp.Choice("hp_master_2nd_dense_units", [64, 128, 256, 512, 1024])
hp_master_3rd_dense_layer = hp.Boolean("hp_master_3rd_dense_layer", default=False)
hp_master_activation_function = hp.Choice("hp_master_act_func", ["tanh", "relu", "sigmoid"])

# Branch
hp_branch_dense_activation_func = hp.Choice("hp_branch_dense_activation_func", ["tanh", "relu", "sigmoid"]) 
hp_numFilters = hp.Int("hp_numFilters", min_value=2, max_value=128, sampling="log")
hp_branch_kernel_size = hp.Int("hp_branch_kernel_size", min_value=2, max_value=32, step=2) 
hp_branch_dense_units = hp.Choice("hp_branch_dense_units", [16, 32, 64, 128, 256, 512])
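
For orientation, here is a minimal sketch of how such choices typically sit inside a KerasTuner build function. This toy model is an assumption for illustration, not the attached hp_model.py:

import tensorflow as tf

def hp_model(hp):
    # Hypothetical build function; the real hp_model.py differs.
    units = hp.Choice("hp_master_1st_dense_units", [64, 128, 256, 512, 1024])
    act = hp.Choice("hp_master_act_func", ["tanh", "relu", "sigmoid"])
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(units, activation=act),
        tf.keras.layers.Dense(1),
    ])
    # The metric name must match the tuner objective "val_mean_absolute_error"
    model.compile(optimizer="adam", loss="mse",
                  metrics=["mean_absolute_error"])
    return model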

Source code / logs.

This is the log file for the training process: tunerlog.txt.

Here is the source code for the hypermodel and tuning process: src.zip

sushreebarsa commented 1 year ago

@HaoyuYang59 I could run the source code successfully on Colab using TF v2.9 and TF v2.11; please find the attached gists. Could you let us know if I am missing something to reproduce the reported issue? Thank you!

DanielYang59 commented 1 year ago

Hi @sushreebarsa , thanks for following up.

I realized yesterday that this might not be an issue with Keras Tuner. Instead, it seems to be expected behavior: I was adjusting the number of Conv layers during tuning, so variance in training time should be normal, if I understand correctly?
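
For illustration, a toy build function (an assumption, not the attached code) in which the tuned depth changes per-batch compute, and with it the batch-time baseline that the callback warning is measured against:

import tensorflow as tf

def build(hp):
    # Each trial samples a different depth, so batch times differ between trials.
    model = tf.keras.Sequential([tf.keras.Input(shape=(64, 1))])
    for _ in range(hp.Int("n_conv", min_value=1, max_value=4)):
        model.add(tf.keras.layers.Conv1D(16, kernel_size=3, activation="relu"))
    model.add(tf.keras.layers.GlobalAveragePooling1D())
    model.add(tf.keras.layers.Dense(1))
    model.compile(optimizer="adam", loss="mse")
    return model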

Thanks for your time and wishing you all the best.

Regards, Haoyu

google-ml-butler[bot] commented 1 year ago

Are you satisfied with the resolution of your issue?