keras-team / keras-nlp

Modular Natural Language Processing workflows with Keras
Apache License 2.0

Dropout is not called in the training regime in TransformerEncoder and others #1500

Closed · foxik closed this issue 6 months ago

foxik commented 6 months ago

Hi all,

**Describe the bug**

The `TransformerEncoder` layer's `call()` does not take a `training` argument:
https://github.com/keras-team/keras-nlp/blob/c5a37bc19e00a4d8e705c6a742225d3b454eb154/keras_nlp/layers/modeling/transformer_encoder.py#L185
and so it does not pass one to the dropout layers:
https://github.com/keras-team/keras-nlp/blob/c5a37bc19e00a4d8e705c6a742225d3b454eb154/keras_nlp/layers/modeling/transformer_encoder.py#L217
https://github.com/keras-team/keras-nlp/blob/c5a37bc19e00a4d8e705c6a742225d3b454eb154/keras_nlp/layers/modeling/transformer_encoder.py#L228

If I understand it correctly, this means that the dropout layers are never active, even during training.
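A condensed sketch of the pattern I mean (hypothetical `EncoderSketch` class, not the actual keras-nlp source), assuming plain Keras layers:

```python
import keras

# Hypothetical illustration of the pattern in question: `call()` has no
# `training` parameter, so the inner Dropout is invoked without an
# explicit training flag.
class EncoderSketch(keras.layers.Layer):
    def __init__(self, rate=0.1, **kwargs):
        super().__init__(**kwargs)
        self.dropout = keras.layers.Dropout(rate)

    def call(self, inputs):
        # `training` is never forwarded here.
        return self.dropout(inputs)
```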

However, there are many places that do not pass `training` -- `TransformerDecoder` and `FNetEncoder` in `layers`, and quite a few models in `models` -- `XLNetEncoder`, `BloomDecoder`, `GemmaDecoderBlock`, etc.

Note that models built with the functional API should be fine -- there the `training` argument is passed automatically; however, when the subclassing API is used (i.e., a custom `def call`), it is not. The conventional fix is sketched below.
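A minimal sketch of the two styles (hypothetical `SubclassedBlock` name), assuming standard Keras layers:

```python
import keras

# Functional API: the framework routes `training` when the model is
# called, e.g. model(x, training=True) or model.fit(...).
inputs = keras.Input(shape=(8,))
outputs = keras.layers.Dropout(0.5)(inputs)
functional_model = keras.Model(inputs, outputs)

# Subclassing API: the conventional fix is to accept `training` in
# `call()` and forward it to each dropout explicitly.
class SubclassedBlock(keras.layers.Layer):
    def __init__(self, rate=0.5, **kwargs):
        super().__init__(**kwargs)
        self.dropout = keras.layers.Dropout(rate)

    def call(self, inputs, training=None):
        return self.dropout(inputs, training=training)
```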

Note that this is one of the important differences between TF-Keras and Keras 3, because:

foxik commented 6 months ago

@fchollet I have taken the liberty of adding you here to verify this :bow: (if I am correct, this will need non-trivial effort to fix, and fine-tuning with Keras 3 will give suboptimal results until then).

foxik commented 6 months ago

Oh, sorry, I just realized how this now works in Keras 3 :man_facepalming: -- the base `Layer.__call__` resolves the `training` value automatically, falling back to the training mode of the outer call: https://github.com/keras-team/keras/blob/ce06c6509db91f334168c66db2e7003101dcd749/keras/layers/layer.py#L743-L748
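So a layer that never mentions `training` still gets correct dropout behavior in Keras 3. A minimal sketch of that behavior (hypothetical `NoTrainingArg` layer):

```python
import numpy as np
import keras

# Hypothetical layer whose `call()` never mentions `training`; in
# Keras 3 the nested Dropout still inherits the training mode of the
# outer call.
class NoTrainingArg(keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.dropout = keras.layers.Dropout(0.5)

    def call(self, inputs):
        return self.dropout(inputs)

inputs = keras.Input(shape=(8,))
model = keras.Model(inputs, NoTrainingArg()(inputs))

x = np.ones((4, 8), dtype="float32")
print(model(x, training=True))   # ~half the entries zeroed, survivors scaled to 2.0
print(model(x, training=False))  # all ones: dropout inactive
```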

Closing.