keras-team / keras-nlp

Modular Natural Language Processing workflows with Keras
Apache License 2.0

Dropout is not called in the training regime in TransformerEncoder and others #1500

Closed · foxik closed this issue 6 months ago

foxik commented 6 months ago

Hi all,

**Describe the bug**

The `TransformerEncoder` layer's `call()` does not take a `training` argument:
https://github.com/keras-team/keras-nlp/blob/c5a37bc19e00a4d8e705c6a742225d3b454eb154/keras_nlp/layers/modeling/transformer_encoder.py#L185
and so it does not pass one to the dropout layers:
https://github.com/keras-team/keras-nlp/blob/c5a37bc19e00a4d8e705c6a742225d3b454eb154/keras_nlp/layers/modeling/transformer_encoder.py#L217
https://github.com/keras-team/keras-nlp/blob/c5a37bc19e00a4d8e705c6a742225d3b454eb154/keras_nlp/layers/modeling/transformer_encoder.py#L228

If I understand it correctly, this means that the dropout layers are never active, even during training.
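A condensed sketch of the pattern I mean (hypothetical `EncoderSketch` class, not the actual keras-nlp source), assuming plain Keras layers:

```python
import keras

# Hypothetical illustration of the pattern in question: `call()` has no
# `training` parameter, so the inner Dropout is invoked without an
# explicit training flag.
class EncoderSketch(keras.layers.Layer):
    def __init__(self, rate=0.1, **kwargs):
        super().__init__(**kwargs)
        self.dropout = keras.layers.Dropout(rate)

    def call(self, inputs):
        # `training` is never forwarded here.
        return self.dropout(inputs)
```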

However, there are many places that do not pass `training` -- `TransformerDecoder` and `FNetEncoder` in `layers`, and quite a few models in `models` -- `XLNetEncoder`, `BloomDecoder`, `GemmaDecoderBlock`, etc.

Note that models built with the functional API should be fine -- there the `training` argument is passed automatically; however, when the subclassing API is used (i.e., a custom `def call`), it is not. The conventional fix is sketched below.
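A minimal sketch of the two styles (hypothetical `SubclassedBlock` name), assuming standard Keras layers:

```python
import keras

# Functional API: the framework routes `training` when the model is
# called, e.g. model(x, training=True) or model.fit(...).
inputs = keras.Input(shape=(8,))
outputs = keras.layers.Dropout(0.5)(inputs)
functional_model = keras.Model(inputs, outputs)

# Subclassing API: the conventional fix is to accept `training` in
# `call()` and forward it to each dropout explicitly.
class SubclassedBlock(keras.layers.Layer):
    def __init__(self, rate=0.5, **kwargs):
        super().__init__(**kwargs)
        self.dropout = keras.layers.Dropout(rate)

    def call(self, inputs, training=None):
        return self.dropout(inputs, training=training)
```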

Note that this is one of the important differences between TF-Keras and Keras 3, because:

foxik commented 6 months ago

@fchollet I have taken the liberty of adding you here to verify this :bow: (if I am correct, this will need non-trivial effort to fix, and fine-tuning with Keras 3 will give suboptimal results until then).

foxik commented 6 months ago

Oh, sorry, I just realized how this now works in Keras 3 :man_facepalming: -- the base `Layer.__call__` resolves the `training` value automatically, falling back to the training mode of the outer call: https://github.com/keras-team/keras/blob/ce06c6509db91f334168c66db2e7003101dcd749/keras/layers/layer.py#L743-L748
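So a layer that never mentions `training` still gets correct dropout behavior in Keras 3. A minimal sketch of that behavior (hypothetical `NoTrainingArg` layer):

```python
import numpy as np
import keras

# Hypothetical layer whose `call()` never mentions `training`; in
# Keras 3 the nested Dropout still inherits the training mode of the
# outer call.
class NoTrainingArg(keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.dropout = keras.layers.Dropout(0.5)

    def call(self, inputs):
        return self.dropout(inputs)

inputs = keras.Input(shape=(8,))
model = keras.Model(inputs, NoTrainingArg()(inputs))

x = np.ones((4, 8), dtype="float32")
print(model(x, training=True))   # ~half the entries zeroed, survivors scaled to 2.0
print(model(x, training=False))  # all ones: dropout inactive
```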

Closing.