keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Attention layer does not accept output of previous layers in functional API #20318

Open jorgenorena opened 1 day ago

jorgenorena commented 1 day ago

As an exercise to get acquainted with Keras, I want to train a simple model with attention to translate sentences.

I am not calling a tf function, only using Keras layers. But I get the following error:

A KerasTensor cannot be used as input to a TensorFlow function. A KerasTensor is a symbolic placeholder for a shape and dtype, used when constructing Keras Functional models or Keras Functions. You can only use it as input to a Keras layer or a Keras operation (from the namespaces keras.layers and keras.operations). [...]

Here is the code for the model using Keras' functional API:

import tensorflow as tf

encoder_inputs = tf.keras.layers.Input(shape=[], dtype=tf.string)
decoder_inputs = tf.keras.layers.Input(shape=[], dtype=tf.string)

embed_size = 128
encoder_inputs_ids = text_vec_layer_en(encoder_inputs)
decoder_inputs_ids = text_vec_layer_es(decoder_inputs)
encoder_embedding_layer = tf.keras.layers.Embedding(vocab_size, embed_size, mask_zero=True)
decoder_embedding_layer = tf.keras.layers.Embedding(vocab_size, embed_size, mask_zero=True)
encoder_embeddings = encoder_embedding_layer(encoder_inputs_ids)
decoder_embeddings = decoder_embedding_layer(decoder_inputs_ids)

encoder = tf.keras.layers.LSTM(512, return_sequences=True, return_state=True)
encoder_outputs, *encoder_state = encoder(encoder_embeddings)

decoder = tf.keras.layers.LSTM(512, return_sequences=True)
decoder_outputs = decoder(decoder_embeddings, initial_state=encoder_state)

# Attention layer here!
# Problems getting it to work on Keras 3
attention_layer = tf.keras.layers.Attention()
attention_outputs = attention_layer([decoder_outputs, encoder_outputs])

output_layer = tf.keras.layers.Dense(vocab_size, activation="softmax")
Y_probas = output_layer(attention_outputs)

Expected behavior: The Keras attention layer accepts Keras tensor inputs. Or a more helpful error message is given.

Python version: 3.11.0
TensorFlow version: 2.17.0
Keras version: 3.4.1 (bundled with that TensorFlow version)

mehtamansi29 commented 6 hours ago

Hi @jorgenorena -

Thanks for reporting the issue. From the code, I understand that you are trying to build a model with attention to translate sentences. Instead of tf.keras.layers.Attention, you can use tf.keras.layers.MultiHeadAttention, passing query, key, and value for the dot product. The attention output then needs to be combined with the decoder output, and the model is created with the functional API. A gist is attached here for your reference.
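
For readers without access to the gist, here is a minimal sketch of what that suggestion could look like. It is not the gist itself. It assumes the inputs are already integer token ids (the TextVectorization layers from the report are left out), drops mask_zero=True, and uses made-up sizes (vocab_size, units) and a Concatenate layer as one possible way to combine the two outputs.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sizes chosen for illustration; they are not from the issue.
vocab_size = 1000
embed_size = 128
units = 64

# Assumption: token ids are fed in directly instead of raw strings.
encoder_input_ids = tf.keras.layers.Input(shape=(None,), dtype="int32")
decoder_input_ids = tf.keras.layers.Input(shape=(None,), dtype="int32")

encoder_embeddings = tf.keras.layers.Embedding(vocab_size, embed_size)(encoder_input_ids)
decoder_embeddings = tf.keras.layers.Embedding(vocab_size, embed_size)(decoder_input_ids)

encoder = tf.keras.layers.LSTM(units, return_sequences=True, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_embeddings)

decoder = tf.keras.layers.LSTM(units, return_sequences=True)
decoder_outputs = decoder(decoder_embeddings, initial_state=[state_h, state_c])

# Query = decoder outputs, value = encoder outputs; key defaults to value.
attention_layer = tf.keras.layers.MultiHeadAttention(num_heads=1, key_dim=units)
attention_outputs = attention_layer(decoder_outputs, encoder_outputs)

# Combine the attention output with the decoder output before the classifier.
combined = tf.keras.layers.Concatenate(axis=-1)([decoder_outputs, attention_outputs])
Y_probas = tf.keras.layers.Dense(vocab_size, activation="softmax")(combined)

model = tf.keras.Model(inputs=[encoder_input_ids, decoder_input_ids], outputs=Y_probas)

# Smoke test with random token ids: batch of 2, source length 7, target length 5.
enc = np.random.randint(1, vocab_size, size=(2, 7))
dec = np.random.randint(1, vocab_size, size=(2, 5))
probs = model.predict([enc, dec], verbose=0)
print(probs.shape)
```

The output has one probability distribution over the vocabulary per target position. Whether padding masks need to be passed explicitly (for example through the attention_mask argument) depends on your data and is not covered here.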

tf.keras.layers.Attention is not picking up its inputs when called like this: attention_outputs = attention_layer([decoder_outputs, encoder_outputs]). You can find more details about the attention layer here.