Closed · felixhao28 closed this issue 4 years ago
Hi @philipperemy and @felixhao28. I am trying to apply an attention model on top of an LSTM, where my training input is an n-dimensional array (ndarray). How should I fit my model in this case? Because my data is an ndarray, I get the following error:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
What changes should I make? I would appreciate your help. Thank you!
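This error usually means the training data is a ragged, object-dtype array (a NumPy array whose elements are themselves arrays of different lengths), which Keras cannot convert into a single tensor. A minimal sketch of one common fix, zero-padding to a uniform shape (the data shapes and the `pad_to_uniform` helper here are hypothetical, not from the thread):

```python
import numpy as np

# Hypothetical ragged data: three sequences of different lengths.
# Calling np.array() on these yields an object-dtype array of ndarrays,
# which Keras cannot convert to a Tensor -- hence the ValueError above.
ragged = [np.ones((5, 8)), np.ones((3, 8)), np.ones((7, 8))]

def pad_to_uniform(seqs, dtype="float32"):
    """Zero-pad variable-length (timesteps, features) arrays into one
    (n_samples, max_timesteps, features) array with a numeric dtype."""
    max_len = max(s.shape[0] for s in seqs)
    n_feat = seqs[0].shape[1]
    out = np.zeros((len(seqs), max_len, n_feat), dtype=dtype)
    for i, s in enumerate(seqs):
        out[i, : s.shape[0], :] = s
    return out

x_train = pad_to_uniform(ragged)
print(x_train.shape)   # (3, 7, 8)
print(x_train.dtype)   # float32
```

An array like `x_train` can then be passed to `model.fit` directly, optionally together with a masking layer so the padding does not influence the LSTM.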
@AnanyaO did you have a look at the examples here: https://github.com/philipperemy/keras-attention-mechanism/tree/master/examples?
Hi, thanks for all of your comments; I have learned a lot from them. May I ask a question? If we use an RNN (or some variant of it), we get a hidden state at each time step, which can then be used to compute the score. But suppose I do not use an LSTM as the encoder and instead use a 1D CNN: what should I do to apply attention? For example, to handle some textual messages, I first used an embedding layer and then a Conv1D layer. Is there a method I can use to apply the attention mechanism to this model? Thanks so much.
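Attention does not actually require RNN hidden states; it only needs one feature vector per position, and a Conv1D feature map provides exactly that. A framework-agnostic NumPy sketch of this idea (the shapes and the scoring vector `w` are illustrative assumptions, not code from this repo):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Conv1D encoder output: one 16-channel feature vector
# per sequence position. Attention does not care whether these came
# from an LSTM or a CNN -- it only needs one vector per position.
features = rng.normal(size=(10, 16))   # 10 positions, 16 channels

# Learned scoring vector (random here; trained in a real model).
w = rng.normal(size=(16,))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

scores = features @ w          # one scalar score per position, shape (10,)
weights = softmax(scores)      # attention weights; they sum to 1
context = weights @ features   # weighted sum over positions, shape (16,)

print(context.shape)   # (16,)
```

In Keras terms this is a `Dense(1)` over the Conv1D output, a softmax across the time axis, and a weighted sum, i.e. the same structure as attention over RNN states.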
Update on 2019/2/14, nearly one year later:
The implementation in this repo is definitely bugged. Please refer to my implementation in a reply below for the correction. My version has been working in our product since this thread started, and it outperforms both a vanilla LSTM without attention and the incorrect version in this repo by a significant margin. I am not the only one raising this question.
Both this repo and my version of attention are intended for sequence-to-one networks (although it can easily be tweaked for seq2seq by replacing h_t with the current state of the decoder step). If you are looking for a ready-to-use attention for sequence-to-sequence networks, check this out: https://github.com/farizrahman4u/seq2seq.

============ Original answer ============
I am currently working on a text generation task and learned attention from the TensorFlow tutorials. The implementation details seem quite different from your code.
This is how the TensorFlow tutorial describes the process:
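The tutorial figure did not survive extraction. From memory, the equations it shows, referenced below as (1)–(3), are the Luong-style attention formulas from the TensorFlow NMT tutorial (reconstruction, not the original image):

```latex
\alpha_{ts} &= \frac{\exp\left(\mathrm{score}(h_t, \bar{h}_s)\right)}
                  {\sum_{s'} \exp\left(\mathrm{score}(h_t, \bar{h}_{s'})\right)}
  && \text{(1) attention weights} \\
c_t &= \sum_s \alpha_{ts}\, \bar{h}_s
  && \text{(2) context vector} \\
a_t &= f(c_t, h_t) = \tanh\left(W_c [c_t; h_t]\right)
  && \text{(3) attention vector} \\
\mathrm{score}(h_t, \bar{h}_s) &= h_t^{\top} W \bar{h}_s
  && \text{(Luong's ``general'' score)}
```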
If I am understanding it correctly, all learnable parameters in the attention mechanism are stored in W, which has a shape of (rnn_size, rnn_size) (rnn_size is the size of the hidden state). So first you need to use W to calculate the score of each hidden state h_s based on the value of that hidden state and h_t, but I am not seeing h_t anywhere in your code. Instead, you applied a dense layer on all h_s. And that means W·h_t (Edit: h_t should be h_s in this equation) becomes the score in the paper. This seems wrong. In the next step you element-wise multiply the attention weights with the hidden states as in equation (2), but then you somehow skipped equation (3).
I noticed the tutorial is about a Seq2Seq (encoder-decoder) model, while your code is a plain RNN. Maybe that is why your code is different. Do you have any source on how attention is applied to a non-Seq2Seq network?
Here is your code: