sayondutta / text_summarizer

Applied attention on a sequence-to-sequence model for the task of text summarization, using TensorFlow's raw_rnn. I haven't used TensorFlow's built-in seq2seq functions here; the reason is to apply the attention mechanism manually.

Attention is implemented incorrectly #2

Closed 0b01 closed 7 years ago

0b01 commented 7 years ago

This line: https://github.com/sayondutta/text_summarizer/blob/master/Seq2Seq_model_for_TextSummarizer-600L.py#L239

prev_out_with_weights = tf.matmul(previous_output, w['score'])

should be a multiplication between previous_state and w['score']

Could you please change that? Thanks in advance.
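For context, a minimal sketch of the two variants under discussion (shapes and the layout of w['score'] are assumptions here, not taken from the repo):

```python
import tensorflow as tf  # TF 1.x style, matching the repository

# Assumed shapes (sketch only):
# previous_output: [batch, dec_hidden]   -- decoder cell output at step t-1
# previous_state:  LSTMStateTuple(c, h), h of shape [batch, dec_hidden]
# w['score']:      [dec_hidden, enc_hidden]
# encoder_outputs: [max_time, batch, enc_hidden]

# As written in the repository (score built from the previous output):
prev_out_with_weights = tf.matmul(previous_output, w['score'])           # [batch, enc_hidden]

# The change proposed in this issue (score built from the previous hidden state):
# prev_out_with_weights = tf.matmul(previous_state.h, w['score'])

# Either way, alignment over the encoder time steps would follow as:
scores = tf.reduce_sum(encoder_outputs * prev_out_with_weights, axis=2)  # [max_time, batch]
attention_weights = tf.nn.softmax(scores, axis=0)                        # normalized over time
```

Note that for a standard LSTMCell the cell output is the h component of the state, so the two variants would produce the same tensor.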

0b01 commented 7 years ago

Please ignore. It's correct.

Src: https://github.com/tensorflow/tensorflow/blob/7c10b24de3cb2408441dfd98e1a1a1e8f43f3a7d/tensorflow/contrib/seq2seq/python/ops/attention_wrapper.py#L348

0b01 commented 7 years ago

Actually it should be the output of the current cell (not previous_output)

https://github.com/tensorflow/tensorflow/blob/7c10b24de3cb2408441dfd98e1a1a1e8f43f3a7d/tensorflow/contrib/seq2seq/python/ops/attention_wrapper.py#L709

I see that the raw_rnn function doesn't support that.
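To illustrate the constraint: with tf.nn.raw_rnn, loop_fn is called before the cell runs at step t, so anything computed inside it (including an attention context) can only see the output and state from step t-1. A rough skeleton, where attention_context, output_to_embedding, go_step_embedded, encoder_final_state and decoder_lengths are hypothetical placeholders rather than names from this repo:

```python
import tensorflow as tf  # TF 1.x, matching the repository's raw_rnn usage

def loop_fn(time, cell_output, cell_state, loop_state):
    if cell_output is None:                      # time == 0: initialization call
        next_cell_state = encoder_final_state    # hypothetical: decoder starts from encoder state
        next_input = go_step_embedded            # hypothetical: <GO> embedding (+ zero context)
    else:
        next_cell_state = cell_state
        # Only cell_output from step t-1 is visible here; h_t has not been computed yet,
        # so the attention context for step t cannot be built from the current cell's output.
        context = attention_context(cell_output, encoder_outputs)            # hypothetical helper
        next_input = tf.concat([output_to_embedding(cell_output), context], axis=1)
    elements_finished = time >= decoder_lengths  # [batch] bool vector
    return (elements_finished, next_input, next_cell_state, cell_output, loop_state)

# outputs_ta, final_state, _ = tf.nn.raw_rnn(decoder_cell, loop_fn)
```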

0b01 commented 7 years ago

Using previous_output is possible: https://arxiv.org/pdf/1508.04025.pdf

Page 4

On the other hand, at any time t, Bahdanau et al. (2015) build from the previous hidden state h_{t−1} → a_t → c_t → h_t, which, in turn, goes through a deep-output and a maxout layer before making predictions. Lastly, Bahdanau et al. (2015) only experimented with one alignment function, the concat product; whereas we show later that the other alternatives are better.
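So the two computation orders being contrasted are roughly: Luong-style global attention goes h_t → a_t → c_t → h̃_t (the score uses the current hidden state), while Bahdanau goes h_{t−1} → a_t → c_t → h_t (the score uses the previous hidden state). Computing the attention context inside raw_rnn's loop_fn effectively follows the second ordering, which is why building the score from previous_output is still a defensible choice.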

sayondutta commented 7 years ago

Ricky, can you share your Skype ID, in case you can help me over a call?


0b01 commented 7 years ago

I have emailed you.

sayondutta commented 7 years ago

Thanks. Any other suggestions for improvement? The loss is not converging that well. Would changing the batch_size and/or the word-embeddings approach help?