haotianteng / Chiron

A basecaller for Oxford Nanopore Technologies' sequencers

attention.py #75

Closed nbathreya closed 5 years ago

nbathreya commented 5 years ago

Hi. I am very new to TensorFlow, ML, and Chiron. I had a few basic questions about the implementation of attention in the Chiron model.

Any feedback would be greatly appreciated. I look forward to hearing back from you as soon as possible.

Nagendra

haotianteng commented 5 years ago

We compared the attention mechanism with CTC; however, attention did not give results as good as CTC, so we switched to CTC. The attention code is still preserved.

The implementation is based on the following papers: https://arxiv.org/abs/1506.07503 https://arxiv.org/abs/1409.0473
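For reference, both cited papers use additive (Bahdanau-style) attention scoring. Below is only a minimal NumPy sketch of that scheme under assumed shapes, not the actual chiron/attention.py implementation; the names `additive_attention`, `W_s`, `W_h`, and `v` are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention(decoder_state, encoder_outputs, W_s, W_h, v):
    """One step of additive (Bahdanau-style) attention.

    decoder_state:   [dec_dim]          previous decoder state s_{t-1}
    encoder_outputs: [src_len, enc_dim] encoder states h_1..h_T
    W_s, W_h, v:     learned projections (random placeholders here)

    Returns the attention weights alpha_t and the context vector c_t.
    """
    # e_{t,j} = v^T tanh(W_s s_{t-1} + W_h h_j)
    scores = np.tanh(decoder_state @ W_s + encoder_outputs @ W_h) @ v
    weights = softmax(scores)            # alpha_{t,j}, sums to 1 over j
    context = weights @ encoder_outputs  # c_t = sum_j alpha_{t,j} h_j
    return weights, context

# Toy shapes: 6 encoder steps, enc_dim=8, dec_dim=4, attention dim=5.
rng = np.random.default_rng(0)
enc = rng.normal(size=(6, 8))
s = rng.normal(size=(4,))
W_s, W_h, v = rng.normal(size=(4, 5)), rng.normal(size=(8, 5)), rng.normal(size=(5,))
alpha, c = additive_attention(s, enc, W_s, W_h, v)
print(alpha.shape, c.shape)  # (6,) (8,)
```

In an attention decoder, the context vector `c_t` is fed into the next decoder step and the output projection, which is what lets the model align each predicted label with a region of the raw signal.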

nbathreya commented 5 years ago

So, when you implemented the attention decoder, did you just use attention_loss as the prediction error?

Another question: the attention_loss function docstring says "label_len:[batch_size] label length, the symbol is included." Do we have to include an "end" symbol at the end of each label in the batch when we pass it to this function?
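If the end symbol does count toward label_len, batch preparation might look roughly like the sketch below. This is only an illustration under assumptions: `END_SYMBOL`, `PAD`, and `pad_labels` are hypothetical names, and Chiron's actual label encoding may differ.

```python
import numpy as np

# Hypothetical encoding (assumed, not taken from chiron/attention.py):
# bases A,C,G,T = 0..3, padding = 4, end symbol = 5.
PAD = 4
END_SYMBOL = 5

def pad_labels(label_seqs, max_len):
    """Append the end symbol to every label sequence, then pad to max_len.

    Returns
        labels:    [batch_size, max_len] int array
        label_len: [batch_size] lengths counting the appended end symbol,
                   matching a docstring note that the symbol is included.
    """
    batch_size = len(label_seqs)
    labels = np.full((batch_size, max_len), PAD, dtype=np.int32)
    label_len = np.zeros(batch_size, dtype=np.int32)
    for i, seq in enumerate(label_seqs):
        seq = list(seq) + [END_SYMBOL]   # end symbol appended explicitly
        labels[i, :len(seq)] = seq
        label_len[i] = len(seq)          # includes the end symbol
    return labels, label_len

labels, label_len = pad_labels([[0, 2, 3], [1, 1, 0, 2]], max_len=8)
print(labels)
print(label_len)  # [4 5] -> original length + 1 for the end symbol
```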

Is it possible to provide training code with the attention mechanism enabled? I would like to make sure I use the Chiron model with attention correctly. (I want to see how much difference there is between the attention and non-attention mechanisms and understand mathematically what causes it.) This would help me greatly!