vrenkens / nabu

Code for end-to-end ASR with neural networks, built with TensorFlow
MIT License

question about speller #21

Closed riyijiye closed 6 years ago

riyijiye commented 6 years ago
[attached screenshot, 2018-03-15: excerpt from the LAS paper showing the decoder equations]

Hi Vincent,

First of all, this is a great framework. I have a question about the speller code (speller.py). I assume it implements the speller of the LAS architecture. As I understand it, the computation of the current decoder state s_t in the speller takes the attention context c_t, the previous decoder state s_{t-1}, and also the previous speller output y_{t-1} (following the notation of the original LAS paper). However, I did not see y_{t-1} contribute to the computation of s_t. Am I missing something here?
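For reference, the decoder recurrence in the original LAS paper (Chan et al., 2015) is

```latex
s_t = \mathrm{RNN}\left(s_{t-1},\, y_{t-1},\, c_{t-1}\right), \qquad
c_t = \mathrm{AttentionContext}\left(s_t,\, \mathbf{h}\right)
```

where h is the listener output, so y_{t-1} feeds directly into the state update.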

By the way, I attached a snapshot of the part of the LAS paper related to this question (see above).

Looking forward to your feedback, many thanks! Harry

vrenkens commented 6 years ago

Hi Harry,

How the previous time step's output is used is not very straightforward. You need to look into rnn_decoder.py: the inputs are put into the helper (line 59), which is used in the decoder (line 67), which is in turn used in the dynamic_decode function (line 74).

The helper uses scheduled sampling during training: in most cases it will feed the correct target as input to the RNN cell, but with some probability it will feed the cell's own previous output instead, which makes the model more robust against its own errors during decoding.
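A minimal sketch of that per-step sampling decision (illustrative names, not nabu's actual code):

```python
import numpy as np

def next_decoder_input(ground_truth_prev, model_output_prev,
                       sampling_prob, rng=np.random):
    """Choose the input for the next decoder step under scheduled sampling.

    With probability `sampling_prob`, the model's own previous output is
    fed back in; otherwise the ground-truth previous target is used
    (plain teacher forcing).
    """
    if rng.random_sample() < sampling_prob:
        return model_output_prev  # the model's own prediction
    return ground_truth_prev      # the correct target
```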

The decoder contains the functionality that has to be executed at each time step, and finally the dynamic_decode function creates the while loop that invokes the decoder at every time step.
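For context, here is a minimal sketch of how these three pieces typically fit together in tf.contrib.seq2seq under TensorFlow 1.x; the tensor and variable names below are placeholders, not nabu's actual identifiers:

```python
import tensorflow as tf
from tensorflow.contrib import seq2seq

batch_size, vocab_size, emb_dim = 16, 30, 64

# Illustrative inputs (not nabu's actual tensors)
embedding_matrix = tf.get_variable('embedding', [vocab_size, emb_dim])
target_ids = tf.placeholder(tf.int32, [batch_size, None])
target_lengths = tf.placeholder(tf.int32, [batch_size])
target_embeddings = tf.nn.embedding_lookup(embedding_matrix, target_ids)
rnn_cell = tf.nn.rnn_cell.LSTMCell(128)
output_projection = tf.layers.Dense(vocab_size)

# Helper: feeds the ground-truth embedding at each step, but with
# probability `sampling_probability` embeds and feeds back the cell's
# own sampled output instead (scheduled sampling).
helper = seq2seq.ScheduledEmbeddingTrainingHelper(
    inputs=target_embeddings,
    sequence_length=target_lengths,
    embedding=embedding_matrix,
    sampling_probability=0.1)

# Decoder: bundles what happens at a single time step.
decoder = seq2seq.BasicDecoder(
    cell=rnn_cell,
    helper=helper,
    initial_state=rnn_cell.zero_state(batch_size, tf.float32),
    output_layer=output_projection)

# dynamic_decode: builds the while loop that runs the decoder over time.
outputs, final_state, final_lengths = seq2seq.dynamic_decode(decoder)
```

In an attention-based model like LAS, the cell would typically also be wrapped in seq2seq.AttentionWrapper, so the attention context is concatenated with the input at each step; that is where y_{t-1} and the context enter the state update together.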

riyijiye commented 6 years ago

Really appreciate the super quick reply. I will take a close look at the parts you mentioned. Thank you so much!