Closed: helson73 closed this issue 7 years ago
I think you have it the other way around: every batch has the same source length, but potentially different target lengths. We set the weight of the blank symbol in the criterion to zero, so we do not receive any gradients on the target side if target_output[t] = blank symbol (which has index 1).
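For reference, here is a minimal PyTorch sketch of that masking idea (the repo itself is Torch/Lua and uses ClassNLLCriterion; the vocabulary size and blank index below are made-up for illustration). A per-class weight vector with a zero at the blank index means padded target positions contribute nothing to the loss, so no gradient flows back from them:

```python
import torch
import torch.nn as nn

vocab_size = 10   # hypothetical target vocabulary size
blank_idx = 1     # blank/padding symbol (illustrative index)

# Per-class weights: 1 everywhere except the blank symbol, which gets 0.
weights = torch.ones(vocab_size)
weights[blank_idx] = 0.0
criterion = nn.NLLLoss(weight=weights)  # analogous to Torch's ClassNLLCriterion with weights

logits = torch.randn(4, vocab_size, requires_grad=True)   # fake decoder outputs, 4 timesteps
log_probs = torch.log_softmax(logits, dim=-1)
targets = torch.tensor([3, blank_idx, 5, blank_idx])       # blank positions are padding

loss = criterion(log_probs, targets)
loss.backward()

# Gradients at the blank target positions are exactly zero:
print(logits.grad[1].abs().sum(), logits.grad[3].abs().sum())
```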
For the bi-LSTM, we add the forward and backward states because it's simple. Alternatively, you could concatenate them, but that requires some fiddling around with the RNN sizes.
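To illustrate the two options, here is a small PyTorch sketch (again, the repo is Torch/Lua; the hidden and embedding sizes are arbitrary). Summing keeps the context at the original hidden size, whereas concatenating doubles it, which is what forces the decoder and attention dimensions to be adjusted:

```python
import torch
import torch.nn as nn

hidden, emb = 256, 128   # hypothetical encoder hidden and embedding sizes

enc = nn.LSTM(input_size=emb, hidden_size=hidden, bidirectional=True)
src = torch.randn(20, 32, emb)   # (src_len, batch, emb), made-up sizes
out, _ = enc(src)                # (src_len, batch, 2 * hidden)

fwd, bwd = out[..., :hidden], out[..., hidden:]

# Option 1 (what the repo does): add the two directions; the context stays
# `hidden`-dimensional, so decoder and attention sizes need no changes.
context_sum = fwd + bwd                      # (src_len, batch, hidden)

# Option 2: concatenate; the context becomes 2 * hidden, so the decoder /
# attention layers must be resized accordingly.
context_cat = torch.cat([fwd, bwd], dim=-1)  # (src_len, batch, 2 * hidden)
```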
@yoonkim Thanks! I found it.
The target side is sorted, so every target sentence in a batch has the same length. But on the source side, sentence lengths vary, and there seems to be no scheme to block the "blank" source words. Even if the blank embedding is set to zero, the outputs at blank positions will still take on some value because of the recurrence in the LSTM. Why is this issue ignored? P.S. When a bi-LSTM is used, the backward LSTM's context is just added to the forward one; is there any special reason for this?
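To make the concern concrete, here is a tiny PyTorch sketch (the repo itself is Torch/Lua; all sizes are arbitrary) showing that an LSTM still emits nonzero states at positions whose input embedding is all zeros, because of its biases and the hidden state carried over from earlier timesteps:

```python
import torch
import torch.nn as nn

emb_dim, hidden = 8, 16   # toy sizes for illustration

lstm = nn.LSTM(input_size=emb_dim, hidden_size=hidden)

# A "sentence" whose last two positions are blanks with all-zero embeddings.
real = torch.randn(3, 1, emb_dim)
blanks = torch.zeros(2, 1, emb_dim)
src = torch.cat([real, blanks], dim=0)   # (5, 1, emb_dim)

out, _ = lstm(src)
print(out[3:].abs().sum())  # > 0: blank positions still produce nonzero states
                            # via the LSTM biases and the carried-over hidden state
```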