harvardnlp / seq2seq-attn

Sequence-to-sequence model with LSTM encoder/decoders and attention
http://nlp.seas.harvard.edu/code
MIT License

Character CNN model on decoder side #86

Closed · boknilev closed this issue 7 years ago

boknilev commented 7 years ago

There are a couple of places where the charCNN layers are handled only for the encoder side, not the decoder side: https://github.com/harvardnlp/seq2seq-attn/blob/master/train.lua#L1079 https://github.com/harvardnlp/seq2seq-attn/blob/master/train.lua#L935

It looks like there should be similar code for the charCNN on the decoder side; otherwise the decoder charCNN parameters will not be updated during training. Is that right?
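To make the concern concrete, here is a toy Torch sketch (assumed for illustration only, not code from train.lua; module structure and sizes are made up): a submodule's parameters are only updated if they end up in the flattened parameter and gradient tensors handed to the optimizer, which is what the question hinges on.

```lua
require 'nn'

-- Toy "decoder" with a nested char-CNN submodule (all sizes made up).
local charcnn = nn.Sequential()
  :add(nn.TemporalConvolution(15, 100, 6)) -- char embedding dim 15, 100 filters, width 6
  :add(nn.Tanh())
local decoder = nn.Sequential()
  :add(charcnn)
  :add(nn.Linear(100, 50))

-- getParameters() flattens every parameter in the network, including the
-- nested char-CNN, into a single tensor; an optimizer step on `params`
-- therefore updates the char-CNN weights even without an explicit
-- reference to that submodule elsewhere in the training code.
local params, grad_params = decoder:getParameters()
print(params:nElement())  -- counts the char-CNN weights and biases too
```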

yoonkim commented 7 years ago

Hi there. Those layers are only accessed to share the charCNN parameters when using a bidirectional LSTM, so they are only needed on the encoder side.
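A minimal sketch of the kind of sharing being described (a generic Torch idiom assumed for illustration, not the actual train.lua code; the char-CNN structure and sizes are made up): with a bidirectional LSTM, the forward and backward encoders each contain a char-CNN, and `:share()` ties the second copy's weights and gradients to the first's, so a single set of parameters serves both directions.

```lua
require 'nn'

-- Char-CNN inside the forward encoder (illustrative sizes).
local fwd_charcnn = nn.Sequential()
  :add(nn.TemporalConvolution(15, 100, 6))
  :add(nn.Tanh())
  :add(nn.Max(2))  -- max-over-time pooling

-- Clone for the backward encoder, then tie its parameters to the forward
-- copy: weight, bias, and their gradient buffers all point to the same
-- storage, so both directions train one shared set of char-CNN weights.
local bwd_charcnn = fwd_charcnn:clone()
bwd_charcnn:share(fwd_charcnn, 'weight', 'bias', 'gradWeight', 'gradBias')
```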

boknilev commented 7 years ago

Thanks for the answer. I was asking because I saw relatively low performance when using the charCNN on the decoder side compared to the encoder side, but perhaps there's another reason for that.