Closed: boknilev closed this issue 7 years ago
Hi there. Those layers are only accessed to share the charCNN parameters when using a bidirectional LSTM, so they are only needed on the encoder side.
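(To make the sharing mechanism concrete, here is a minimal sketch using Torch's `nn.Module:share`; the builder function, module names, and layer sizes are placeholders for illustration, not the actual identifiers or architecture in train.lua.)

```lua
require 'nn'

-- Minimal sketch of the sharing idea, not the code in train.lua;
-- the names and sizes below are placeholders.
local function make_charcnn(char_vocab, char_emb, num_kernels, kernel_width)
  local cnn = nn.Sequential()
  cnn:add(nn.LookupTable(char_vocab, char_emb))                        -- char ids -> embeddings
  cnn:add(nn.TemporalConvolution(char_emb, num_kernels, kernel_width)) -- convolution over chars
  cnn:add(nn.Tanh())
  cnn:add(nn.Max(1))                                                   -- max over time (non-batched input)
  return cnn
end

local charcnn_fwd = make_charcnn(100, 25, 200, 6)  -- used by the forward encoder
local charcnn_bwd = make_charcnn(100, 25, 200, 6)  -- used by the backward encoder

-- Tie the backward copy to the forward one so the bidirectional encoder uses a
-- single set of charCNN weights; sharing gradWeight/gradBias makes gradients
-- accumulate into the same storage as well.
charcnn_bwd:share(charcnn_fwd, 'weight', 'bias', 'gradWeight', 'gradBias')
```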
Thanks for the answer. I was asking because I saw relatively low performance when using the charCNN on the decoder side compared to the encoder side, but perhaps there's another reason for this.
There are a couple of places where the charCNN layers are handled only for the encoder side, not the decoder side: https://github.com/harvardnlp/seq2seq-attn/blob/master/train.lua#L1079 and https://github.com/harvardnlp/seq2seq-attn/blob/master/train.lua#L935
It looks like there should be similar code for the charCNN on the decoder side; otherwise the decoder charCNN parameters will not be updated during training. Is that right?
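(For readers following along, here is a hedged sketch of the kind of bookkeeping under discussion: only networks whose parameters are collected for the update step get trained, and anything left out of that collection is effectively frozen. The variable and option names below mirror seq2seq-attn's flags, but the code is illustrative with placeholder networks, not the actual train.lua logic.)

```lua
require 'nn'

-- Placeholder networks stand in for the real encoder/decoder/generator and
-- charCNNs built by train.lua; the option values are hypothetical.
local encoder     = nn.Linear(10, 10)
local decoder     = nn.Linear(10, 10)
local generator   = nn.Linear(10, 5)
local charcnn_enc = nn.TemporalConvolution(25, 200, 6)
local charcnn_dec = nn.TemporalConvolution(25, 200, 6)
local opt = { use_chars_enc = 1, use_chars_dec = 1, learning_rate = 1 }

-- Only networks placed in this table have their parameters flattened and
-- updated; a module left out of it is never changed by training.
local layers = { encoder, decoder, generator }
if opt.use_chars_enc == 1 then
  table.insert(layers, charcnn_enc)   -- encoder charCNN is included
end
-- The question raised above: a decoder-side analogue like this may be needed
-- so the decoder charCNN parameters are updated as well.
if opt.use_chars_dec == 1 then
  table.insert(layers, charcnn_dec)
end

local params, grad_params = {}, {}
for i = 1, #layers do
  params[i], grad_params[i] = layers[i]:getParameters()  -- flatten per-network params
end

-- A plain SGD-style step over every collected network.
for i = 1, #layers do
  params[i]:add(-opt.learning_rate, grad_params[i])
end
```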