harvardnlp / seq2seq-attn

Sequence-to-sequence model with LSTM encoder/decoders and attention
http://nlp.seas.harvard.edu/code
MIT License

GRU implementation #25

Open jayparks opened 8 years ago

jayparks commented 8 years ago

Hi team, thanks for the great work.

I'm currently trying to construct a Bahdanau-style seq2seq model, featuring a bidirectional encoder and a decoder that use GRU cells.

This repository, however, doesn't seem to have a GRU implementation, which I'm now trying to add. (If you already have one, it would save a lot of time!)

Is the make_lstm function in models.lua the only thing I need to modify?

One more question, about the bidirectional encoder:

The description of the bidirectional encoder option says: "hidden states of the corresponding forward/backward LSTMs are added to obtain the hidden representation for that time step."

Does this mean it literally adds the two hidden states, rather than the usual concatenation scheme?

And if so, would the rest of the code remain compatible if I simply changed the hidden representation to the concatenation of the two hidden states?

Thanks in advance for your reply.

yoonkim commented 8 years ago

For the GRU implementation, in addition to modifying make_lstm, it will require some tinkering in the training code, as a GRU doesn't have a cell state alongside the hidden state (so rnn_state_enc is a table of size 2*opt.num_layers with LSTMs, but it should be just opt.num_layers with GRUs). It would be great to have GRU support; let us know if/when you've implemented it! Also, GRUs are apparently a little harder to optimize with vanilla SGD, so you might want to use adagrad/adam/adadelta (we currently provide adagrad as an option).
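For reference, a single GRU layer in the nngraph style of make_lstm might look roughly like the following. This is an untested sketch; the make_gru name and exact wiring are illustrative, not code from this repo:

```lua
require 'nn'
require 'nngraph'

-- Sketch of one GRU layer: takes {x, prev_h}, returns next_h.
-- Note there is no cell state, unlike the LSTM layers in models.lua.
local function make_gru(input_size, rnn_size)
  local x = nn.Identity()()
  local prev_h = nn.Identity()()

  -- update gate z and reset gate r, computed with one batched Linear each
  local i2h = nn.Linear(input_size, 2 * rnn_size)(x)
  local h2h = nn.Linear(rnn_size, 2 * rnn_size)(prev_h)
  local gates = nn.SplitTable(2)(nn.Reshape(2, rnn_size)(
      nn.Sigmoid()(nn.CAddTable()({i2h, h2h}))))
  local z = nn.SelectTable(1)(gates)
  local r = nn.SelectTable(2)(gates)

  -- candidate hidden state: tanh(W*x + U*(r .* prev_h))
  local h_cand = nn.Tanh()(nn.CAddTable()({
      nn.Linear(input_size, rnn_size)(x),
      nn.Linear(rnn_size, rnn_size)(nn.CMulTable()({r, prev_h}))
  }))

  -- next_h = (1 - z) .* prev_h + z .* h_cand
  local one_minus_z = nn.AddConstant(1)(nn.MulConstant(-1)(z))
  local next_h = nn.CAddTable()({
      nn.CMulTable()({one_minus_z, prev_h}),
      nn.CMulTable()({z, h_cand})
  })

  return nn.gModule({x, prev_h}, {next_h})
end
```

The training code would then track a single hidden state per layer, so rnn_state_enc would have opt.num_layers entries instead of 2*opt.num_layers.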

Yeah, we simply add the hidden states, since concatenation would require some fiddling with how the decoder is initialized from the last step of the encoder (the encoder states would have twice the number of dimensions). I guess you could use a linear layer to get back to the original dimensionality, but that would be slightly annoying.
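Concretely, the difference looks something like this (a toy sketch; fwd_h, bwd_h, and the bridge layer are illustrative names, not code from this repo):

```lua
require 'nn'

local rnn_size, batch = 500, 2
local fwd_h = torch.randn(batch, rnn_size)  -- forward encoder hidden state
local bwd_h = torch.randn(batch, rnn_size)  -- backward encoder hidden state

-- What the repo does: element-wise sum. The width stays rnn_size, so the
-- decoder can be initialized from the encoder unchanged.
local summed = nn.CAddTable():forward({fwd_h, bwd_h})    -- batch x rnn_size

-- Concatenation doubles the width, so a linear "bridge" would be needed
-- to project back down before initializing the decoder.
local concat = nn.JoinTable(2):forward({fwd_h, bwd_h})   -- batch x 2*rnn_size
local hidden = nn.Linear(2 * rnn_size, rnn_size):forward(concat)  -- batch x rnn_size
```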

Hope this helps!

jayparks commented 8 years ago

Hi! Thanks to your help, I'm currently working on the GRU implementation. I'll let you know once it's done :)

ylhsieh commented 8 years ago

I'm also very interested in a GRU implementation! By the way, could @yoonkim elaborate on why GRUs are harder to optimize with SGD? I don't quite understand why that would be. Or perhaps you could point me toward some references? Thanks.

yoonkim commented 8 years ago

I don't really understand why either, but empirically I've found it to be the case (vanilla SGD doesn't work well with GRUs).

emjotde commented 8 years ago

Hi, has there been any progress on this? It would be nice to have for proper comparisons between models and toolkits.

bmilde commented 7 years ago

+1, I'd also be interested in this for comparison purposes!