spro / practical-pytorch


Deep transition RNNs/Stacked RNNs #64

Open olofmogren opened 6 years ago

olofmogren commented 6 years ago

Hi, I'm looking at your tutorial for Translation with a Sequence to Sequence Network and Attention. Thank you for a well-written and easy-to-follow tutorial. I have a couple of questions about how n_layers, the depth of the RNN, is used.

You apply the RNN cell in a loop (for i in range(self.n_layers)), feeding the hidden state from one application to the next. According to https://arxiv.org/pdf/1312.6026.pdf, this is known as a Deep Transition (DT) RNN. The same paper also describes stacked RNNs, which is what I have previously referred to as a deep RNN. Would it be a good idea to clarify the difference, to avoid confusion?
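To make the distinction concrete, here is a minimal sketch of the two variants (sizes and variable names are just for illustration, not the tutorial's actual code):

```python
import torch
import torch.nn as nn

hidden_size, n_layers = 256, 2
step_input = torch.randn(1, 1, hidden_size)  # (seq_len=1, batch=1, features)

# Tutorial-style loop: the *same* GRU module is applied n_layers times to
# its own output within a single step, reusing one set of weights.
gru = nn.GRU(hidden_size, hidden_size)
hidden = torch.zeros(1, 1, hidden_size)
output = step_input
for i in range(n_layers):
    output, hidden = gru(output, hidden)

# Stacked RNN: nn.GRU with num_layers builds n_layers separate layers,
# each with its own weights; one call runs the input through all of them.
stacked = nn.GRU(hidden_size, hidden_size, num_layers=n_layers)
stacked_hidden = torch.zeros(n_layers, 1, hidden_size)
stacked_output, stacked_hidden = stacked(step_input, stacked_hidden)
```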

Also, I find it a bit confusing that the batch size is hard-coded to 1. Is there a good reason not to mention batching? It doesn't make the code much more difficult to read.
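For reference, nn.GRU already handles a batch dimension, so a batched call is mostly a matter of tensor shapes (a minimal sketch with made-up sizes, ignoring padding of variable-length sentences):

```python
import torch
import torch.nn as nn

seq_len, batch_size, hidden_size = 10, 32, 256
gru = nn.GRU(hidden_size, hidden_size)

# Inputs are (seq_len, batch, input_size); the hard-coded batch of 1 in the
# tutorial just means the middle dimension is always 1.
inputs = torch.randn(seq_len, batch_size, hidden_size)
h0 = torch.zeros(1, batch_size, hidden_size)
outputs, hn = gru(inputs, h0)  # outputs: (seq_len, batch, hidden_size)
```

The real complication would be on the data side (padding and masking variable-length sentences), not in the model code itself.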

Olof

IdiosyncraticDragon commented 6 years ago

@olofmogren I was confused by the loop over the GRU too when I checked the official tutorial at http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html. The version in practical-pytorch is now different from the official one: it uses nn.GRU, which accepts the n_layers parameter, so there is no loop over the GRU anymore. I suspect the loop was not an intentional use of the "deep transition" idea; it may just be a bug. That's my opinion. ^^
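For comparison, here is a rough sketch of how the updated practical-pytorch encoder is structured (names and details are illustrative, not copied from the notebook): n_layers goes straight into nn.GRU, so there is no Python loop at all.

```python
import torch
import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size, n_layers=1):
        super(EncoderRNN, self).__init__()
        self.n_layers = n_layers
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, hidden_size)
        # n_layers is handed to nn.GRU, which builds a stacked RNN internally.
        self.gru = nn.GRU(hidden_size, hidden_size, num_layers=n_layers)

    def forward(self, word_inputs, hidden):
        # word_inputs: (seq_len, batch) of token indices
        embedded = self.embedding(word_inputs)       # (seq_len, batch, hidden)
        output, hidden = self.gru(embedded, hidden)  # single call, no loop
        return output, hidden

    def init_hidden(self, batch_size=1):
        return torch.zeros(self.n_layers, batch_size, self.hidden_size)
```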