spro / practical-pytorch


Deep transition RNNs/Stacked RNNs #64

Open olofmogren opened 6 years ago

olofmogren commented 6 years ago

Hi, I'm looking at your tutorial for Translation with a Sequence to Sequence Network and Attention. Thank you for a well-written and easy-to-follow tutorial. I have a couple of questions about how n_layers, the depth of the RNN, is used.

You apply the RNN cell in a loop (for i in range(self.n_layers)), feeding the hidden state from one application to the next. According to https://arxiv.org/pdf/1312.6026.pdf, this is known as a Deep Transition (DT) RNN. The same paper also describes stacked RNNs, which is what I have previously referred to as a deep RNN. Would it be a good idea to clarify the difference, to avoid confusion?
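To make the distinction concrete, here is a minimal sketch of the two variants (sizes and variable names are just for illustration, not the tutorial's actual code):

```python
import torch
import torch.nn as nn

hidden_size, n_layers = 256, 2
step_input = torch.randn(1, 1, hidden_size)  # (seq_len=1, batch=1, features)

# Tutorial-style loop: the *same* GRU module is applied n_layers times to
# its own output within a single step, reusing one set of weights.
gru = nn.GRU(hidden_size, hidden_size)
hidden = torch.zeros(1, 1, hidden_size)
output = step_input
for i in range(n_layers):
    output, hidden = gru(output, hidden)

# Stacked RNN: nn.GRU with num_layers builds n_layers separate layers,
# each with its own weights; one call runs the input through all of them.
stacked = nn.GRU(hidden_size, hidden_size, num_layers=n_layers)
stacked_hidden = torch.zeros(n_layers, 1, hidden_size)
stacked_output, stacked_hidden = stacked(step_input, stacked_hidden)
```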

Also, I find it a bit confusing that the batch size is hard-coded to 1. Is there a good reason not to mention batching? It doesn't make the code much more difficult to read.
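For reference, nn.GRU already handles a batch dimension, so a batched call is mostly a matter of tensor shapes (a minimal sketch with made-up sizes, ignoring padding of variable-length sentences):

```python
import torch
import torch.nn as nn

seq_len, batch_size, hidden_size = 10, 32, 256
gru = nn.GRU(hidden_size, hidden_size)

# Inputs are (seq_len, batch, input_size); the hard-coded batch of 1 in the
# tutorial just means the middle dimension is always 1.
inputs = torch.randn(seq_len, batch_size, hidden_size)
h0 = torch.zeros(1, batch_size, hidden_size)
outputs, hn = gru(inputs, h0)  # outputs: (seq_len, batch, hidden_size)
```

The real complication would be on the data side (padding and masking variable-length sentences), not in the model code itself.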

Olof

IdiosyncraticDragon commented 6 years ago

@olofmogren I was confused by the loop over the GRU too when I checked the official tutorial at http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html. The version in practical-pytorch is now different from the official one: it uses nn.GRU, which accepts the n_layers parameter, so there is no loop over the GRU anymore. I suspect the loop was not an intentional use of the "deep transition" idea; it may just be a bug. That's my opinion. ^^
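For comparison, here is a rough sketch of how the updated practical-pytorch encoder is structured (names and details are illustrative, not copied from the notebook): n_layers goes straight into nn.GRU, so there is no Python loop at all.

```python
import torch
import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size, n_layers=1):
        super(EncoderRNN, self).__init__()
        self.n_layers = n_layers
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, hidden_size)
        # n_layers is handed to nn.GRU, which builds a stacked RNN internally.
        self.gru = nn.GRU(hidden_size, hidden_size, num_layers=n_layers)

    def forward(self, word_inputs, hidden):
        # word_inputs: (seq_len, batch) of token indices
        embedded = self.embedding(word_inputs)       # (seq_len, batch, hidden)
        output, hidden = self.gru(embedded, hidden)  # single call, no loop
        return output, hidden

    def init_hidden(self, batch_size=1):
        return torch.zeros(self.n_layers, batch_size, self.hidden_size)
```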