spro / practical-pytorch

Go to https://github.com/pytorch/tutorials - this repo is deprecated and no longer maintained
MIT License

Question about the character-level RNN classifier: why not carry the hidden state across epochs? #139

labJunky commented 4 years ago

In the RNN classification example, which uses the characters of a name to predict the name's language, the train function re-zeros the hidden state (and the gradients) every epoch. Why is this done, instead of carrying over the final hidden state from the previous epoch?

ZhouXing19 commented 4 years ago

In this example, one "epoch" means a run through a single word. Starting a new epoch means training the network on a new word, so we have to reinitialize the hidden state before feeding in the first letter of that word: the hidden states of different words are independent, and carrying the state over would let one name's characters influence the classification of the next.
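To make the independence argument concrete, here is a minimal NumPy sketch (not the tutorial's actual PyTorch code; the weight shapes and the `init_hidden`/`step`/`encode` helper names are illustrative assumptions). Because the hidden state is re-zeroed for every name, a name's final encoding does not depend on which names were processed before it:

```python
import numpy as np

# Toy single-layer RNN: small hidden size, 26-letter one-hot inputs.
# (Illustrative sketch, not the repo's model.)
n_hidden, n_letters = 8, 26
rng = np.random.default_rng(0)
W_ih = rng.normal(scale=0.1, size=(n_hidden, n_letters))  # input-to-hidden
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))   # hidden-to-hidden

def init_hidden():
    # Fresh zero state for each new name (one "epoch" in the tutorial's terms)
    return np.zeros(n_hidden)

def step(hidden, letter_index):
    x = np.zeros(n_letters)
    x[letter_index] = 1.0  # one-hot encode the character
    return np.tanh(W_ih @ x + W_hh @ hidden)

def encode(name):
    hidden = init_hidden()  # re-zero per example, as the train function does
    for ch in name.lower():
        hidden = step(hidden, ord(ch) - ord('a'))
    return hidden

# Resetting the state makes each name's encoding order-independent:
h1 = encode("satoshi")
_ = encode("hans")      # process an unrelated name in between
h2 = encode("satoshi")
assert np.allclose(h1, h2)
```

If the final hidden state were instead carried over between names, `h1` and `h2` would differ, and the classifier's output for a name would depend on the arbitrary order of the training data.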