jzilly / RecurrentHighwayNetworks

Recurrent Highway Networks - Implementations for Tensorflow, Torch7, Theano and Brainstorm
MIT License
404 stars 70 forks source link

Fix resetting of initial hidden states #6

Closed shimisalant closed 7 years ago

shimisalant commented 7 years ago

Hi Rupesh / Julian, Please take a look at this small fix (for Theano code). This is for correctly resetting RHN initial hidden states (relevant when RHNs are stacked). Thanks.

jzilly commented 7 years ago

Hi @shimisalant Thanks for fixing this bug. Have you tried whether the code now runs as it is supposed? There are not too many changes in any case. What I am wondering is whether the line: self._sticky_hidden_states.append(y_0) will not unnecessarily accumulate memory over time into the list.

shimisalant commented 7 years ago

Hi Julian,

Yes, I have tested this fix - I checked that it runs the way it is supposed to (and not whether an architecture with stacked RHN layers improves results on some task -- my main goal implementation goal was to reproduce the single-RHN-layer results on PTB).

The line you reference runs once for each stacked RHN layer (there are config.num_layers of them) -- we first define and compile the entire computational graph a single time, and then repeatedly forward/backward-pass through it. The referenced line is part of graph definition (so memory consumed by its execution does not change during training/evaluation) -- hopefully this answer your question (?)

jzilly commented 7 years ago

Hi @shimisalant,

Thanks for the explanation. I merged your pull request with the master branch. Everything should be up-to-date now. Thank you again for fixing this bug! Best, Julian

shimisalant commented 7 years ago

Great, thanks Julian!