Hi everyone,
I got confused while reading the code.
In rnn/model_search.py, line 28:
ch = masked_states.view(-1, self.nhid).mm(self._Ws[i]).view(i+1, -1, 2*self.nhid)
It seems that the hidden states of all predecessors share the same matrix: H_3 = W*H_0 + W*H_1 + W*H_2.
I would have expected a separate matrix per edge: H_3 = W_{0,3}*H_0 + W_{1,3}*H_1 + W_{2,3}*H_2.
Does anyone know why the author uses the same matrix? Is it only to save memory?
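To make the comparison concrete, here is a rough sketch of the two variants as I understand them. It uses plain Python lists instead of PyTorch tensors so it runs on its own. The sizes (`nhid`, `batch`, `i`), the `matmul` and `rand_matrix` helpers, and the names `W_shared` and `W_edge` are all made up for illustration. I am also assuming that `self._Ws[i]` has shape `nhid x 2*nhid`, which is what the `view` calls in the line above suggest.

```python
import random


def matmul(a, b):
    """Multiply a (n x k) by b (k x m), both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rand_matrix(rows, cols, rng):
    """Hypothetical helper: a rows x cols matrix of random floats."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
nhid, batch, i = 4, 3, 2  # made-up sizes; step i has i + 1 predecessors

# states[j] is the (batch x nhid) hidden state of predecessor j.
states = [rand_matrix(batch, nhid, rng) for _ in range(i + 1)]

# Shared variant: one (nhid x 2*nhid) matrix for the whole step,
# standing in for self._Ws[i].
W_shared = rand_matrix(nhid, 2 * nhid, rng)

# Plain-list equivalent of .view(-1, nhid).mm(W).view(i + 1, -1, 2 * nhid):
flat = [row for state in states for row in state]  # ((i+1)*batch) x nhid
flat_out = matmul(flat, W_shared)
ch_shared = [flat_out[j * batch:(j + 1) * batch] for j in range(i + 1)]

# The flattened product is the same as applying W_shared to each predecessor.
ch_loop = [matmul(state, W_shared) for state in states]
same = all(
    abs(a - b) < 1e-12
    for out_a, out_b in zip(ch_shared, ch_loop)
    for row_a, row_b in zip(out_a, out_b)
    for a, b in zip(row_a, row_b)
)
print("flattened product matches per-state product:", same)

# Per-edge variant I expected: W_edge[j] plays the role of W_{j, i+1}.
W_edge = [rand_matrix(nhid, 2 * nhid, rng) for _ in range(i + 1)]
ch_edge = [matmul(states[j], W_edge[j]) for j in range(i + 1)]

# Parameter counts for this single step.
shared_params = nhid * 2 * nhid
edge_params = (i + 1) * nhid * 2 * nhid
print("shared:", shared_params, "per-edge:", edge_params)
```

In this sketch the per-edge variant needs `i + 1` times as many parameters for step `i`, which is why I suspect memory might be the motivation, but I may be missing another reason.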