Looking at the following diagram and the code you wrote: ![image](https://user-images.githubusercontent.com/5382892/61577782-7fdcbb00-ab01-11e9-9d3d-1c0d93eda6ee.png)
I believe there are two issues if you were trying to model a vanilla RNN, whose formulation is as follows: ![image](https://user-images.githubusercontent.com/5382892/61577885-dbf40f00-ab02-11e9-8e4e-aa36f220341f.png)
which is based on the Elman (vanilla RNN) network: ![image](https://user-images.githubusercontent.com/5382892/61577889-f4fcc000-ab02-11e9-94be-86e0eb20a3d7.png)
First, it does not apply a nonlinear transformation when computing the new hidden state,
and second, it does not use $h_t$ when computing the output!
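
To make the two points concrete, here is a minimal sketch of a single Elman step as I understand it. It is plain Python, not your code: the names `matvec`, `elman_step`, `W_h`, `U_h`, `b_h`, `W_y`, and `b_y` are my own, `tanh` is an assumed choice of nonlinearity, and I have left the output activation out, so `y_t` is just the raw output.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def elman_step(x_t, h_prev, W_h, U_h, b_h, W_y, b_y):
    """One step of an Elman (vanilla) RNN; all names here are illustrative."""
    # Point 1: the new hidden state goes through a nonlinearity (tanh assumed).
    pre_activation = [
        wx + uh + b
        for wx, uh, b in zip(matvec(W_h, x_t), matvec(U_h, h_prev), b_h)
    ]
    h_t = [math.tanh(p) for p in pre_activation]

    # Point 2: the output is computed from the NEW hidden state h_t,
    # not directly from the concatenated input and previous hidden state.
    y_t = [wh + b for wh, b in zip(matvec(W_y, h_t), b_y)]
    return h_t, y_t
```

For example, with a 2-dimensional input, a 1-dimensional hidden state and a 1-dimensional output, `elman_step([1.0, 0.0], [0.0], [[0.5, 0.0]], [[0.0]], [0.0], [[2.0]], [1.0])` returns a hidden state equal to `tanh(0.5)` and an output equal to `2 * tanh(0.5) + 1`.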
So my question is: were you implementing the Elman network, or is this a completely new variation of the RNN? If I'm wrong, what am I missing here? I'd appreciate it if you could kindly clarify this.