rushter / MLAlgorithms

Minimal and clean examples of machine learning algorithm implementations
MIT License

RNN output value calculation #48

Closed · keineahnung2345 closed this issue 5 years ago

keineahnung2345 commented 5 years ago

According to Andrew Ng's deep learning course (`a` for the hidden state, `y` for the output value), the output step on the slides is `y<t> = g(Wya a<t> + by)`: we get the output value by multiplying the hidden state by a weight matrix `Wya`, adding a bias `by`, and then passing the result through an activation function.
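
In NumPy terms, that output step looks roughly like this (shapes and the choice of `g` here are placeholders, not anything from this repo):

```python
import numpy as np

def rnn_output(a_t, W_ya, b_y, g=np.tanh):
    """Ng's output step: y<t> = g(Wya @ a<t> + by).

    a_t  -- hidden state at time t, shape (hidden_dim,)
    W_ya -- output weight matrix, shape (output_dim, hidden_dim)
    b_y  -- output bias, shape (output_dim,)
    g    -- output activation (tanh here; softmax is typical for classification)
    """
    return g(W_ya @ a_t + b_y)
```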

But from https://github.com/rushter/MLAlgorithms/blob/6e383f73e87ff1afb62ff4d711e4d8dd245ae923/mla/neuralnet/layers/recurrent/rnn.py#L55-L63, it seems the hidden state is directly returned.

@rushter Can you please point to the RNN reference you followed, or confirm this as a bug? If it's a bug, I'd like to create a PR to fix it. 😄

rushter commented 5 years ago

@keineahnung2345 Thank you for your report. You are right, it's a bug.

rushter commented 5 years ago

Actually, that is not a bug. I think we will get the same behavior by putting a Dense layer after the RNN layer. I need to refresh my understanding of RNNs.

My implementation is flexible and also supports multi-sequence output. https://github.com/rushter/MLAlgorithms/blob/6e383f73e87ff1afb62ff4d711e4d8dd245ae923/mla/neuralnet/layers/basic.py#L149
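
To illustrate why the two formulations match, here is a bare NumPy sketch (not the repo's actual layer classes; names and shapes are illustrative): an RNN step that returns only hidden states, followed by a Dense projection, reproduces the textbook `y<t>`:

```python
import numpy as np

def rnn_step(x_t, a_prev, W_aa, W_ax, b_a):
    # Recurrent part only: a<t> = tanh(Waa @ a<t-1> + Wax @ x<t> + ba).
    # Note that no output y is computed here.
    return np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)

def dense(h, W, b, g=np.tanh):
    # A Dense layer applied to the hidden state: y<t> = g(W @ a<t> + b),
    # which is exactly the output formula from the course.
    return g(W @ h + b)

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 4, 2
a_t = rnn_step(rng.normal(size=n_in), np.zeros(n_hidden),
               rng.normal(size=(n_hidden, n_hidden)),
               rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden))
y_t = dense(a_t, rng.normal(size=(n_out, n_hidden)), np.zeros(n_out))
```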

rushter commented 5 years ago

The LSTM implementation follows the same approach.

Here is an example of how to recover the same RNN output formula by adding a Dense layer: https://github.com/rushter/MLAlgorithms/blob/6e383f73e87ff1afb62ff4d711e4d8dd245ae923/examples/nnet_rnn_text_generation.py#L43

keineahnung2345 commented 5 years ago

I see, so the RNN layer should always be used with a Dense layer, right?

keineahnung2345 commented 5 years ago

After reviewing Andrew Ng's deep RNN lecture, I found that the lower RNN layers just return their states (`a`), and only the last (top) layer's states are turned into `y`. So the implementation should be correct, thanks!
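
For anyone landing here later, a NumPy sketch of that deep-RNN reading (layer sizes are made up): each lower layer passes its full state sequence up as input to the next layer, and only the top layer's states get projected to `y`:

```python
import numpy as np

def rnn_layer(X, W_aa, W_ax, b_a):
    """Run one RNN layer over a sequence of inputs X and return the full
    state sequence, which serves as the input sequence to the layer above."""
    a = np.zeros(W_aa.shape[0])
    states = []
    for x_t in X:
        a = np.tanh(W_aa @ a + W_ax @ x_t + b_a)
        states.append(a)
    return np.asarray(states)

rng = np.random.default_rng(0)
T, n_in, h1, h2, n_out = 5, 3, 4, 4, 2
X = rng.normal(size=(T, n_in))

# Two stacked layers: the lower layers only emit states, no y is computed.
A1 = rnn_layer(X, rng.normal(size=(h1, h1)), rng.normal(size=(h1, n_in)), np.zeros(h1))
A2 = rnn_layer(A1, rng.normal(size=(h2, h2)), rng.normal(size=(h2, h1)), np.zeros(h2))

# Only the top layer's states are projected to outputs (the Dense step).
W_ya, b_y = rng.normal(size=(n_out, h2)), np.zeros(n_out)
Y = A2 @ W_ya.T + b_y  # y<t> per time step, before the output activation
```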