hardmaru / supercell


Why adding bias to the forget gate? #7

Closed AzizCode92 closed 6 years ago

AzizCode92 commented 6 years ago

https://github.com/hardmaru/supercell/blob/063b01e75e6e8af5aeb0aac5cc583948f5887dd1/supercell.py#L216

The code implementation doesn't correspond exactly to the equations in the layer normalization paper. I also have doubts about normalizing all the gates: because of the shift that is added, the forget gate, for example, can never be equal to zero. Wouldn't it be more logical to keep the gates as they are and only normalize the cell state?

Thank you
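
To make the question concrete, here is a minimal sketch (plain NumPy, not the actual supercell/TensorFlow code) of a layer-normalized forget gate with a constant bias added after normalization. All names, shapes, and values are illustrative assumptions; it only shows where the layer-norm shift and the forget bias enter the computation that the question refers to.

```python
import numpy as np

def layer_norm(x, gain, shift, eps=1e-5):
    # Normalize across the feature dimension, then rescale and shift.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return gain * (x - mean) / (std + eps) + shift

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical shapes and parameters, for illustration only.
hidden_size = 4
pre_f = np.random.randn(1, hidden_size)   # raw forget-gate pre-activation (W_f x + U_f h)
gain = np.ones(hidden_size)               # layer-norm gain (learned in practice)
shift = np.zeros(hidden_size)             # layer-norm shift (learned in practice)
forget_bias = 1.0                         # constant bias added to the forget gate

# Forget gate: layer norm is applied to the pre-activation, then the bias is added
# before the sigmoid, so the bias (and the learned shift) move the gate's operating point.
f = sigmoid(layer_norm(pre_f, gain, shift) + forget_bias)
print(f)
```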

AzizCode92 commented 6 years ago

Found the reason why.