ylqfp opened this issue 8 years ago
The LSTM has the same idea of 'forgetting' old state and then adding new input; that is what that line in the paper means. See the sketch below.
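For concreteness, here is a minimal NumPy sketch of the erase-then-add write from the NTM paper (Graves et al., 2014). This is not code from this repo; the names `ntm_write`, `memory`, `w`, `erase_vec`, and `add_vec` are illustrative.

```python
import numpy as np

def ntm_write(memory, w, erase_vec, add_vec):
    """One NTM write step: an erase followed by an add.

    memory    : (N, M) matrix, N memory slots of width M
    w         : (N,) write weighting over slots, entries in [0, 1]
    erase_vec : (M,) erase vector e_t, entries in [0, 1]
    add_vec   : (M,) add vector a_t
    """
    # Erase: each slot keeps a fraction of its old content, analogous
    # to an LSTM forget gate scaling the previous cell state.
    memory = memory * (1.0 - np.outer(w, erase_vec))
    # Add: new content is blended in, analogous to an LSTM input gate
    # writing the candidate update into the cell state.
    memory = memory + np.outer(w, add_vec)
    return memory

# Tiny usage example with made-up values.
rng = np.random.default_rng(0)
N, M = 8, 4                       # 8 slots, width 4
memory = rng.normal(size=(N, M))
w = np.zeros(N); w[2] = 1.0       # write sharply to slot 2
e = np.ones(M)                    # fully erase the addressed slot
a = rng.normal(size=M)            # new content
memory = ntm_write(memory, w, e, a)
assert np.allclose(memory[2], a)  # slot 2 now holds exactly the add vector
```

Note that, as in the paper, both steps are differentiable, and the split lets the head do anything from a pure incremental update (erase near zero) to a full overwrite (erase near one, as in the example), just as the LSTM's forget and input gates together decide how much old state to discard and how much new input to store.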
Just to be clear: I was in no way involved in writing the paper. This repo just happens to be one of the currently popular implementations of the NTM.
The paper says: "Taking inspiration from the input and forget gates in LSTM, we decompose each write into two parts: an erase followed by an add." Why? Thanks!