snipsco / ntm-lasagne

Neural Turing Machines library in Theano with Lasagne
https://medium.com/snips-ai/ntm-lasagne-a-library-for-neural-turing-machines-in-lasagne-2cdce6837315#.63t84s5r5
MIT License

Backpropagation outputs NaN in FAST_COMPILE #9

Closed tristandeleu closed 9 years ago

tristandeleu commented 9 years ago

During the computation of the gradient with backpropagation, the output sometimes contains NaN values when compiling in FAST_COMPILE mode. When we compute the gradient of the cost wrt W_wr_add (or b_wr_add) with backpropagation, it outputs NaNs at the last step of the gradient computation of the cost wrt the hidden state for the first step. It seems to come from the initialization of h_0 as a zero vector (there is no issue with a uniform Glorot initialization). It also seems to be specific to the rectify activation used for the add vector (no issue with other activations like identity or tanh). Finally, it works as intended in FAST_RUN mode, even with zero initialization and the rectify activation.
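
For reference, a minimal toy sketch (not the NTM itself) of how one can compile the gradient in a given mode and check it for NaNs; the zero-initialized parameter and rectify activation only mimic the setup described above:

```python
import numpy as np
import theano
import theano.tensor as T

# Toy stand-in (not the NTM itself): a zero-initialized parameter and a
# rectify activation, mimicking the setup described above. Note that the
# gradient of the L2 norm, x / ||x||, is ill-defined at x = 0, which is
# one candidate source of NaNs with a zero initialization.
x = T.vector('x')
W = theano.shared(np.zeros((3, 3), dtype=theano.config.floatX), name='W')
h = T.maximum(T.dot(W, x), 0.)  # rectify
cost = h.norm(2)
grad = T.grad(cost, W)

# Compile in FAST_COMPILE and check the gradient for NaNs.
f = theano.function([x], grad, mode='FAST_COMPILE')
print(np.isnan(f(np.ones(3, dtype=theano.config.floatX))).any())
```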

tristandeleu commented 9 years ago

Somehow this seems to be due to the normalization with theano.tensor.norm in the cosine_similarity, and more precisely to the use of T.abs_ in the computation of T.norm. I switched to a manual computation of the norm (with T.sqrt(T.sum(x * x))) and it worked fine, even in FAST_COMPILE mode. This may be specific to the combination of T.norm and scan, as it did not happen when unrolling the network through time instead of using scan.
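
For concreteness, a minimal sketch of the workaround, assuming a cosine_similarity over two vectors; the eps term is an extra assumption added here for numerical safety and may not be in the actual fix:

```python
import theano.tensor as T

def cosine_similarity(x, y, eps=1e-6):
    # Manual L2 norms via T.sqrt(T.sum(x * x)) instead of T.norm, which
    # avoided the NaN gradients under FAST_COMPILE as described above.
    # The eps term is an assumption, added for numerical safety.
    norm_x = T.sqrt(T.sum(x * x))
    norm_y = T.sqrt(T.sum(y * y))
    return T.dot(x, y) / (norm_x * norm_y + eps)
```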