Closed: tristandeleu closed this issue 9 years ago
Somehow this seems to be due to the normalization with `theano.tensor.norm` in the `cosine_similarity`, and more precisely to the use of `T.abs_` in the computation of `T.norm`. I switched to a manual computation of the norm (with `T.sqrt(T.sum(x * x))`) and it worked fine, even in `FAST_COMPILE` mode. This may be specific to the combination of the normalization with `T.norm` and `scan`, as it did not happen when unrolling the network through time instead of using `scan`.
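For reference, here is a minimal sketch of that workaround, assuming a standard cosine similarity between two vectors; the function name, the `eps` stabilizer, and the test harness are illustrative assumptions, not the exact code from this model:

```python
import theano
import theano.tensor as T

def cosine_similarity(x, y, eps=1e-6):
    # Manual norm via T.sqrt(T.sum(x * x)) instead of the built-in
    # norm (which goes through T.abs_ and produced NaN gradients
    # under FAST_COMPILE, per the report above).
    norm_x = T.sqrt(T.sum(x * x))
    norm_y = T.sqrt(T.sum(y * y))
    # eps is an illustrative guard against division by zero.
    return T.dot(x, y) / (norm_x * norm_y + eps)

x = T.vector('x')
y = T.vector('y')
sim = cosine_similarity(x, y)
grad = T.grad(sim, wrt=x)
f = theano.function([x, y], [sim, grad], mode='FAST_COMPILE')
```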
During the computation of the gradient with backpropagation, it sometimes outputs NaN values when compiling in `FAST_COMPILE` mode. When we compute the gradient of the cost wrt `W_wr_add` (or `b_wr_add`) with backpropagation, it outputs NaNs at the last step of the gradient computation of the cost wrt the hidden state for the first step. It seems to come from the initialization of `h_0` as a zero vector (no issue with a uniform Glorot initialization). It also seems to be specific to the `rectify` activation used for the `add` vector (no issue with other activations like `identity` or `tanh`). Finally, it works as intended in `FAST_RUN` mode with zero initialization and `rectify` activation.
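A minimal sketch of a setup matching that description follows: a `scan`-based recurrence with a zero-initialized `h_0` and a `rectify` activation, whose gradient is compiled in `FAST_COMPILE` mode. The weight names `W` and `U`, the shapes, and the cost are hypothetical stand-ins for the model's actual parameters:

```python
import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
rectify = lambda z: T.maximum(z, 0)

X = T.matrix('X')  # (time steps, features); shape is illustrative
W = theano.shared(np.random.uniform(-0.1, 0.1, (5, 5)).astype(floatX))
U = theano.shared(np.random.uniform(-0.1, 0.1, (5, 5)).astype(floatX))
h_0 = T.zeros((5,))  # the zero initialization said to trigger the NaNs

def step(x_t, h_tm1):
    # rectify activation on the recurrent update, as in the report
    return rectify(T.dot(x_t, W) + T.dot(h_tm1, U))

h, _ = theano.scan(step, sequences=X, outputs_info=h_0)
cost = h[-1].sum()  # hypothetical scalar cost for the gradient
grads = T.grad(cost, wrt=[W, U])
f = theano.function([X], grads, mode='FAST_COMPILE')
```

Swapping `h_0` for a Glorot-style uniform initialization, or `rectify` for `T.tanh`, would exercise the cases reported above as working.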