Hi,
When I run the gradient check for Ws, it prints "VAL SMALL WARNING". I printed the numerical and analytical gradients for this case and found that the numerical gradients are exactly zero, while the analytical gradients are on the order of 1e-12.
This confuses me. Since the numerical gradients are zero, it seems that some words are not in the batch, so changing their values does not affect the cost (in grad_check, we add a delta to the word vectors). However, the analytical gradients are not zero, which suggests that these words actually do appear in the batch and their word vectors are being updated.
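The reasoning above treats a numerical gradient of exactly zero as proof that an entry has no effect on the cost. Here is a minimal sketch that tests that assumption. It assumes a centered-difference check with step `h = 1e-5` and a cost of order 1. Both are assumptions, and the assignment's actual step size and cost scale may differ. `numerical_grad` and the toy `cost` are hypothetical helpers, not the assignment's code. In the toy cost, `theta[1]` has no effect, like a word absent from the batch, while `theta[0]` has a true gradient of 1e-12. Both numerical estimates still come out as exactly 0.0. A perturbation of 1e-5 changes the cost by only about 1e-17, far below the roughly 2.2e-16 spacing between doubles near 1.0, so `f_plus` and `f_minus` round to the same value.

```python
import numpy as np

def numerical_grad(cost, theta, h=1e-5):
    """Centered-difference estimate of d(cost)/d(theta), one entry at a time."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        old = theta.flat[i]
        theta.flat[i] = old + h
        f_plus = cost(theta)
        theta.flat[i] = old - h
        f_minus = cost(theta)
        theta.flat[i] = old  # restore the original value
        grad.flat[i] = (f_plus - f_minus) / (2 * h)
    return grad

# Hypothetical toy cost of order 1:
#   theta[0] has a tiny but nonzero true gradient (1e-12);
#   theta[1] does not affect the cost at all.
def cost(theta):
    return 1.0 + 1e-12 * theta[0]

analytic = np.array([1e-12, 0.0])  # exact gradient of the toy cost
theta = np.zeros(2)
numeric = numerical_grad(cost, theta)
# Both numerical entries are exactly 0.0, although analytic[0] is not.
print(numeric, analytic)
```

So in this sketch, an exactly zero numerical gradient can appear both for an unused entry and for a used entry whose true gradient is too small to resolve at this cost scale.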
Why does this happen?
Thanks.