Closed · ClementPinard closed this issue 6 years ago
We recently added code to verify the gradient shape (https://github.com/pytorch/pytorch/pull/8168), so it's expected that this would break. I'll fix it
Fixed on master
Thanks for fixing it. However, you forgot to change it in `python/lltm_baseline.py`. That module is actually never called from either `benchmark.py` or `[grad_]check.py`, so it doesn't trigger an error, but if you call it you will hit the same problem.
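For reference, a minimal way to exercise the baseline module and hit the same error might look like this (a sketch only; it assumes the repo layout and that `python/lltm_baseline.py` exposes the same `LLTM(input_features, state_size)` module as the tutorial, with made-up sizes):

```python
import sys
import torch

sys.path.append('python')        # assuming we run from the repo root
from lltm_baseline import LLTM   # the baseline module mentioned above

batch_size, input_features, state_size = 16, 32, 128
X = torch.randn(batch_size, input_features)
h = torch.randn(batch_size, state_size)
C = torch.randn(batch_size, state_size)

rnn = LLTM(input_features, state_size)
new_h, new_C = rnn(X, (h, C))
# On pytorch master this backward raises the gradient shape error, because
# d_bias comes out as (1, 3 * state_size) while bias has shape (3 * state_size,).
(new_h.sum() + new_C.sum()).backward()
```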
The problem only occurs on pytorch master, because its backprop engine is stricter about gradient shapes: running `benchmark.py cpp` (or `cuda`) fails. This is due to the module's `bias` parameter being of size `3 * state_size` while the backward pass outputs a tensor of size `1 x 3 * state_size`. The problem is still there with torch 0.4.0, but its backprop engine doesn't complain since the number of elements is the same.
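To make the shape issue concrete, here is a minimal sketch (made-up tensor names and batch size, not the repo's exact code):

```python
import torch

state_size = 5                                # hypothetical value, so 3 * state_size = 15
bias = torch.randn(3 * state_size)            # the module's parameter: shape (15,)

grad_gates = torch.randn(8, 3 * state_size)   # incoming gradient for a batch of 8
d_bias = grad_gates.sum(dim=0, keepdim=True)  # shape (1, 15) because of keepdim=True

print(bias.shape)    # torch.Size([15])
print(d_bias.shape)  # torch.Size([1, 15])
# 0.4.0 accepts this since both have 15 elements; master rejects the shape mismatch.
```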
So the solution could be to remove the `keepdim=True` in the `d_bias` computation, e.g. here (it's the same for the python baseline, cpp and cuda versions). But then you get the opposite error message when running `check.py` and `grad_check.py`: this is because now the `bias` given to the function is of size `1 x 15`!
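The flipped mismatch, again as a rough sketch with my own variable names (`15` being `3 * state_size` for `state_size = 5`):

```python
import torch

bias = torch.randn(1, 15)         # what check.py / grad_check.py now hand to the function
grad_gates = torch.randn(8, 15)
d_bias = grad_gates.sum(dim=0)    # without keepdim=True the result is 1-D: shape (15,)

print(bias.shape, d_bias.shape)   # torch.Size([1, 15]) vs torch.Size([15])
# Same mismatch as before, just reversed: the input is 2-D, the gradient is 1-D.
```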
The solution is pretty simple, but we need to decide what to do:
- either make the `bias` parameter in every nn module of dimension `1 x ...`, or
- fix `check.py` and `grad_check.py`, and remove the `keepdim=True` arguments when computing the `d_bias` sums (a sketch of this second option is below).
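To illustrate the second option (a sketch only; the helper below stands in for the real backward code, and the names are mine):

```python
import torch

def d_bias_from(grad_gates):
    # Sum over the batch dimension without keepdim, so the gradient
    # keeps the same 1-D shape as the bias parameter itself.
    return grad_gates.sum(dim=0)

state_size = 5
bias = torch.randn(3 * state_size)            # parameter (and the check scripts' input) stays 1-D
grad_gates = torch.randn(8, 3 * state_size)

d_bias = d_bias_from(grad_gates)
assert d_bias.shape == bias.shape             # (15,) == (15,): master's shape check passes
```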