bnehoran closed this issue 4 years ago
The gradient formula should be updated to contain the proper calls to .contiguous().
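A minimal user-side sketch of what forcing contiguity on the incoming gradient means, using a tensor hook rather than the internal gradient formula itself; the shapes, the padding, and the assumption that pad's backward is the op that needs a contiguous gradient are illustrative, not part of the original report:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 4, 5, requires_grad=True)
y = F.pad(x, (1, 1))

# Force the gradient that reaches pad's backward to be contiguous.
# (Hypothetical workaround sketch; the fix suggested above is to add the
# .contiguous() call inside the gradient formula, not in user code.)
y.register_hook(lambda grad: grad.contiguous())

z = y.permute(0, 2, 3, 1)
z.sum().backward()
```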
Wouldn't reshape also potentially work (and avoid making things contiguous in the case where the view is valid)?
Right, I forgot that .view() can now handle some non-contiguous Tensors. Indeed, reshape() is better.
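A small sketch of the trade-off being discussed (shapes are illustrative): .contiguous().view() always materializes a copy, while reshape() returns a view whenever one is valid and only copies otherwise:

```python
import torch

x = torch.randn(4, 6)

t = x.t()                         # non-contiguous (transposed) tensor
# t.view(-1) would raise a RuntimeError: no valid view exists
flat_a = t.contiguous().view(-1)  # always copies
flat_b = t.reshape(-1)            # also copies here, since no view is possible

s = x[:, :4]                      # non-contiguous slice where a view is still valid
v = s.reshape(4, 2, 2)            # returns a view, no copy
# v.data_ptr() == s.data_ptr()  -> shares storage with s
```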
The issue looks to be in the backward definition of sparse_mask.
Make sure that https://github.com/pytorch/pytorch/issues/28650 also passes.
I can't reproduce this or https://github.com/pytorch/pytorch/issues/28650, so both look to be fixed on master. @bnehoran could you try installing a nightly build of pytorch to confirm whether the problem still persists for you?
I'm optimistically closing this because it seems fixed on master. Please feel free to reopen if this is not the case.
Yeah, awesome. It seems to have been fixed sometime over the past couple of weeks.
🐛 Bug
Some PyTorch primitives expect the gradient passed in during the backward pass to be contiguous, but not all functions produce a contiguous gradient in their backward pass. When two incompatible functions -- one which returns a non-contiguous gradient in its backward pass, and another which expects a contiguous gradient as input to its backward pass -- are strung together, autodifferentiation fails. In particular, permute and pad don't play well together: chaining them as in the following example results in an error during the backward pass.
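The original snippet and its error output are not reproduced here; a minimal sketch of the kind of chain described (pad in the forward pass followed by permute, so that permute's backward hands a non-contiguous gradient to pad's backward) might look like this, with the shapes and padding values as illustrative assumptions:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 4, 5, requires_grad=True)
y = F.pad(x, (1, 1))        # pad's backward expects a contiguous gradient
z = y.permute(0, 2, 3, 1)   # permute's backward produces a non-contiguous gradient
z.sum().backward()          # errored during the backward pass on affected versions
```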
Tested on the master branch (1.5.0a0+7fdc6cb), but the bug was already present in versions as early as 1.2.

Edit: The error in https://github.com/pytorch/pytorch/issues/28650 might be related to this issue.
cc @ezyang @SsnL @albanD @zou3519 @gqchen