Closed LX-Richard closed 4 years ago
Hi. Thanks for your question. The reason is that the transpose of the convolution operation itself involves flipping the filter along all spatial dimensions. This is most easily seen for a simple scalar 1-D convolution by writing out the corresponding convolution matrix explicitly. This flipping is not performed by the `F.conv_transpose2d` function, and therefore needs to be done explicitly. Note that we have several different implementations of the filter transpose (including ones that do not use `F.conv_transpose2d`). All have been tested against torch autograd and produce the correct output up to numerical precision.
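To make the 1-D argument concrete, here is a small NumPy sketch (illustrative only, not code from this repository): it builds the matrix of a `valid` cross-correlation explicitly and shows that applying the transposed matrix is the same as a full correlation with the *flipped* filter (i.e., a true convolution):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(3)   # 1-D filter [w0, w1, w2]
x = rng.standard_normal(5)   # 1-D signal

# Explicit matrix of the 'valid' cross-correlation y[m] = sum_i w[i] x[m+i]
A = np.array([
    [w[0], w[1], w[2], 0.0,  0.0 ],
    [0.0,  w[0], w[1], w[2], 0.0 ],
    [0.0,  0.0,  w[0], w[1], w[2]],
])
assert np.allclose(A @ x, np.correlate(x, w, mode='valid'))

# Apply the transposed operator to a vector g living in the output space:
g = rng.standard_normal(3)
ATg = A.T @ g

# (A^T g)[n] = sum_m g[m] w[n-m]: a true convolution, i.e. a full
# cross-correlation with the FLIPPED filter w[::-1].
assert np.allclose(ATg, np.convolve(g, w, mode='full'))
assert np.allclose(ATg, np.correlate(np.pad(g, 2), w[::-1], mode='valid'))
print("transpose = correlation with flipped filter: OK")
```

Since PyTorch's `conv2d` is a cross-correlation as well, the same reasoning carries over to the 2-D case, with the flip applied along both spatial dimensions.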
If you want, you can instead use our general steepest-descent optimizer, which is implemented with double back-propagation and therefore applies the transpose automatically.
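As a sanity check, here is a minimal single-image, single-output-channel sketch (an illustrative analogue, not the exact code or tensor shapes used in the repository) showing that the filter gradient from autograd is reproduced by `F.conv_transpose2d` only after the data argument is flipped; the final slicing plays the role that the `padding=trans_pad` argument plays in the real implementation:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C, H, W, k = 3, 8, 8, 3
x = torch.randn(1, C, H, W)                      # features
w = torch.randn(1, C, k, k, requires_grad=True)  # filter

y = F.conv2d(x, w)       # scores: [1, 1, H-k+1, W-k+1]
g = torch.randn_like(y)  # score-space residual / upstream gradient

# Reference: filter gradient from torch autograd
(grad_w_ref,) = torch.autograd.grad(y, w, g)

# Manual transpose: conv_transpose2d does NOT flip its data argument,
# so the residual must be flipped explicitly; the features act as kernel.
h, w_ = g.shape[-2:]
full = F.conv_transpose2d(g.flip((2, 3)), x)  # [1, C, h+H-1, w_+W-1]
grad_w_manual = full[:, :, h - 1:h - 1 + k, w_ - 1:w_ - 1 + k]

print(torch.allclose(grad_w_ref, grad_w_manual, atol=1e-5))
```

Dropping the `.flip((2, 3))` makes the two results disagree, which is exactly the flipping discussed above.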
In the supplementary material of DiMP, the transposed Jacobian corresponds to back-propagation through the convolutional layer with respect to its input, which is implemented in the following way:

```python
filter_grad = F.conv_transpose2d(
    input.flip((2, 3)).view(1, -1, input.shape[-2], input.shape[-1]),
    feat.view(-1, feat.shape[-3], feat.shape[-2], feat.shape[-1]),
    padding=trans_pad,
    groups=num_images * num_sequences)
```

What confuses me is why `input` needs to be flipped here. I was hoping you could help clear up my confusion.