MiotyR closed this issue 4 years ago
When you upsample an image by a factor of two, the motion should be multiplied by 2 correspondingly (a movement of one pixel now becomes two pixels). Here we mimic this behavior, since L3_offset is responsible for the alignment. In practice, the convolution should be able to learn this even without the multiplication by 2 (we did not verify this, though).
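To make the scaling argument concrete, here is a minimal 1-D sketch in plain Python. It is a toy illustration, not EDVR code: the helpers `upsample_nearest` and `best_shift` are hypothetical names introduced here, nearest-neighbour upsampling stands in for the bilinear `F.interpolate` call, and the offset is treated as a plain pixel displacement, which simplifies what the offset features in PCD actually are.

```python
def upsample_nearest(values, scale=2):
    """Repeat each sample `scale` times (nearest-neighbour upsampling in 1-D)."""
    return [v for v in values for _ in range(scale)]


def best_shift(ref, nbr, max_shift):
    """Return the integer shift s minimising the mean |ref[i] - nbr[i + s]|."""
    def cost(s):
        # Only compare positions where the shifted index stays inside nbr.
        pairs = [(ref[i], nbr[i + s]) for i in range(len(ref)) if 0 <= i + s < len(nbr)]
        return sum(abs(a - b) for a, b in pairs) / len(pairs)
    return min(range(-max_shift, max_shift + 1), key=cost)


# Coarse level: the neighbour frame is the reference moved right by 1 pixel.
ref_coarse = [0, 0, 5, 9, 5, 0, 0, 0]
nbr_coarse = [0, 0, 0, 5, 9, 5, 0, 0]

# Fine level: both frames upsampled by a factor of 2.
ref_fine = upsample_nearest(ref_coarse)
nbr_fine = upsample_nearest(nbr_coarse)

coarse_motion = best_shift(ref_coarse, nbr_coarse, max_shift=3)
fine_motion = best_shift(ref_fine, nbr_fine, max_shift=6)

# Upsampling a coarse offset field only changes its resolution, not its values,
# so the values must also be scaled by 2 to describe motion on the fine grid.
coarse_offset = [float(coarse_motion)] * len(ref_coarse)
fine_offset_unscaled = upsample_nearest(coarse_offset)
fine_offset_scaled = [2 * v for v in fine_offset_unscaled]

print(coarse_motion, fine_motion, fine_offset_scaled[0])
```

The unscaled field still says "1 pixel" at every position even though the same motion now spans 2 pixels on the fine grid, which is the gap the `* 2` is meant to close.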
Thank you for the quick answer. I had this intuition but was not sure. Indeed, I was wondering whether the convolution could have handled this on its own. It is much clearer to me now. Thanks again.
Hi,
I read the paper and your implementation of PCD. Thanks, it is pretty clear and I have understood some of the new concepts you introduce. I am currently trying to deepen my understanding of PCD, in particular this piece of code:
# L2
L2_offset = torch.cat([nbr_fea_l[1], ref_fea_l[1]], dim=1)
L2_offset = self.lrelu(self.L2_offset_conv1(L2_offset))
L3_offset = F.interpolate(L3_offset, scale_factor=2, mode='bilinear', align_corners=False)
L2_offset = self.lrelu(self.L2_offset_conv2(torch.cat([L2_offset, L3_offset * 2], dim=1)))
L2_offset = self.lrelu(self.L2_offset_conv3(L2_offset))
I understand that the l-th level offsets are computed from the upsampled (l+1)-th level offsets, but there is one thing I still don't get in the implementation: why do you multiply the values of L3_offset by 2 after the upsampling? This is in EDVR_arch.py, line 108:
L2_offset = self.lrelu(self.L2_offset_conv2(torch.cat([L2_offset, L3_offset * 2], dim=1)))
Is there a link with the interpolation or the stride? Thanks in advance for your reply.