xinntao / EDVR

Winning Solution in NTIRE19 Challenges on Video Restoration and Enhancement (CVPR19 Workshops) - Video Restoration with Enhanced Deformable Convolutional Networks. EDVR has been merged into BasicSR and this repo is a mirror of BasicSR.
https://github.com/xinntao/BasicSR

Multiply offset values by 2 #153

Closed: MiotyR closed this issue 4 years ago

MiotyR commented 4 years ago

Hi,

I read the paper and your implementation of PCD. Thanks, it is quite clear, and I have understood some of the new concepts you introduce. I am currently trying to deepen my understanding of PCD, in particular this piece of code:

```python
# L2
L2_offset = torch.cat([nbr_fea_l[1], ref_fea_l[1]], dim=1)
L2_offset = self.lrelu(self.L2_offset_conv1(L2_offset))
L3_offset = F.interpolate(L3_offset, scale_factor=2, mode='bilinear', align_corners=False)
L2_offset = self.lrelu(self.L2_offset_conv2(torch.cat([L2_offset, L3_offset * 2], dim=1)))
L2_offset = self.lrelu(self.L2_offset_conv3(L2_offset))
```
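The level-by-level cascade in this snippet can be mimicked with a toy 1D coarse-to-fine shift estimator. This is a hypothetical sketch, not EDVR code: brute-force matching over a small window stands in for the learned offset convolutions, pair-averaging stands in for the downsampling that builds the feature pyramid, signals are shifted circularly, and the names `coarse_to_fine`, `scale`, `coarse_radius` and `radius` are made up for illustration. Each finer level starts from the coarser estimate multiplied by `scale` and only refines it within ±`radius` pixels.

```python
def shift(signal, d):
    # Circularly shift a 1D signal right by d samples.
    n = len(signal)
    return [signal[(i - d) % n] for i in range(n)]


def downsample2(signal):
    # Halve the resolution by averaging neighbouring pairs (one pyramid level).
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def best_shift(ref, nbr, candidates):
    # Pick the candidate displacement with the smallest squared alignment error.
    return min(candidates,
               key=lambda d: sum((a - b) ** 2 for a, b in zip(shift(ref, d), nbr)))


def coarse_to_fine(ref, nbr, levels=3, scale=2, coarse_radius=2, radius=1):
    # Pyramids: index 0 is full resolution (L1), index levels-1 is the coarsest (L3).
    ref_pyr, nbr_pyr = [ref], [nbr]
    for _ in range(levels - 1):
        ref_pyr.append(downsample2(ref_pyr[-1]))
        nbr_pyr.append(downsample2(nbr_pyr[-1]))
    coarsest = levels - 1
    # Coarsest level: search a wider window around zero.
    offset = best_shift(ref_pyr[coarsest], nbr_pyr[coarsest],
                        range(-coarse_radius, coarse_radius + 1))
    for level in range(coarsest - 1, -1, -1):
        # Propagate the coarser estimate, then refine it in a small window.
        predicted = offset * scale
        offset = best_shift(ref_pyr[level], nbr_pyr[level],
                            range(predicted - radius, predicted + radius + 1))
    return offset


ref = [float(i * i) for i in range(32)]  # strictly increasing, so no shift is ambiguous
nbr = shift(ref, 8)                      # true motion: 8 px at L1, 4 px at L2, 2 px at L3
print(coarse_to_fine(ref, nbr, scale=2))  # 8
print(coarse_to_fine(ref, nbr, scale=1))  # cannot reach 8
```

With `scale=1`, the level-2 search window is {1, 2, 3}, so it never reaches the true level-2 shift of 4 and the final estimate stays at most 4. A learned convolution is not limited to a fixed window, so this only illustrates why the scaled prediction is a convenient starting point for the finer level.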

I understand that the offsets at level l are computed from the upsampled offsets at level l+1, but there is one thing in the implementation that I still don't get. Why do you multiply the values of `L3_offset` by 2 after upsampling? In `EDVR_arch.py`, line 108: `L2_offset = self.lrelu(self.L2_offset_conv2(torch.cat([L2_offset, L3_offset * 2], dim=1)))`. Is this related to the interpolation or to the stride?

Thanks in advance for your reply.

ckkelvinchan commented 4 years ago

When you upsample an image by a factor of two, the motion should be multiplied by 2 accordingly (a displacement of one pixel becomes a displacement of two pixels). We mimic this behavior here because L3_offset is responsible for the alignment. In principle, the convolution should be able to learn this scaling even without the explicit multiplication by 2, although we did not verify it.
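The factor of 2 can be checked on a toy example. The sketch below is a hypothetical illustration in plain Python, not EDVR code: it uses 1D signals, integer per-pixel offsets, nearest-neighbour upsampling (the EDVR snippet uses bilinear `F.interpolate`), circular borders, and plain backward warping in place of deformable convolution; `upsample2` and `warp` are made-up helper names. Warping the upsampled reference with the upsampled and doubled offsets reproduces the upsampled neighbour exactly, while the undoubled offsets cover only half of the motion.

```python
def upsample2(values):
    # 2x nearest-neighbour upsampling: every sample is repeated twice.
    return [v for v in values for _ in range(2)]


def warp(signal, offsets):
    # Backward warping with integer per-pixel offsets and circular borders:
    # out[i] = signal[i - offsets[i]].
    n = len(signal)
    return [signal[(i - o) % n] for i, o in enumerate(offsets)]


ref = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
coarse_offsets = [1, 1, 1, 1, 2, 2, 2, 2]  # left half moves 1 px, right half 2 px
nbr = warp(ref, coarse_offsets)

ref_up, nbr_up = upsample2(ref), upsample2(nbr)
offsets_up = upsample2(coarse_offsets)  # analogous to upsampling the L3 offset map
scaled = [2 * o for o in offsets_up]    # analogous to "L3_offset * 2"

print(warp(ref_up, scaled) == nbr_up)      # True: doubled offsets align the frames
print(warp(ref_up, offsets_up) == nbr_up)  # False: unscaled offsets cover half the motion
```

Upsampling only changes where the offset values are stored, not their magnitude, so the values themselves still have to be rescaled to the new pixel grid.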

MiotyR commented 4 years ago

Thank you for the quick answer. I had this intuition but was not sure. Indeed, I was wondering whether the convolution could have handled this task on its own. It is much clearer to me now. Thanks again!