Closed laomao0 closed 4 years ago
Yes, the warped frames and the warped first-level features are concatenated. Yes, the input to the green lateral block on the top left is 70 channels (3 + 3 + 32 + 32). Yes, the output of the block on the top left is 32 (the number in each block denotes the number of output channels).
Yes, the optical flow needs to be downsampled. Note that it is common for optical flow estimators based on deep learning to yield a low-res prediction (in the case for PWC-Net, the estimate is 1/4th of the input resolution that is then upsampled to get the full-resolution prediction). So you may not have to downsample the flow for the coarse levels but upsample the flow for the fine levels.
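To make the scaling concrete, here is a minimal sketch (not from the thread; the helper name is mine) of downsampling a flow field: when the spatial resolution is halved, the flow values must be halved as well, because displacements are measured in pixels of the current resolution.

```python
import torch
import torch.nn.functional as F

def downsample_flow(flow, factor=2):
    """Downsample a [B, 2, H, W] flow field by `factor`, rescaling magnitudes."""
    return F.interpolate(flow, scale_factor=1.0 / factor,
                         mode='bilinear', align_corners=False) / factor

flow = torch.ones(1, 2, 8, 8) * 4.0   # a uniform 4-pixel displacement
half = downsample_flow(flow)           # shape [1, 2, 4, 4]
# the same motion is now a 2-pixel displacement at half resolution
```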
thanks for your reply!
Thanks for your reply! Using PWC-Net, when I upsample the 1/4-resolution flow to full resolution (for example, 10x10 -> 40x40 pixels), the multiplication factor is 20 in the PWC-Net code (line 305 of https://github.com/sniklaus/pytorch-pwc/blob/master/run.py).
If I need to upsample the 1/4-resolution flow to 1/2 resolution (for example, 10x10 -> 20x20), the factor is 20/2 = 10. Is that right?
For example, modifying line 305 of https://github.com/sniklaus/pytorch-pwc/blob/master/run.py:
meta_flow = self.forward_pre(tensorPreprocessedFirst, tensorPreprocessedSecond)
tensorFlow_L1 = 20.0 * torch.nn.functional.interpolate(input=meta_flow, size=(intHeight, intWidth), mode='bilinear', align_corners=False)
tensorFlow_L2 = 20.0 / 2.0 * torch.nn.functional.interpolate(input=meta_flow, size=(int(intHeight/2), int(intWidth/2)), mode='bilinear', align_corners=False)
tensorFlow_L3 = 20.0 / 4.0 * torch.nn.functional.interpolate(input=meta_flow, size=(int(intHeight/4), int(intWidth/4)), mode='bilinear', align_corners=False)
Thanks for your patience.
You are correct indeed. I would not necessarily do the interpolation the way you outlined though. The predicted meta_flow may already be at the resolution of tensorFlow_L3, so interpolation may not be necessary. As for tensorFlow_L2, I would compute it from 2 * upsample(tensorFlow_L3), and likewise for tensorFlow_L1, I would compute it from 2 * upsample(tensorFlow_L2).
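A minimal sketch of this suggested scheme (the helper function and variable shapes are my assumptions; only the scaling logic comes from the discussion above). The raw 1/4-resolution PWC-Net output serves as the coarsest level directly, and each finer level is obtained by upsampling and doubling the previous one:

```python
import torch
import torch.nn.functional as F

def upsample_flow(flow):
    # Doubling the spatial resolution doubles the per-pixel displacements.
    return 2.0 * F.interpolate(flow, scale_factor=2,
                               mode='bilinear', align_corners=False)

meta_flow = torch.randn(1, 2, 10, 10)        # raw PWC-Net output, 1/4 resolution
tensorFlow_L3 = (20.0 / 4.0) * meta_flow     # coarsest level, no resampling needed
tensorFlow_L2 = upsample_flow(tensorFlow_L3) # 1/2 resolution, factor 10 overall
tensorFlow_L1 = upsample_flow(tensorFlow_L2) # full resolution, factor 20 overall
```

Note that this is equivalent in overall scale to interpolating meta_flow separately for each level, but it avoids resampling the coarsest level at all.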
Thanks for your reply, that helps me a lot.
Hi, sniklaus, I have two questions about the details of your network.