visinf / irr

Iterative Residual Refinement for Joint Optical Flow and Occlusion Estimation (CVPR 2019)
Apache License 2.0

Question about feature warping. #39

Closed MeowZheng closed 3 years ago

MeowZheng commented 3 years ago

Hi Junhwa,

Many thanks for sharing this nice work. I have a small question about feature warping in the PWC-Net implementation in this repo and in IRR-PWC.

Before warping features, people usually rescale both the shape and the values of the flow, but you only upsample the shape of the flow, as in https://github.com/visinf/irr/blob/4d7f6aa46d6989d7dcf8aa1213fbc64f0058e038/models/pwcnet.py#L68-L69 and https://github.com/visinf/irr/blob/4d7f6aa46d6989d7dcf8aa1213fbc64f0058e038/models/IRR_PWC.py#L82-L88

Could you please explain a little about these?

Best regards, Meow

hurjunhwa commented 3 years ago

Hi,

That's a good question, and it's just a design choice. At each pyramid level, the optical flow values are expressed at the scale of the original image resolution, not that of the downscaled image. (Please note that the loss function uses the downscaled optical flow map without scaling the optical flow values.) So there is no need to rescale the flow values when upsampling the resolution.

If you want to design a decoder that outputs the downscaled flow map (both values and shape), you can revise the upsampling functions (that you cited above) as well as the loss function (i.e., using the GT flow map in which value and shape are both properly downscaled).
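A minimal sketch of the two ground-truth conventions described above, in plain Python. The helper `downscale_flow`, the nearest-neighbor subsampling, and the toy flow map are illustrative assumptions, not the repo's actual loss code.

```python
def downscale_flow(flow, factor, scale_values):
    """Subsample a flow map (rows of (u, v) tuples) by an integer factor.

    scale_values=False: only the shape shrinks; displacements stay in
    original-image pixels (the convention described in this thread).
    scale_values=True: displacements are also divided by the factor
    (the alternative decoder/loss design).
    """
    div = float(factor) if scale_values else 1.0
    return [
        [(u / div, v / div) for (u, v) in row[::factor]]
        for row in flow[::factor]
    ]


# Hypothetical 4x4 ground-truth map: every pixel moves 8 px right, 4 px down.
gt = [[(8.0, 4.0)] * 4 for _ in range(4)]

shape_only = downscale_flow(gt, 2, scale_values=False)      # values unchanged
shape_and_value = downscale_flow(gt, 2, scale_values=True)  # values halved
```

Whichever convention is chosen, the decoder output and the ground truth used in the loss have to agree on it.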

I think I've tried both settings and observed that it made only a marginal difference in the supervised learning setting.

Hopefully, this answers your question!

Best, Jun

MeowZheng commented 3 years ago

Many thanks for your kind reply. Sorry, I still feel confused about warping.

Could you answer the simplest question: is the flow that is estimated at level N, and used at level N-1 for warping, expressed at the spatial resolution of level N? If yes, I think both the shape and the values of the flow at level N must be rescaled before warping, since the coordinates are rescaled. If no, i.e. the flow is estimated at the original scale at each level as in PWC-Net, then PWC-Net still rescales the flow values for each level (ref), but your implementation doesn't rescale the values. (We ignore the 0.05 factor in this discussion.)

Thanks again for your patience. It means a lot to me.

Best regards, Meow

hurjunhwa commented 3 years ago

Hi Meow, Yes, that's correct indeed. It doesn't rescale for output, but it does rescale before warping.

https://github.com/visinf/irr/blob/4d7f6aa46d6989d7dcf8aa1213fbc64f0058e038/models/flownet_modules.py#L93-L107

It normalizes the flow to the range [-1, 1] so that it can be used with the warping function torch.nn.functional.grid_sample. Basically, it needs to do these two steps:

output flow -> rescaled to the local scale -> divided by the height and width of the downscaled image.

But implementation-wise, it's the same as:

output flow -> divided by the height and width of the original image.
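The equivalence of the two paths above can be checked with a small sketch. The function names and the numbers are hypothetical, and the exact normalization constant that the repo uses for `grid_sample` is deliberately left out; only the cancellation of the pyramid scale factor is shown.

```python
import math


def normalize_two_steps(u, v, full_w, full_h, factor):
    """Rescale the flow to the local pyramid scale, then divide by the
    width and height of the downscaled image."""
    local_w, local_h = full_w / factor, full_h / factor
    return (u / factor) / local_w, (v / factor) / local_h


def normalize_one_step(u, v, full_w, full_h):
    """Divide the original-scale flow directly by the width and height
    of the original image."""
    return u / full_w, v / full_h


# Hypothetical values: a 512x256 image, a pyramid level downscaled by 4,
# and a displacement of (8, -4) pixels at original-image scale.
two_steps = normalize_two_steps(8.0, -4.0, 512, 256, 4)
one_step = normalize_one_step(8.0, -4.0, 512, 256)

assert math.isclose(two_steps[0], one_step[0])
assert math.isclose(two_steps[1], one_step[1])
```

The downscaling factor appears once in the numerator and once in the denominator of the two-step path, so it cancels, which is why the flow values do not need an explicit rescaling before this normalization.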

Best, Jun

MeowZheng commented 3 years ago

OK, it's all clear now! I understand your design!

Thank you, Meow