nianticlabs / monodepth2

[ICCV 2019] Monocular depth estimation from a single image

Questions about inverse warping #442

Closed xiat0616 closed 2 years ago

xiat0616 commented 2 years ago

Hi, nice work. I have some confusion about inverse warping.

I see that in the function you first convert the 2D image to world coordinates, so each pixel gets an (x, y, z) location. My question is: why do you use the depth of the target image as z for the world coordinates of the source image, instead of using the depth of the source image? Thank you.

daniyar-niantic commented 2 years ago

The 2D coordinate grid of the target frame is lifted into 3D using the depth of the target frame. It is then projected into the source image. The projected locations let us fetch colors from the source frame and compare them against the colors of the target frame. This way, there are no holes in the "target frame reconstructed from pixels of the source frame", and we do not need to model occlusions.

See: https://github.com/nianticlabs/monodepth2/issues/65
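For concreteness, here is a minimal PyTorch sketch of that procedure. The function name, the argument names, and the 4x4 intrinsics convention are assumptions made for illustration; this is not the repository's exact BackprojectDepth / Project3D code, just the same idea written in one place.

```python
import torch
import torch.nn.functional as F

def warp_source_to_target(src_img, depth_tgt, K, K_inv, T_tgt_to_src):
    """Reconstruct the target view by fetching colors from the source view.

    src_img:      (B, 3, H, W) source frame colors
    depth_tgt:    (B, 1, H, W) depth predicted for the *target* frame
    K, K_inv:     (B, 4, 4) intrinsics and their inverse (assumed 4x4 convention)
    T_tgt_to_src: (B, 4, 4) relative pose from target to source camera
    """
    B, _, H, W = depth_tgt.shape

    # 1. Homogeneous pixel grid of the *target* frame, shape (B, 3, H*W)
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32),
                            indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).view(3, -1)
    pix = pix.unsqueeze(0).expand(B, -1, -1)

    # 2. Back-project with the *target* frame's depth: X = D_tgt * K^-1 * p
    cam_points = depth_tgt.view(B, 1, -1) * (K_inv[:, :3, :3] @ pix)
    cam_points = torch.cat([cam_points, torch.ones_like(cam_points[:, :1])], dim=1)

    # 3. Move the 3D points into the source camera and project with K
    src_pix = (K @ T_tgt_to_src)[:, :3, :] @ cam_points          # (B, 3, H*W)
    xy = src_pix[:, :2] / (src_pix[:, 2:3] + 1e-7)               # source pixel coords
    xy = xy.view(B, 2, H, W)

    # 4. Normalise to [-1, 1] and sample source colors at those locations
    x_norm = 2 * xy[:, 0] / (W - 1) - 1
    y_norm = 2 * xy[:, 1] / (H - 1) - 1
    grid = torch.stack([x_norm, y_norm], dim=-1)                 # (B, H, W, 2)
    return F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)
```

Because the sampling grid is built from the target frame's own pixel grid and depth, every target pixel receives exactly one sampled color, which is why the reconstruction has no holes.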

dumyCq commented 6 months ago


You can think of it as F.grid_sample ultimately doing the backward warping.
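In other words, once the projected coordinates are computed, the sampling step is a call to PyTorch's F.grid_sample, which reads colors from the source image at the given locations (backward warping). A toy example, unrelated to the repository code, showing that an identity sampling grid simply reproduces the input:

```python
import torch
import torch.nn.functional as F

src = torch.rand(1, 3, 4, 5)                     # (B, C, H, W) "source" image

# Identity sampling grid in normalized [-1, 1] coordinates, shape (B, H, W, 2)
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])
grid = F.affine_grid(theta, size=src.shape, align_corners=True)

# For every output pixel, grid_sample fetches the source color at grid[b, y, x]
out = F.grid_sample(src, grid, align_corners=True)
print(torch.allclose(out, src, atol=1e-6))        # True: identity backward warp
```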