nianticlabs / stereo-from-mono

[ECCV 2020] Learning stereo from single images using monocular depth estimation networks

About backward warping vs. forward warping #8

Closed. SenZHANG-GitHub closed this issue 3 years ago.

SenZHANG-GitHub commented 3 years ago

Thanks for your great work!

I'm a bit confused: since some depth networks like monodepth can directly predict both the left and right disparities from only the left image, why not just backward warp the left image with the right disparity to reconstruct the right image, and thereby avoid the problems associated with forward warping? More recent models starting from monodepth2 do drop the left-right consistency loss and the right-disparity prediction, but it seems feasible to add those components back in order to obtain the right disparities?
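For concreteness, the backward warping I have in mind would be something like the sketch below (PyTorch; the function name, tensor shapes, and the disparity sign/units convention are just assumptions for illustration, not code from this repo):

```python
import torch
import torch.nn.functional as F


def backward_warp_right_from_left(left_img, right_disp):
    """Reconstruct the right image by sampling the left image with the
    right-view disparity (bilinear backward warp).

    Assumes disparity is in pixels and positive, with the convention
    x_left = x_right + disparity.

    left_img:   (B, 3, H, W) left image
    right_disp: (B, 1, H, W) disparity predicted for the right view
    """
    b, _, h, w = left_img.shape

    # Pixel-coordinate grid for the right image
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=left_img.dtype, device=left_img.device),
        torch.arange(w, dtype=left_img.dtype, device=left_img.device),
        indexing="ij")
    xs = xs.unsqueeze(0).expand(b, -1, -1)
    ys = ys.unsqueeze(0).expand(b, -1, -1)

    # For each right-image pixel, the matching left-image column is x + d
    sample_x = xs + right_disp.squeeze(1)

    # Normalise sampling locations to [-1, 1] for grid_sample
    grid = torch.stack([2.0 * sample_x / (w - 1) - 1.0,
                        2.0 * ys / (h - 1) - 1.0], dim=-1)  # (B, H, W, 2)

    # Backward warp: every output pixel gets a value (no holes), but occluded
    # regions end up filled with repeated texture copied from the left image
    return F.grid_sample(left_img, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```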

mdfirman commented 3 years ago

Thanks for your interest!

I could imagine this is worth trying. I don't think we've ever tried the 'predict left and right disparities from a single input image' concept in the monodepth2 framework.

However, do bear in mind that predicting the disparity of the 'right' image only works when training with stereo pairs; it doesn't transfer well to training from monocular sequences. Since the main focus of monodepth2 was training on monocular sequences (with good stereo performance more of an afterthought), we didn't explore these sorts of directions.

Feel free to try it out!

Thanks

mdfirman commented 3 years ago

Sorry, I got this mixed up: I thought this was a question about monodepth2 (which is where most of our GitHub issues come from!).

@JamieWatson683 will give you a real answer shortly...

JamieWatson683 commented 3 years ago

Hi - really sorry for the delay in answering this!

It is an interesting idea to try this, but I can think of a couple of issues.

While backward warping avoids holes, it still suffers from occlusions: for example, you will get repeated textures in regions that are visible to the right camera but not to the left.
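For illustration, if you do predict both disparities as proposed, one standard way to flag those occluded regions is a left-right consistency check. A minimal sketch (PyTorch; the names and the disparity convention are assumed, not taken from this repo):

```python
import torch
import torch.nn.functional as F


def lr_consistency_occlusion_mask(right_disp, left_disp, thresh=1.0):
    """Flag right-view pixels whose match disagrees with the left-view
    disparity; these are the pixels a backward warp fills with repeated
    texture.

    right_disp, left_disp: (B, 1, H, W) disparities in pixels, with the
    convention x_left = x_right + right_disp.
    """
    b, _, h, w = right_disp.shape
    dtype, device = right_disp.dtype, right_disp.device

    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=dtype, device=device),
        torch.arange(w, dtype=dtype, device=device),
        indexing="ij")
    xs = xs.unsqueeze(0).expand(b, -1, -1)
    ys = ys.unsqueeze(0).expand(b, -1, -1)

    # Sample the left disparity at the location each right pixel maps to
    sample_x = xs + right_disp.squeeze(1)
    grid = torch.stack([2.0 * sample_x / (w - 1) - 1.0,
                        2.0 * ys / (h - 1) - 1.0], dim=-1)
    left_disp_at_match = F.grid_sample(left_disp, grid, mode="bilinear",
                                       padding_mode="border",
                                       align_corners=True)

    # Consistent pixels see (nearly) the same disparity in both views; a
    # large disagreement marks occlusion, so the warp there is untrustworthy
    return (right_disp - left_disp_at_match).abs() > thresh
```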

Additionally, to train a (monocular) network to predict the right disparity from a left image, you would need stereo pairs with a fixed baseline. Our current method instead allows the use of monocular networks trained on a wide variety of data, such as SfM reconstructions, pairwise labels, monocular video, etc.

Let me know if you give it a shot, though; I'd be interested to see any results!

Thanks a lot.