Why do your training datasets need stereo images (left and right views of scenes)? #2

Hi,

That's a really good question.

Yes, we use stereo images only for training (specifically, to compute the proxy loss for the disparity map), following the approach of monodepth.
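To show roughly how a stereo pair supervises a disparity map, here is a minimal sketch, not the code from this repository. It assumes a rectified grayscale pair and a left-view disparity in pixels. The left image is reconstructed by sampling the right image at `x - d(x)`, and the photometric difference serves as the proxy loss. The names `warp_right_to_left` and `photometric_l1_loss` are hypothetical. This uses only an L1 term; monodepth additionally combines SSIM, a disparity smoothness term and a left-right consistency term, and the actual loss here may differ.

```python
import numpy as np

def warp_right_to_left(right: np.ndarray, disp_left: np.ndarray) -> np.ndarray:
    """Reconstruct the left view by sampling the right view at x - d(x).

    right:     (H, W) grayscale right image (rectified pair assumed)
    disp_left: (H, W) left-view disparity in pixels (assumed units)
    """
    h, w = right.shape
    xs = np.arange(w, dtype=np.float64)
    recon = np.empty_like(right, dtype=np.float64)
    for row in range(h):
        # A pixel at column x in the left image appears at column x - d in the right image.
        src_x = xs - disp_left[row]
        # Linear interpolation along the row; np.interp clamps out-of-range
        # samples to the first/last value of the row.
        recon[row] = np.interp(src_x, xs, right[row])
    return recon

def photometric_l1_loss(left: np.ndarray, right: np.ndarray, disp_left: np.ndarray) -> float:
    """Mean absolute difference between the left image and its reconstruction."""
    return float(np.mean(np.abs(left - warp_right_to_left(right, disp_left))))

# Toy example: the left view is the right view shifted by 2 pixels.
right = np.tile(np.arange(10, dtype=np.float64), (3, 1))
left = np.tile(np.clip(np.arange(10) - 2, 0, None).astype(np.float64), (3, 1))
print(photometric_l1_loss(left, right, np.full((3, 10), 2.0)))  # correct disparity
print(photometric_l1_loss(left, right, np.zeros((3, 10))))      # wrong disparity
```

The correct disparity gives a lower loss than the wrong one, which is the signal the network is trained on. No camera pose appears anywhere, because the relative position of the two views is fixed by the rig.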

The main advantages are:

- It outputs depth at an absolute scale, instead of the arbitrary scale produced by approaches trained on monocular images, so there is no need to rescale the predictions at test time.
- It does not need to estimate ego-motion or handle moving objects, which makes the proxy loss much simpler and training more stable.
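The scale advantage comes from the known, fixed baseline of the stereo rig. For a rectified pair, depth follows from disparity as Z = f * B / d, so a predicted disparity maps directly to metric depth. A monocular-trained model is only defined up to an unknown scale and is commonly rescaled at evaluation time, for example by the median ratio against ground truth. The sketch below illustrates both cases. The function names, focal length and baseline values are illustrative assumptions, not taken from this repository.

```python
import numpy as np

def disparity_to_depth(disp_px, focal_px, baseline_m, eps=1e-6):
    """Depth in metres from disparity in pixels for a rectified stereo rig: Z = f * B / d."""
    # eps guards against division by zero for zero-disparity (infinitely far) pixels.
    return focal_px * baseline_m / np.maximum(np.asarray(disp_px, dtype=np.float64), eps)

def median_scale(pred_depth, gt_depth):
    """Scale factor a scale-ambiguous (monocular) prediction needs at test time."""
    return float(np.median(gt_depth) / np.median(pred_depth))

# Stereo-trained: illustrative focal length 700 px, baseline 0.5 m.
print(disparity_to_depth(35.0, focal_px=700.0, baseline_m=0.5))  # metres, no rescaling needed

# Monocular-trained: prediction is correct only up to an unknown factor.
pred = np.array([1.0, 2.0, 3.0])
gt = np.array([10.0, 20.0, 30.0])
print(median_scale(pred, gt))
```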

The drawback, of course, is that it requires stereo pairs for training, which are not available in many cases.

You may want to refer to this work, which uses monocular videos for training and estimates 3D motion:

visinf/multi-mono-sf
