visinf / multi-mono-sf

Self-Supervised Multi-Frame Monocular Scene Flow (CVPR 2021)
Apache License 2.0

Why do your training datasets need stereo images (left and right views of scenes)? #2

Closed: TuringKi closed this issue 3 years ago

TuringKi commented 3 years ago

From Figure 1 in your paper: [image]

I do not see where you use the stereo images, and your evaluation code does not use them (the DAVIS dataset...).

Also, can I do self-supervised training without stereo images? Many in-the-wild (open-world) datasets do not have them.

hurjunhwa commented 3 years ago

Hi,

That's a really good question.

Yes, we use the stereo images only for training (specifically, when calculating the proxy loss for the disparity map), following the approach from monodepth.

The main advantages are that

However, as a drawback, training requires stereo pairs, which are often not available.
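To illustrate the idea of a stereo proxy loss for disparity, here is a minimal, framework-free sketch. It is not the code from this repository: the function names `warp_right_to_left` and `photometric_l1` are hypothetical, images are plain nested lists of grayscale values, and only a simple L1 photometric term is shown (monodepth-style losses typically add further terms such as SSIM and smoothness). It assumes rectified stereo, so a left-image pixel at column x appears in the right image at column x - d.

```python
def warp_right_to_left(right, disp):
    """Reconstruct the left view by sampling the right image at x - d(x, y).

    Assumes rectified stereo: corresponding pixels lie on the same row.
    Uses linear interpolation along the row and clamps at the border.
    """
    height, width = len(right), len(right[0])
    recon = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src = x - disp[y][x]                    # horizontal shift only
            src = min(max(src, 0.0), width - 1.0)   # clamp to valid columns
            x0 = int(src)                           # left neighbour
            x1 = min(x0 + 1, width - 1)             # right neighbour
            w = src - x0                            # interpolation weight
            recon[y][x] = (1.0 - w) * right[y][x0] + w * right[y][x1]
    return recon


def photometric_l1(left, recon):
    """Mean absolute difference between the left image and its reconstruction."""
    total, count = 0.0, 0
    for row_l, row_r in zip(left, recon):
        for a, b in zip(row_l, row_r):
            total += abs(a - b)
            count += 1
    return total / count


# Toy example: the right image is a horizontal ramp, and the left image is
# synthesised from it with a constant disparity of 2 pixels.
right = [[float(10 * x) for x in range(6)] for _ in range(3)]
true_disp = [[2.0] * 6 for _ in range(3)]
zero_disp = [[0.0] * 6 for _ in range(3)]
left = warp_right_to_left(right, true_disp)

loss_true = photometric_l1(left, warp_right_to_left(right, true_disp))
loss_zero = photometric_l1(left, warp_right_to_left(right, zero_disp))
```

In this toy setup the loss is lowest for the correct disparity, which is why the second view can act as a supervision signal without ground-truth depth. A real training loop would use a differentiable warp (for example `torch.nn.functional.grid_sample` in PyTorch) so that gradients reach the disparity network.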

Maybe you can refer to this work, which uses monocular videos for training and estimates 3D motion.