Closed: akritisaxena closed this issue 7 years ago
The input is just two images showing the same scene from different viewpoints; these can be two frames from a video. We impose no restrictions on the position or orientation of the second camera. It is in fact part of the network's task to compute the relative pose between the two images.
During training we present a large number of image pairs with known relative poses to the network and train it to predict the pose of the second camera.
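As a rough sketch of how such an unconstrained image pair can be prepared as a single network input: many two-view networks stack the two RGB frames along the channel axis into one 6-channel array. The helper below (`make_pair_input` is a hypothetical name, and the 6-channel stacking is an assumption about the input format, not something stated in this thread) illustrates the idea:

```python
import numpy as np

def make_pair_input(img1, img2):
    """Stack two RGB frames of shape (H, W, 3) into a single
    (H, W, 6) array that a two-view network could consume.
    No constraint is placed on the relative pose of the cameras;
    the frames just need to show the same scene."""
    assert img1.shape == img2.shape and img1.shape[-1] == 3
    return np.concatenate([img1, img2], axis=-1)

# Two synthetic "frames" standing in for views of the same scene.
h, w = 192, 256
frame_a = np.random.rand(h, w, 3).astype(np.float32)
frame_b = np.random.rand(h, w, 3).astype(np.float32)

x = make_pair_input(frame_a, frame_b)
print(x.shape)  # (192, 256, 6)
```

During training, each such stacked pair would be presented together with its known relative pose as the supervision target; at prediction time only the stacked pair is needed.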
Thanks @benjaminum ! This is really helpful.
The image pairs used in training don't look like stereo pairs. Are they consecutive frames? How are these 'unconstrained image pairs' handled during training? And for prediction, are image pairs required in the same form?