Hi, the focus of our work is to estimate depth using correspondence between frames, similar to structure from motion. In this setting it is not possible to recover the global scale. The other multi-frame methods we compare to (BA-Net, DeMoN, and DeepTAM) also evaluate with scale matching. In our paper, all the numbers in the tables come from methods whose output is scaled the same way as ours. When we compare to single-image depth networks, we also scale their predictions.
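For intuition, here is a minimal sketch of what per-image scale matching for depth evaluation typically looks like. This is illustrative only and not necessarily the exact code in our evaluation scripts; the function name, the `mode` argument, and the masking are assumptions:

```python
import numpy as np

def scale_match_depth(depth_pred, depth_gt, mode="median"):
    """Align a predicted depth map to ground truth up to a single scale factor.

    Illustrative sketch only; not the exact DeepV2D evaluation code.
    """
    valid = depth_gt > 0                      # only evaluate where ground truth exists
    pred, gt = depth_pred[valid], depth_gt[valid]

    if mode == "median":
        # match the medians of prediction and ground truth
        s = np.median(gt) / np.median(pred)
    else:
        # least-squares: argmin_s || s * pred - gt ||^2
        s = np.dot(gt, pred) / np.dot(pred, pred)

    return s * depth_pred                     # scaled prediction, compared against depth_gt
```

The error metrics (abs rel, RMSE, etc.) are then computed between the scaled prediction and the ground truth.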
The scaling equation is correct as written, since we want to scale our prediction to match the ground truth, not the other way around.
To get the scale factor you need to solve: min_s ||s * t_pr - t_gt||^2. Taking the derivative with respect to s and setting it to zero gives 2 * t_pr . (s * t_pr - t_gt) = 0, i.e. s = (t_pr . t_gt) / (t_pr . t_pr), which is np.dot(t1, t2) / np.dot(t2, t2) in the code.
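As a quick numerical sanity check (illustrative only, with t_gt playing the role of t1 and t_pr the role of t2, as implied by the formula above), the closed-form scale agrees with a brute-force search over s:

```python
import numpy as np

rng = np.random.default_rng(0)
t_gt = rng.normal(size=3)                          # ground-truth translation (t1)
t_pr = 0.37 * t_gt + 0.01 * rng.normal(size=3)     # prediction, correct up to scale (t2)

# Closed form: d/ds ||s * t_pr - t_gt||^2 = 2 * t_pr . (s * t_pr - t_gt) = 0
s_closed = np.dot(t_gt, t_pr) / np.dot(t_pr, t_pr)

# Brute force over a grid of candidate scales
grid = np.linspace(0.0, 10.0, 100001)
errs = [np.sum((s * t_pr - t_gt) ** 2) for s in grid]
s_brute = grid[int(np.argmin(errs))]

print(s_closed, s_brute)  # the two agree up to the grid resolution
```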
Many thx!
Hi, thanks for sharing this nice work. However, I'm curious about the scaling used in the evaluation.
From my understanding, DeepV2D is supervised and should not require any scaling in the depth or pose evaluation. However, in your evaluation script all depths and poses are rescaled. Why is that needed?
Another question is about the scale factor used when computing trans(cm): https://github.com/princeton-vl/DeepV2D/blob/a3fbef1379383f0429ffb3f2556155d4d20f0c9c/evaluation/eval_utils.py#L57
Shouldn't it be np.dot(t1, t1) / np.dot(t1, t2)?