Closed: dfrumkin closed this issue 4 years ago
Hello Simon!

From what I've seen (e.g. NYUv2, MegaDepth), depth is usually measured in meters. Your model seems to output depth in millimeters, and the sample of the synthetic dataset that you published also appears to be in millimeters. Could you clarify what units you are using and why?

I think this is important (and maybe something that should be mentioned) when combining the two losses, because the ordinal loss depends on the units whereas the gradient loss does not, so

`1e-4 * ordinal loss + gradient loss`

in millimeters would become

`1e-7 * ordinal loss + gradient loss`

when using meters.

Another important parameter when combining the losses is whether you are using the inverse depth (1/depth) or the disparity implied by your evaluation code, which would then depend on the baseline (e.g. 40 mm) and the field of view (e.g. 90 degrees). It would be great if you could shed some light on the choice of those parameters in training. Thanks a lot!
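To make the unit dependence concrete, here is a minimal sketch (my own illustration, not code from the repository) of how the ordinal-loss weight would track the choice of depth unit, taking the two weights quoted above as given:

```python
def combined_loss(ordinal_loss: float, gradient_loss: float,
                  depth_unit: str = "mm") -> float:
    """Combine the unit-dependent ordinal loss with the
    unit-independent gradient loss (weights from the discussion above)."""
    # Hypothetical weights: 1e-4 when depth is in millimeters;
    # rescaling depth from millimeters to meters shifts the
    # appropriate weight by the same factor of 1000, to 1e-7.
    weights = {"mm": 1e-4, "m": 1e-7}
    return weights[depth_unit] * ordinal_loss + gradient_loss
```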
I am afraid that there is an inherent scale ambiguity when it comes to depth estimation. While NYUv2 may have a mapping from its estimated depth to real-world units because it uses a depth sensor, MegaDepth relies on structure from motion and hence has no such mapping.
Likewise with our dataset: each virtual environment may be scaled by an arbitrary factor. As such, our baseline does not correspond to real-world units but to pixels (20 pixels, to be exact). During training, we assume a 90 degree field of view and a baseline of 20, so the conversion from depth to inverse depth for examples from our dataset becomes: `(512 * 20) / depth`
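As a minimal sketch of that conversion (my own illustration: the names are mine, and I am assuming 512 is the image width in pixels and `depth` holds per-pixel depth values):

```python
import numpy as np

IMAGE_WIDTH = 512   # assumed image width in pixels
BASELINE_PX = 20    # baseline expressed in pixels, per the comment above

def depth_to_inverse_depth(depth: np.ndarray) -> np.ndarray:
    """Convert per-pixel depth to the inverse depth used in training:
    (512 * 20) / depth."""
    return (IMAGE_WIDTH * BASELINE_PX) / depth
```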