sniklaus / 3d-ken-burns

an implementation of 3D Ken Burns Effect from a Single Image using PyTorch

Question about the units of measurement #30

Closed dfrumkin closed 4 years ago

dfrumkin commented 4 years ago

Hello Simon!

From what I've seen (e.g. NYUv2, MegaDepth), depth is usually measured in meters. Your model seems to output depth in millimeters. The sample of the synthetic dataset that you published also seems to be in millimeters. Could you clarify what units you are using and why?

I think this is important (and perhaps worth mentioning) when combining the two losses, because the ordinal loss depends on the units whereas the gradient loss does not: 1e-4 * ordinal loss + gradient loss with depth in millimeters would become 1e-7 * ordinal loss + gradient loss with depth in meters.
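To make the rescaling concrete, here is a toy sketch. It is not the repository's actual loss code; it assumes, for illustration, an ordinal-style term computed on inverse depth (which grows 1000x when switching from millimeters to meters) and a gradient-style term on log-depth (which is invariant to a global unit change), so the 1e-4 to 1e-7 weight change keeps the combined loss identical:

```python
import math

def ordinal_term(depth):
    """Toy ordinal-style term on inverse depth (hypothetical, for scaling only)."""
    inv = [1.0 / d for d in depth]
    n = len(inv)
    return sum(abs(a - b) for a in inv for b in inv) / (n * n)

def gradient_term(depth):
    """Toy gradient-style term on log-depth; a global unit change cancels out."""
    logs = [math.log(d) for d in depth]
    return sum(abs(logs[i + 1] - logs[i]) for i in range(len(logs) - 1))

depth_mm = [1000.0, 2500.0, 4000.0]        # depth in millimeters
depth_m = [d / 1000.0 for d in depth_mm]   # the same scene in meters

# Switching mm -> m multiplies the inverse-depth term by 1000 and leaves the
# log-gradient term unchanged, so the weight must shrink from 1e-4 to 1e-7.
loss_mm = 1e-4 * ordinal_term(depth_mm) + gradient_term(depth_mm)
loss_m = 1e-7 * ordinal_term(depth_m) + gradient_term(depth_m)

print(abs(loss_mm - loss_m) < 1e-12)  # → True
```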

dfrumkin commented 4 years ago

Another important choice when combining the losses is whether you use the inverse depth 1/depth or the disparity implied by your evaluation code, which in turn depends on the baseline (e.g. 40mm) and the field of view (e.g. 90 degrees). It would be great if you could shed some light on how these parameters were chosen in training. Thanks a lot!
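For reference, the standard pinhole-camera conversion from depth to disparity that this question refers to can be sketched as follows. The helper name and parameter values are illustrative, not taken from the repository; the focal length in pixels is derived from the horizontal field of view:

```python
import math

def depth_to_disparity(depth, image_width, fov_degrees, baseline):
    """Pinhole-camera depth-to-disparity conversion (hypothetical helper).

    focal (in pixels) = (image_width / 2) / tan(fov / 2)
    disparity         = baseline * focal / depth
    """
    focal = (image_width / 2.0) / math.tan(math.radians(fov_degrees) / 2.0)
    return baseline * focal / depth

# With the example parameters above: 90-degree FOV, a 40 mm baseline, and a
# 512-pixel-wide image. Baseline and depth must share the same unit, so with
# depth in millimeters the result is a disparity in pixels.
disp = depth_to_disparity(depth=2000.0, image_width=512, fov_degrees=90.0, baseline=40.0)
print(disp)  # ≈ 5.12 pixels (focal ≈ 256 px)
```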

sniklaus commented 4 years ago

I am afraid that there is an inherent scale ambiguity when it comes to depth estimation. While NYUv2 may have a mapping from their estimated depth to real-world units due to using a depth sensor, MegaDepth is using structure-from-motion and hence does not have such a mapping.

Likewise with our dataset, each virtual environment may be scaled by an arbitrary factor. As such, our baseline does not correspond to real-world units but to pixels, 20 pixels to be exact. During training, we assume a 90 degree field of view and a baseline of 20; the conversion from depth to inverse depth for examples from our dataset thus becomes: (512 * 20) / depth
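The conversion described above can be sketched directly; the function name is illustrative, but the formula is the one stated in the comment (512-pixel image dimension times the 20-pixel baseline, divided by depth):

```python
def depth_to_inverse_depth(depth):
    # Conversion stated above for examples from the synthetic dataset:
    # a 512-pixel image dimension times the 20-pixel baseline, over depth.
    return (512 * 20) / depth

inv = depth_to_inverse_depth(2048.0)
print(inv)  # → 5.0
```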