mligg23 opened 4 months ago
The depth is metric, which means the output values are in meters.
As with any metric depth estimator, the scale may not be exact, since it is inversely related to the camera focal length. Moreover, out-of-domain data (for instance, landscape images) is not represented in the training set, so the model will fail to capture the depth correctly; for example, it may treat the scene as a miniature. This is because the training data mostly lies in the 0-10 m range for indoor scenes and the 5-100 m range for outdoor scenes.
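A rough sanity check based on the ranges quoted above might look like the sketch below. It assumes you already know whether a scene is indoor or outdoor, and the helper name `within_training_range` is hypothetical. A value inside the range does not prove the prediction is correct; the check only flags estimates that fall outside the depths the model mostly saw during training.

```python
# Approximate training depth ranges quoted above, in meters.
TRAINING_RANGES_M = {
    "indoor": (0.0, 10.0),
    "outdoor": (5.0, 100.0),
}


def within_training_range(depth_m: float, scene: str) -> bool:
    """Return True if depth_m lies inside the approximate training range for `scene`.

    Hypothetical helper: `scene` must be "indoor" or "outdoor".
    """
    low, high = TRAINING_RANGES_M[scene]
    return low <= depth_m <= high


for scene, depth in [("indoor", 3.2), ("outdoor", 250.0)]:
    ok = within_training_range(depth, scene)
    status = "in range" if ok else "outside training range, treat with caution"
    print(f"{scene}: {depth} m -> {status}")
```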
I tried to estimate an object's distance from the camera, in camera coordinates, by combining the depth map predicted by the model with the object's pixel region in the RGB image. However, I could not find out what unit the predicted depth is in, or whether it needs to be scaled. I hope the author can help clarify this. Thank you!
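A minimal sketch of the workflow described in the question, assuming the depth map is a NumPy array in meters (as stated in the answer) and that it stores z-depth along the optical axis. That second point is an assumption: some models output the Euclidean distance along each pixel's ray instead. The intrinsics `fx, fy, cx, cy` are hypothetical placeholders; use the values from your own camera calibration. The median over the object mask is used because it is less sensitive than the mean to edge pixels that bleed into the background.

```python
import numpy as np


def object_distance_m(depth_m: np.ndarray, mask: np.ndarray) -> float:
    """Median depth (meters) over the object's pixels, ignoring NaN/inf values."""
    values = depth_m[mask & np.isfinite(depth_m)]
    if values.size == 0:
        raise ValueError("mask selects no valid depth pixels")
    return float(np.median(values))


def backproject(u: float, v: float, z: float,
                fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Pinhole back-projection of pixel (u, v) with z-depth z into camera coordinates (X, Y, Z)."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])


# Toy example: a 4x6 depth map (meters) with a 2x2 "object" at 2.0 m on a 5.0 m background.
depth = np.full((4, 6), 5.0)
depth[1:3, 2:4] = 2.0
mask = np.zeros_like(depth, dtype=bool)
mask[1:3, 2:4] = True

z = object_distance_m(depth, mask)
rows, cols = np.nonzero(mask)
u, v = cols.mean(), rows.mean()  # object centroid in pixels (u = column, v = row)
point = backproject(u, v, z, fx=500.0, fy=500.0, cx=3.0, cy=2.0)  # hypothetical intrinsics
distance = float(np.linalg.norm(point))  # straight-line distance from the camera center
print(z, point, distance)
```

If the depth map turns out to hold ray distances rather than z-depth, the median already is the straight-line distance, and the back-projection step would need to be adjusted accordingly.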