Open PhilFM opened 4 months ago
Thank you for raising this issue. I think that the misunderstanding comes from how the intrinsics interact with the `model.infer()` method. I ran `demo.py` with your snippet in place of `intrinsics_torch = torch.from_numpy(np.load("assets/demo/intrinsics.npy"))`, using two different FoVs.
The top picture is obtained by setting the variable `fov_horiz_degrees` to 30 deg, while the bottom one corresponds to setting it to 80 deg.
Notice how the model "thinks" that the top image is farther away (a low FoV corresponds to zooming in), while it "thinks" that the bottom one shows a closer scene, since an 80-deg FoV corresponds to a wide-angle image, i.e. zooming out.
I use "reversed magma" as the colormap, so yellow means closer, while purple (to black) means farther away.
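To put rough numbers on the zoom analogy: for a pinhole camera, fx = (W / 2) / tan(FoV / 2), so at the same image width a 30-deg FoV has about 3.1x the focal length of an 80-deg FoV. A sketch, assuming an illustrative width of 640 px:

```python
import numpy as np

W = 640  # assumed image width in pixels

def fx_from_fov(fov_deg, width=W):
    """Pinhole focal length in pixels from a horizontal field of view."""
    return (width / 2) / np.tan(np.radians(fov_deg) / 2)

fx_30 = fx_from_fov(30)  # long focal length: "zoomed in", scene reads as far
fx_80 = fx_from_fov(80)  # short focal length: wide angle, scene reads as close
ratio = fx_30 / fx_80    # ~3.1
```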
Hi Luigi, thanks very much for your response. As you explain, it is a misunderstanding on my part, not a bug. The relationship you suggest between FOV and depth is interesting. In effect you are saying that low and high levels of FOV for the same image imply that a "dolly zoom" effect has been applied, where the camera is moved towards a scene (the dolly) to compensate for the increased FOV (the zoom). This combination of camera dolly and zoom minimises the effect on the image, as is well known in movies (Vertigo, Jaws etc). It would be quite a challenge to create a depth algorithm which could distinguish correctly between images taken at two levels of dolly zoom, predicting the correct FOV and depth in the two cases. Next challenge for you?
That would be an interesting project, and we tried to disentangle this ambiguity as much as possible in UniDepth. However, I am afraid that by leveraging only monocular images it is not feasible to solve this inherent ambiguity. One solution/improvement would involve extending our framework to multi-views/videos.
There seems to be a bug where setting the intrinsics doesn't affect the run. The relevant code fragment I use is
No error is reported, but whatever value I use for the fov_horiz_degrees parameter, the result doesn't change. I've tried both the V1 and V2 models. I've also tried the (1, 3, 3) shape for the intrinsics array, which is the shape returned by the prediction. No difference. Am I doing something wrong, or is it a bug?
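For reference, building intrinsics from a horizontal FoV in the usual pinhole way would look roughly like this (the image size and FoV value are illustrative, not the exact snippet; the model call is commented out):

```python
import numpy as np

W, H = 640, 480            # illustrative image size
fov_horiz_degrees = 60.0   # the parameter discussed in this thread

# Standard pinhole model: fx from the horizontal FoV, principal point at center.
fx = (W / 2) / np.tan(np.radians(fov_horiz_degrees) / 2)
K = np.array([[fx,  0.0, W / 2],
              [0.0, fx,  H / 2],
              [0.0, 0.0, 1.0]], dtype=np.float32)

# intrinsics_torch = torch.from_numpy(K)           # shape (3, 3), as in demo.py
# predictions = model.infer(rgb_tensor, intrinsics_torch)
```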
I checked the output intrinsics from the prediction (predictions["intrinsics"] for V1 and predictions["K"] for V2). I expected these to match the intrinsics passed in; however, the returned intrinsics appear to stay the same as when no intrinsics are passed. The depth maps also look identical regardless of the FOV.
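A minimal way to run this check, assuming the prediction carries a (1, 3, 3) intrinsics tensor as described (dummy matrices stand in for the real predictions here):

```python
import numpy as np

def intrinsics_match(passed_K, returned_K, atol=1.0):
    """Check whether the model echoed back the supplied intrinsics.

    The returned matrix may carry a leading batch dimension, i.e. shape (1, 3, 3).
    """
    a = np.asarray(passed_K, dtype=np.float64).reshape(3, 3)
    b = np.asarray(returned_K, dtype=np.float64).reshape(3, 3)
    return np.allclose(a, b, atol=atol)

# Dummy matrices standing in for the real predictions["K"] / ["intrinsics"]:
K = np.array([[554.3, 0.0, 320.0],
              [0.0, 554.3, 240.0],
              [0.0,   0.0,   1.0]])
same = intrinsics_match(K, K[None])             # intrinsics were honoured
different = intrinsics_match(K, 2.0 * K[None])  # intrinsics were ignored/changed
```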
Thanks for any help.
Phil