lpiccinelli-eth / UniDepth

Universal Monocular Metric Depth Estimation

Bug in setting known intrinsic parameters #48

Open PhilFM opened 4 months ago

PhilFM commented 4 months ago

There seems to be a bug where setting the intrinsics does not affect the run. The relevant code fragment I use is:

    import math
    import numpy as np
    import torch

    # Build a 3x3 pinhole intrinsics matrix from a horizontal field of view.
    intrinsics = np.zeros((3, 3), dtype=np.float32)
    Fdx = 1.0 / math.tan(0.5 * math.radians(args.fov_horiz_degrees))  # focal length for x normalized to [-1, 1]
    Fdpix = 0.5 * Fdx * img.width  # focal length in pixels
    intrinsics[0, 0] = Fdpix             # fx
    intrinsics[1, 1] = Fdpix             # fy
    intrinsics[0, 2] = 0.5 * img.width   # cx: principal point at image center
    intrinsics[1, 2] = 0.5 * img.height  # cy
    intrinsics[2, 2] = 1.0
    predictions = model.infer(rgb, torch.from_numpy(intrinsics))

No error is reported, but whatever value I use for the fov_horiz_degrees parameter doesn't affect the result. I've tried the V1 and V2 models. I've also tried the (1,3,3) shape for the intrinsics array, which is the shape returned by the prediction; no difference. Am I doing something wrong, or is it a bug?

I checked the output intrinsics from the prediction (predictions["intrinsics"] for V1 and predictions["K"] for V2), expecting them to match the intrinsics passed in. However, the returned intrinsics are the same as when no intrinsics are passed at all. The depth maps also look identical regardless of the FOV.

Thanks for any help.

Phil

lpiccinelli-eth commented 4 months ago

Thank you for raising this issue. I think that the misunderstanding comes from the following two points:

  1. The returned intrinsics are the ones predicted by the model, which are independent of the intrinsics you pass in. However, the predicted ones are not used internally if you pass intrinsics to the model.infer() method.
  2. The difference in the output depth map for different focal lengths (or FoVs) should be in the global scale only. The depth up to a global scale should therefore be (almost) the same, but the global scale differs: a higher FoV usually yields closer depth, while a lower FoV yields deeper depth, since a narrow FoV mimics zooming in. A quick way to verify this is sketched after this list.
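As a minimal sketch of point 2 (assuming model, rgb, and the image size W x H are already set up as in demo.py, and that the prediction dict exposes a "depth" tensor), you can compare two FoVs and check that only the global scale changes:

    import math
    import numpy as np
    import torch

    def intrinsics_for_fov(fov_deg, width, height):
        # Pinhole intrinsics with the principal point at the image center.
        f = 0.5 * width / math.tan(0.5 * math.radians(fov_deg))  # focal length in pixels
        K = np.array([[f, 0.0, 0.5 * width],
                      [0.0, f, 0.5 * height],
                      [0.0, 0.0, 1.0]], dtype=np.float32)
        return torch.from_numpy(K)

    depth_30 = model.infer(rgb, intrinsics_for_fov(30.0, W, H))["depth"]
    depth_80 = model.infer(rgb, intrinsics_for_fov(80.0, W, H))["depth"]

    # The global scale differs: the narrow FoV should give deeper depth.
    print(depth_30.median() / depth_80.median())  # expected > 1

    # After normalizing out the global scale, the two maps should nearly coincide.
    rel = (depth_30 / depth_30.median() - depth_80 / depth_80.median()).abs().mean()
    print(rel)  # expected to be small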

I ran demo.py with your snippet in place of intrinsics_torch = torch.from_numpy(np.load("assets/demo/intrinsics.npy")) with two different FoVs. The top picture was obtained by setting the variable fov_horiz_degrees to 30 degrees, while the bottom one corresponds to setting it to 80 degrees.

Notice how the model "thinks" that the top image is farther away (a low FoV is akin to zooming in), while it "thinks" that the bottom one shows a closer scene, since 80 degrees of FoV corresponds to a wide-angle image, i.e. zooming out.

I use "reversed magma" as the colormap, so yellow means closer, while purple (to black) means farther away.
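For reference, a minimal visualization sketch (assuming depth is an (H, W) torch tensor of metric depth and matplotlib is available):

    import matplotlib.pyplot as plt

    # Reversed magma: near regions appear yellow, far regions purple to black.
    plt.imshow(depth.squeeze().cpu().numpy(), cmap="magma_r")
    plt.colorbar(label="depth [m]")
    plt.axis("off")
    plt.show()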

(Attached images: output_fov30 and output_fov80, the depth maps for FoV = 30° and 80°.)

PhilFM commented 4 months ago

Hi Luigi, thanks very much for your response. As you explain, it is a misunderstanding on my part, not a bug. The relationship you describe between FOV and depth is interesting. In effect you are saying that low and high FOVs for the same image imply that a "dolly zoom" effect has been applied, where the camera is moved towards the scene (the dolly) to compensate for the increased FOV (the zoom). This combination of camera dolly and zoom minimises the change to the image, as is well known in movies (Vertigo, Jaws, etc.). It would be quite a challenge to create a depth algorithm that could correctly distinguish between images taken at two levels of dolly zoom, predicting the correct FOV and depth in both cases. Next challenge for you?
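To make the ambiguity concrete, here is a toy pinhole-projection sketch (all values purely illustrative):

    # Pinhole projection: an object of height H at distance Z, seen with focal
    # length f (in pixels), projects to an image height of h = f * H / Z.
    # A dolly zoom scales f and Z together, so f / Z (and hence the subject's
    # projected size) is unchanged; only objects at other depths shift.
    H = 2.0  # subject height in meters (illustrative)
    for f, Z in [(500.0, 5.0), (1000.0, 10.0)]:  # two dolly-zoom configurations
        h = f * H / Z
        print(f"f = {f:.0f} px, Z = {Z:.1f} m -> projected height {h:.1f} px")
    # Both configurations print 200.0 px: from the subject alone, a monocular
    # model cannot tell which focal-length / distance pair produced the image.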

lpiccinelli-eth commented 4 months ago

That would be an interesting project, and we tried to disentangle this as much as possible in UniDepth. However, I am afraid that, leveraging only monocular images, it is not feasible to resolve this inherent ambiguity. One solution/improvement would be to extend our framework to multi-view/video inputs.