lpiccinelli-eth / UniDepth

Universal Monocular Metric Depth Estimation

camera module output and pinhole parameterization #10

Closed leiwang1023 closed 5 months ago

leiwang1023 commented 5 months ago

Hi. According to the paper, the network predicts the camera parameters, and the pinhole imaging model is then used to obtain the focal length and the principal point coordinates with the following formula: image

But in your code:

    intrinsics[:, 0, 0] = max(original_shapes) / 2 * intrinsics[:, 0, 0]  # why max(shape) not W/H
    intrinsics[:, 1, 1] = max(original_shapes) / 2 * intrinsics[:, 1, 1]
    intrinsics[:, 0, 2] = intrinsics[:, 0, 2] * original_shapes[1]
    intrinsics[:, 1, 2] = intrinsics[:, 1, 2] * original_shapes[0]

I am confused about the difference and the connection between these two.

Thanks a lot!

lpiccinelli-eth commented 5 months ago

Thank you for pointing this out. There is a typo in the paper regarding H and W in f_x and f_y: it should be max(H, W).

Anyway, for simplicity let's take only the first line, which corresponds to the focal length along the x dimension; the other lines work the same way for the other camera parameters. The first line can be rewritten as f_x = max(H, W) / 2 * Delta f_x.
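Written out for all four parameters, the code lines quoted in the question would read roughly as follows. This is a sketch that assumes `original_shapes` is ordered as (H, W), which the thread does not state explicitly; the Delta symbols denote the raw camera-module outputs.

```latex
\begin{aligned}
f_x &= \frac{\max(H, W)}{2}\,\Delta f_x, &\qquad f_y &= \frac{\max(H, W)}{2}\,\Delta f_y, \\
c_x &= W\,\Delta c_x, &\qquad c_y &= H\,\Delta c_y .
\end{aligned}
```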

The lack of clarity may stem from overwriting the intrinsics variable. The camera_layer output, i.e. the first intrinsics, contains the "delta" camera parameters, namely the so-called multiplicative residuals Delta f_x, Delta f_y, Delta c_x, Delta c_y, in the appropriate locations (i.e., [0,0], [1,1], [0,2], [1,2]). Each location is then overwritten to obtain the parameters of the proper projection matrix from the multiplicative residual predicted by the camera_layer, i.e. from Delta f_x to f_x and so on.

Let me know if this clarifies things or if there is still something unclear.
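To make the overwriting step easier to follow, here is a minimal sketch that keeps the residuals and the final intrinsics in separate variables. It is not the repository's code: the function name `denormalize_intrinsics` and the argument names are hypothetical, NumPy stands in for the tensor library, and the image shape is assumed to be given as height and width (matching `original_shapes[0]` and `original_shapes[1]` in the snippet from the question).

```python
import numpy as np


def denormalize_intrinsics(delta, height, width):
    """Turn multiplicative residuals into pixel-unit intrinsics.

    delta: array of shape (B, 3, 3) holding Delta f_x, Delta f_y,
           Delta c_x, Delta c_y at [0,0], [1,1], [0,2], [1,2].
    height, width: original image size in pixels (assumed order: H, W).
    """
    delta = np.asarray(delta, dtype=np.float64)
    K = delta.copy()  # leave the residuals untouched instead of overwriting
    half_max_side = max(height, width) / 2
    K[:, 0, 0] = half_max_side * delta[:, 0, 0]  # f_x
    K[:, 1, 1] = half_max_side * delta[:, 1, 1]  # f_y
    K[:, 0, 2] = delta[:, 0, 2] * width          # c_x
    K[:, 1, 2] = delta[:, 1, 2] * height         # c_y
    return K


# Illustrative values only: one 480x640 image with made-up residuals.
delta = np.eye(3)[None].copy()
delta[0, 0, 0] = 1.5
delta[0, 1, 1] = 1.5
delta[0, 0, 2] = 0.5
delta[0, 1, 2] = 0.5
K = denormalize_intrinsics(delta, height=480, width=640)
```

With these made-up residuals, both focal lengths are scaled by max(480, 640) / 2 = 320, while the principal point is scaled by the width and height respectively.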