microsoft / HoloLens2ForCV

Sample code and documentation for using the Microsoft HoloLens 2 for Computer Vision research.

Focal length of the VLC cameras #138

Open FabianB98 opened 2 years ago

FabianB98 commented 2 years ago

Hi,

as part of my Master's thesis I'm trying to reconstruct a point cloud of the environment using the two front-facing VLC cameras.[^note] So far I have been able to calculate a disparity map using OpenCV. However, I can't convert the disparity map into a corresponding depth image, because no information about the camera intrinsics of the VLC cameras (in my specific case, the focal length) is available.
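For context, the disparity step looks roughly like this (a minimal sketch, assuming already-rectified frames; the file names and matcher parameters are placeholders, not values tuned for the HoloLens 2):

```python
import cv2

# Rectified grayscale frames from the two front-facing VLC cameras
# (placeholder file names).
left = cv2.imread("vlc_left_rectified.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("vlc_right_rectified.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,  # must be a multiple of 16
    blockSize=9,
)
# StereoSGBM returns fixed-point disparities scaled by 16.
disparity = stereo.compute(left, right).astype("float32") / 16.0
```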

I am aware that the "HoloLens 2 Research Mode as a Tool for Computer Vision Research" paper explicitly states that "camera sensors do not directly expose intrinsics parameters" and that there are methods for converting between 3D coordinates and 2D image coordinates and vice versa. However, these methods are of no help in this case. According to this tutorial, the disparity is given by `d = B * f / Z`, where `B` is the baseline distance between the two cameras, `f` is the focal length and `Z` is the corresponding depth. Solving for `Z` gives the depth of a given pixel in the disparity map as `Z = B * f / d`. While `B` can be calculated from the extrinsics of the two cameras, `f` is part of the camera intrinsics.
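Expressed as code, the missing piece is just this (a sketch operating on the disparity map from above; the values of `B` and `f` are placeholders, `f` being exactly the unknown this issue asks about):

```python
import numpy as np

B = 0.1    # placeholder baseline between the two VLC cameras, in meters
f = 370.0  # placeholder focal length in pixels (the unknown in question)

valid = disparity > 0                    # zero disparity means no match
depth = np.zeros_like(disparity)
depth[valid] = B * f / disparity[valid]  # Z = B * f / d, in meters
```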

Considering that information about the focal length of the RGB camera is available, why is there no information at all about the focal length of the VLC cameras? Similar questions about camera intrinsics have come up over time, namely #34, #37 and #100. Some of these are now almost two years old, and some of them could be solved using the aforementioned conversion methods. In my case, however, those conversion methods don't help, as I can't think of any way they could be used to extract the focal length. Has anyone found a way of getting access to the intrinsics parameters?

[^note]: Yes, I am fully aware that the HoloLens 2 offers a depth sensor which can be used for creating point clouds. In fact, I was already able to reconstruct point clouds which were then used for tracking moving objects. As part of my thesis I should also use some of the other available sensors (e.g. the VLC cameras), which is why I wanted to try the two front-facing VLC cameras as a different source for generating point clouds.

mikeszabi commented 2 years ago

Hi, same issue here! It would be nice to deduce the intrinsics from the VLC LUTs somehow. Is there a straightforward method for this?

FabianB98 commented 2 years ago

From my understanding, the MapImagePointToCameraUnitPlane and MapCameraSpaceToImagePoint methods should multiply the intrinsics matrix or its inverse with the given input vector or image point. Assuming these two methods do nothing other than a matrix-vector multiplication, a call can be written as y = A * x, where y is the result of the method call, A is the intrinsics matrix or its inverse, and x is the input vector or image point. You could sample some pairs of x and y, set up a system of linear equations, and solve it for the elements of A using linear algebra. Repeat this process for each camera and you should have all the intrinsics.
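For one image axis, a minimal sketch of this idea, assuming the mapping reduces to the distortion-free pinhole relation X = (u - cx) / fx (the sampled values below are made up):

```python
import numpy as np

# Pixel u-coordinates and the unit-plane X values that
# MapImagePointToCameraUnitPlane hypothetically returned for them.
u = np.array([0.0, 200.0])
X = np.array([-0.85, -0.31])

# u = fx * X + cx is linear in (fx, cx), so two samples determine both.
A = np.stack([X, np.ones_like(X)], axis=1)
fx, cx = np.linalg.solve(A, u)
print(f"fx = {fx:.2f}, cx = {cx:.2f}")  # fx ≈ 370.37, cx ≈ 314.81
```

Sampling more than two pairs and solving with `np.linalg.lstsq` instead would average out noise, and any lens distortion in the mapping would then show up as a residual.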

However, this process isn't really straightforward. As I also needed the extrinsic calibration between the two forward-facing VLC cameras, I just went ahead and calibrated the cameras with a calibration pattern and OpenCV. To me, at least, this seemed easier than trying to extract the intrinsics matrix with the approach described above.
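For completeness, that OpenCV route looks roughly like this (a sketch; the board size and file names are placeholders):

```python
import cv2
import numpy as np

pattern = (9, 6)  # placeholder: inner corner count of the chessboard

# 3D corner positions in board coordinates (unit square size; the focal
# length in pixels does not depend on the physical square size).
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in ["calib_00.png", "calib_01.png"]:  # placeholder image names
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K contains fx, fy and the principal point; dist the distortion terms.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("fx, fy:", K[0, 0], K[1, 1])
```

For the extrinsic calibration between the two cameras, `cv2.stereoCalibrate` can then be run on corner detections from synchronized image pairs.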

mikeszabi commented 2 years ago

I also ended up estimating the intrinsics with the method you mentioned above, and I additionally went through the standard chessboard calibration. The intrinsics I got vary considerably depending on which point pairs I chose (and on which images I included in the calibration set). Below are some values for different point pairs (u, v); each x is (focal length, principal point):

| (u1, v1) | (u2, v2) | x_l (focal, principal) | x_r (focal, principal) |
| --- | --- | --- | --- |
| (0, 0) | (200, 200) | (385.01280587, 316.44955616) | (337.98439836, 333.10344029) |
| (200, 200) | (300, 300) | (376.48255277, 313.8695272) | (347.91297414, 337.01346572) |
| (0, 0) | (639, 479) | (369.26611797, 303.50704536) | (267.62356098, 263.75870986) |
| (400, 400) | (639, 479) | (369.26611797, 303.50704536) | (267.62356098, 263.75870986) |

mrcfschr commented 1 year ago

This may be helpful too: https://github.com/microsoft/HoloLens2ForCV/issues/126