microsoft / HoloLensForCV

Sample code and documentation for using the Microsoft HoloLens for Computer Vision research
MIT License

read_sensor_poses function recorder_console.py sample #126

Open serhan-gul opened 4 years ago

serhan-gul commented 4 years ago

Hi, I have some doubts about the camera_to_image matrix in the read_sensor_poses function of the sample app: https://github.com/microsoft/HoloLensForCV/blob/master/Samples/py/recorder_console.py

Does camera_to_image represent the projective transformation to the image domain here? Why are there two ways of setting camera_to_image, one as an identity matrix and the other with two -1's on the diagonal? Specifically, I mean the following part:

            if identity_camera_to_image:
                camera_to_image = np.eye(4)
            else:
                camera_to_image = np.array(
                    [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]])
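
To make the question concrete, here is a minimal sketch of what the non-identity matrix does to a point. The sample point is hypothetical. One possible reading, which is an assumption and not confirmed by the repository, is that the matrix switches between a camera convention that looks along -Z with +Y up and one that looks along +Z with +Y down, rather than performing any projection.

```python
import numpy as np

# The two candidate matrices from read_sensor_poses.
identity = np.eye(4)
flip_yz = np.array(
    [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]], dtype=float)

# Hypothetical homogeneous point, 2 m along -Z and slightly up and to the right.
point = np.array([0.5, 0.25, -2.0, 1.0])

# The identity leaves the point unchanged; the other matrix negates y and z.
unchanged = identity @ point
flipped = flip_yz @ point
print("identity:", unchanged)
print("flip_yz: ", flipped)

# Negating two axes is a 180 degree rotation about the x axis:
# it is its own inverse and keeps handedness (determinant +1).
is_involution = np.allclose(flip_yz @ flip_yz, np.eye(4))
rotation_det = np.linalg.det(flip_yz[:3, :3])
print("own inverse:", is_involution, "det:", rotation_det)
```

Neither matrix contains focal lengths or a principal point, so under this reading camera_to_image only changes the axis convention of the camera frame.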

LisaVelten commented 4 years ago

Hi Serhan,

I do not understand the part you are wondering about either. I had a closer look at the camera intrinsics and I am confused about the following aspects:

The CameraProjectionTransform has the following parameters (the 16 values in the order they are reported, four per line):

CamProjT = [2.43247    0           0   0
            0          4.31968     0   0
            0.0701278  -0.0997288  -1  -1
            0          0           0   0]

As far as I understand: these are the Camera Intrinsics for a mapping onto a Unit Plane ranging from -1 to 1 in X- and Y-direction and with a Z-coordinate (imaginary focal length) of -1.

If I query the CameraIntrinsics property of the VideoMediaFrame, I get the intrinsic parameters shown in the attached image (HololensIntrinsics_1280x720).

It seems like the CameraProjectionTransform is the camera-to-image projection for a mapping onto a unit plane, and the "UndistortedProjectionTransform", which is highlighted in blue in the attached image, is the equivalent of the CameraProjectionTransform for a mapping onto an image plane of size 1280x720. Here I get confused about the Z-coordinate, which should correspond to the focal length. The parameter fy is negative (-1555.67334). What is the reason for that?
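
As a rough cross-check of the unit-plane interpretation, the sketch below rescales the unit-plane focal lengths to pixels. It assumes that a plane spanning -1 to 1 maps linearly onto the full 1280x720 image, so the scale factors are width/2 and height/2; that linear mapping is my assumption, not something the documentation states. The pixel value for fx (1557.021) is the one quoted further down in this comment.

```python
# Image size of the recorded frames, as stated in this thread.
WIDTH, HEIGHT = 1280, 720

# Focal lengths from CameraProjectionTransform (unit plane, range -1..1).
fx_unit, fy_unit = 2.43247, 4.31968

# Pixel focal lengths reported by the intrinsics (sign of fy dropped here).
fx_reported, fy_reported = 1557.021, 1555.67334

# Assumption: the [-1, 1] range spans the whole image, so one unit
# corresponds to half the image width (or height) in pixels.
fx_px = fx_unit * WIDTH / 2
fy_px = fy_unit * HEIGHT / 2

print(f"fx: scaled {fx_px:.3f} vs reported {fx_reported:.3f}")
print(f"fy: scaled {fy_px:.3f} vs reported {fy_reported:.3f}")
```

Both scaled values land within about one pixel of the reported ones, which fits the unit-plane reading but does not explain the sign of fy.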

According to https://docs.microsoft.com/de-de/windows/mixed-reality/locatable-camera, section "Distortion Error", the frames should already be undistorted, since the frames saved by the recorder tool are not "preview" frames, right? Microsoft says: "Because only the CameraIntrinsics are made available, applications must assume image frames represent a perfect pinhole camera." This does not make sense to me, because you can see in the attached picture that the RadialDistortion is also made available. Or do they mean something else by this?

Well, if we use the images recorded by the Recorder Tool, we need to use the UndistortedProjectionTransform and thus, we do not need to consider the radial distortion. Is that correct?

I wanted to compare the values of the CameraProjectionTransform and UndistortedProjectionTransform to try to make sense of the values. I did it as follows:

For the unit plane mapping, the x-offset of the principal point (Delta_x_UnitPlane) is 0.0701278, taken from the CameraProjectionTransform.

For the mapping onto the image plane of size 1280x720, the x-offset of the principal point from the image center (Delta_x_Regular) is 640 - 595.1027 = 44.8973.

Formula 1: Delta_x_UnitPlane/2.43247 = 0.02882987252
Formula 2: Delta_x_Regular/1557.021 = 0.02883538501

The same can be done for the y-parameters:
Formula 3: -0.0997288/4.31968 = -0.02308708052
Formula 4: 35.926422/-1555.67334 = -0.02309385851

So the ratios of principal point offset to focal length are approximately equal (compare the results of Formulas 1 and 2, and of Formulas 3 and 4). This is evidence that both mappings correspond to the same camera: one transform maps onto a unit plane and the other onto a plane of size 1280x720.
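
The four ratios can be recomputed with a few lines of Python. This is only a sketch of the arithmetic above; the assumption that the pixel offsets are measured from the image center (640, 360) is inferred from the quoted results, not taken from the documentation.

```python
import math

# Unit-plane focal lengths and principal point offsets (CameraProjectionTransform).
fx_unit, fy_unit = 2.43247, 4.31968
dx_unit, dy_unit = 0.0701278, -0.0997288

# Pixel focal lengths as used in Formulas 2 and 4.
fx_px, fy_px = 1557.021, -1555.67334

# Assumption: offsets are measured from the center of the 1280x720 image.
dx_px = 640.0 - 595.1027
dy_px = 395.926422 - 360.0

ratio_x_unit = dx_unit / fx_unit  # Formula 1
ratio_x_px = dx_px / fx_px        # Formula 2
ratio_y_unit = dy_unit / fy_unit  # Formula 3
ratio_y_px = dy_px / fy_px        # Formula 4

print(f"x: {ratio_x_unit:.8f} vs {ratio_x_px:.8f}")
print(f"y: {ratio_y_unit:.8f} vs {ratio_y_px:.8f}")

# The pairs agree to within 1e-4, as claimed in the text.
x_match = math.isclose(ratio_x_unit, ratio_x_px, abs_tol=1e-4)
y_match = math.isclose(ratio_y_unit, ratio_y_px, abs_tol=1e-4)
print("x ratios match:", x_match, "y ratios match:", y_match)
```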

What is unclear to me are the signs of the UndistortedProjectionTransform. It does not make sense to me that fy is negative.

Also I do not understand the relation between the "normal Principal Point", which lies at (595.1027, 324.073578) and the Value in the UndistortedProjectionTransform, where the Principal Point is at (595.1027, 395.926422).
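
One observation that may help here: the two y-values add up to the image height (324.073578 + 395.926422 = 720). The sketch below checks this and shows that a negative fy combined with the mirrored principal point gives the same pinhole projection with the row axis counted from the opposite image edge. This is only an interpretation that is consistent with the numbers; the test point is hypothetical, and whether the API actually intends a vertical flip is an assumption.

```python
import math

HEIGHT = 720.0
FY = 1555.67334
CY = 324.073578         # y of the "normal" principal point
CY_UNDIST = 395.926422  # y of the principal point in UndistortedProjectionTransform

def project_row(y, z, fy, cy):
    """Pinhole projection of the vertical coordinate onto an image row."""
    return fy * y / z + cy

# The two principal point values are mirror images about the image height.
mirrored = math.isclose(HEIGHT - CY, CY_UNDIST, abs_tol=1e-6)

# Hypothetical camera-space point (y, z) in meters.
y, z = 0.3, 2.0
row = project_row(y, z, FY, CY)
row_flipped = project_row(y, z, -FY, CY_UNDIST)

print("mirrored principal point:", mirrored)
print(f"row {row:.3f} + flipped row {row_flipped:.3f} = {row + row_flipped:.3f}")
```

The two rows sum to the image height for any point, so the two parameter sets would differ only in which edge of the image row 0 refers to.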

The values for the Focal Length stay the same except for the minus sign.

I assume that all this has something to do with the fact that these parameters are for the undistorted images, while the other parameters are for the distorted images. However, I cannot make sense of it.

Can someone help with this?

Thanks a lot, Lisa