microsoft / Azure-Kinect-Sensor-SDK

A cross platform (Linux and Windows) user mode SDK to read data from your Azure Kinect device.
https://Azure.com/Kinect
MIT License

Are intrinsic parameters of the IR camera involved in depth computation? #937

Closed. jasjuang closed this issue 4 years ago.

jasjuang commented 4 years ago

This is a follow-up to #803. It makes sense to re-estimate the intrinsic parameters for a multi-device system because of bundle adjustment. That is exactly what I did; the alignment result is better, but it is still not completely aligned.

I am not familiar with the ToF method of calculating depth. My question is: when using the ToF method to calculate depth, are the intrinsic parameters of the IR camera involved in the computation of the depth? If yes, how are they involved?

I am guessing that once I re-estimate the intrinsic parameters, the depth values in the depth image have to be adjusted somehow to reflect the change. Otherwise, if I use the depth image directly, the depth values might be based on the old intrinsics, hence the alignment still fails. I know for sure the depth would have to be re-estimated on a traditional depth-from-stereo system if the intrinsics were re-estimated, but I am not so sure about ToF.

amonnphillip commented 4 years ago

Or to put it another way: do we need a method for users to write their own intrinsic values to the device, given that the factory intrinsic values do not seem to be accurate enough?

jasjuang commented 4 years ago

@wes-b Can we please get an answer on this one? I really want to get #803 to work.

rabbitdaxi commented 4 years ago

@jasjuang To answer the question in the title: the depth engine uses the intrinsics to compute z depth. The ToF principle computes radial depth internally, and we convert it to z depth for the SDK to expose as the depth image. We also use the intrinsics for laser/sensor offset compensation.
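
To make that concrete, here is a minimal sketch (not the depth engine's actual code) of how a radial ToF distance at a pixel could be converted to z depth once the IR intrinsics are known. Distortion and the laser/sensor offset compensation mentioned above are ignored, and all names are illustrative.

```cpp
#include <cmath>

// Hedged sketch: convert a radial ToF distance measured at pixel (u, v) into
// z depth using pinhole intrinsics (fx, fy, cx, cy) in pixels. The real depth
// engine conversion is more involved (distortion, sensor offsets, etc.).
float radial_to_z_depth(float radial_mm, float u, float v,
                        float fx, float fy, float cx, float cy)
{
    // Ray through the pixel in normalized camera coordinates.
    float xn = (u - cx) / fx;
    float yn = (v - cy) / fy;

    // Along that ray, radial distance r relates to z by r = z * sqrt(1 + xn^2 + yn^2).
    return radial_mm / std::sqrt(1.0f + xn * xn + yn * yn);
}
```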

Regarding your comment about needing to re-estimate the intrinsic parameters, @rajeev-msft may comment on this, but it is surprising to me that you need to recalibrate the intrinsics instead of fixing them and only calibrating the extrinsics between cameras.

Regarding the question of whether you can write your own intrinsics values to the device: I do not think it is supported now, but @wes-b might be able to comment on this.

However, if you really want to try using your own calibrated values locally and pass them into the depth engine with the current state of the SDK, you can look at https://github.com/microsoft/Azure-Kinect-Sensor-SDK/blob/90f77529f5ad19efd1e8f155a240c329e3bcbbdf/src/dewrapper/dewrapper.c#L149, where you can see that the depth engine takes a k4a_calibration_camera_t pointer; you can try hacking that line with your own values. For the transformation API, you only need to initialize the transformation object with a k4a_calibration_t object updated with your own values.
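
For the transformation-API part, here is a hedged sketch of what "initialize the transformation with an updated k4a_calibration_t" could look like. The depth mode, color resolution, and my_* values are placeholders for your own setup, and the depth engine itself still uses the factory intrinsics unless you also change dewrapper.c as described above.

```cpp
#include <k4a/k4a.h>

// Hedged sketch: patch your own depth-camera pinhole parameters (in pixels)
// into the public calibration before creating a transformation object.
// This only affects the transformation APIs, not the depth engine itself.
k4a_transformation_t create_transformation_with_custom_intrinsics(
    k4a_device_t device, float my_fx, float my_fy, float my_cx, float my_cy)
{
    k4a_calibration_t calibration;
    // Adjust the depth mode / color resolution to whatever you actually use.
    if (K4A_RESULT_SUCCEEDED !=
        k4a_device_get_calibration(device,
                                   K4A_DEPTH_MODE_NFOV_UNBINNED,
                                   K4A_COLOR_RESOLUTION_1080P,
                                   &calibration))
    {
        return nullptr;
    }

    // Overwrite the depth-camera intrinsics with your own estimates.
    calibration.depth_camera_calibration.intrinsics.parameters.param.fx = my_fx;
    calibration.depth_camera_calibration.intrinsics.parameters.param.fy = my_fy;
    calibration.depth_camera_calibration.intrinsics.parameters.param.cx = my_cx;
    calibration.depth_camera_calibration.intrinsics.parameters.param.cy = my_cy;

    // The transformation object is built from the (now modified) calibration.
    return k4a_transformation_create(&calibration);
}
```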

Hope these help.

wes-b commented 4 years ago

> Regarding the question of whether you can write your own intrinsics values to the device: I do not think it is supported now, but @wes-b might be able to comment on this.

Sorry, we don't support saving custom calibration to the device.

jasjuang commented 4 years ago

@rabbitdaxi thanks for the detailed response, I will look into the code you pointed to and report back any findings.

jasjuang commented 4 years ago

@rabbitdaxi After digging through the code, I was able to make some progress, but one thing confuses me: when I print out the focal lengths and principal point as below, right at the code you pointed to,

printf("fx %f\n",dewrapper->calibration->intrinsics.parameters.param.fx);
printf("fy %f\n",dewrapper->calibration->intrinsics.parameters.param.fy);
printf("cx %f\n",dewrapper->calibration->intrinsics.parameters.param.cx);
printf("cy %f\n",dewrapper->calibration->intrinsics.parameters.param.cy);

the output is

fx 0.493066
fy 0.493119
cx 0.500009
cy 0.507625

It seems like the values are normalized. I can understand cx and cy being normalized by the image width and image height, but what are fx and fy normalized by?

rabbitdaxi commented 4 years ago

@jasjuang In this specific location, fx and fy are normalized by the image width and height respectively. This particular code path uses an internal format that is passed to the depth engine, so we expect the factory intrinsics to be consumed here. If you are interested, you can refer to this function, which may help you understand how we convert between the internal normalized intrinsics values and the OpenCV-like intrinsics.
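
In other words, a hedged sketch of the rescale (assuming a plain scale by the sensor dimensions; the SDK's actual conversion in that function may also handle half-pixel offsets and the distortion terms):

```cpp
// Hedged sketch: the internal format stores fx, cx normalized by the image
// width and fy, cy by the image height; rescaling recovers OpenCV-style
// pixel intrinsics. Not the SDK's exact conversion.
struct PixelIntrinsics { float fx, fy, cx, cy; };

PixelIntrinsics denormalize(float fx_n, float fy_n, float cx_n, float cy_n,
                            int width, int height)
{
    return { fx_n * width, fy_n * height, cx_n * width, cy_n * height };
    // e.g. fx_n = 0.493066 scaled by whichever sensor width the calibration refers to.
}
```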

By the way, since your original goal is to calibrate multi-device extrinsics, maybe you can also try solvePnP from OpenCV (with the intrinsics fixed) and see whether that gives you a good result. If the factory intrinsics are not accurate on your device, you might see weird 3D results even with a single device, e.g. a flat surface might not look flat.
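
A minimal sketch of that suggestion, assuming the detected corners and a camera matrix/distortion (factory or your own) are already in hand:

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

// Hedged sketch: estimate a device's pose from detected checkerboard corners
// while keeping the intrinsics fixed. boardPoints3d and corners2d come from
// your own detection code; K and dist hold the intrinsics being tested.
bool estimate_pose_fixed_intrinsics(const std::vector<cv::Point3f> &boardPoints3d,
                                    const std::vector<cv::Point2f> &corners2d,
                                    const cv::Mat &K, const cv::Mat &dist,
                                    cv::Mat &rvec, cv::Mat &tvec)
{
    // solvePnP only estimates the extrinsics (rvec, tvec); K and dist are
    // taken as-is, which is the "intrinsics fixed" part.
    return cv::solvePnP(boardPoints3d, corners2d, K, dist, rvec, tvec);
}
```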

jasjuang commented 4 years ago

@rabbitdaxi I guessed fx and fy were normalized by the image width and height respectively, so thanks for the confirmation. I just finished modifying the SDK in a really hacky way to manually input our own intrinsics into the depth engine. I am going to re-capture and report back my findings.

I wouldn't say the factory intrinsics are not accurate; they are just not optimal. When we run calibration, which is basically non-linear optimization with LM, there is more than one combination of parameters that yields results with low re-projection error; the problem is non-linear, so there is no single, unique minimum. The chances are very small that an individually calibrated result will happen to satisfy the additional constraints imposed by multi-device calibration. There is a reason bundle adjustment exists: it finds another combination of parameters that also yields low re-projection error while at the same time satisfying the additional constraints imposed by the multi-device calibration.
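
For illustration only, here is a rough sketch (with made-up structure names) of the joint objective bundle adjustment minimizes in the multi-device case; summing reprojection error over every device and every capture is what couples all the parameters together:

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <cmath>
#include <vector>

// Illustrative structures: each device has intrinsics plus a world->camera
// pose, each board capture has a board->world pose, and corners may or may
// not be visible per device. All names are hypothetical.
struct Device  { cv::Mat K, dist, rvec, tvec; };          // world -> camera
struct Capture { cv::Mat rvec, tvec;                      // board -> world
                 std::vector<std::vector<cv::Point2f>> cornersPerDevice; };

double joint_reprojection_error(const std::vector<Device> &devices,
                                const std::vector<Capture> &captures,
                                const std::vector<cv::Point3f> &boardPoints)
{
    double sum = 0.0;
    size_t n = 0;
    for (const Capture &cap : captures)
    {
        for (size_t d = 0; d < devices.size(); ++d)
        {
            const std::vector<cv::Point2f> &observed = cap.cornersPerDevice[d];
            if (observed.empty())
                continue; // board not visible in this device for this capture

            // Compose board->world with world->camera to get board->camera.
            cv::Mat rvec, tvec;
            cv::composeRT(cap.rvec, cap.tvec,
                          devices[d].rvec, devices[d].tvec, rvec, tvec);

            // Assumes every visible detection contains all board corners.
            std::vector<cv::Point2f> projected;
            cv::projectPoints(boardPoints, rvec, tvec,
                              devices[d].K, devices[d].dist, projected);

            for (size_t i = 0; i < observed.size(); ++i)
            {
                cv::Point2f diff = projected[i] - observed[i];
                sum += std::sqrt(double(diff.x) * diff.x + double(diff.y) * diff.y);
                ++n;
            }
        }
    }
    return n ? sum / double(n) : 0.0; // mean reprojection error across all devices
}
```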

rabbitdaxi commented 4 years ago

@jasjuang Yup, I agree that you might find more optimal parameters, including the camera calibration and poses, with bundle adjustment (potentially you could even take unit-specific systematic depth error into account in your optimization and achieve good results). Glad to know that you can hack around with the open-sourced SDK.

amonnphillip commented 4 years ago

@jasjuang Have you managed to re-calibrate the IR camera? The lens is 120 degrees, so the OpenCV libraries seem to have a lot of problems finding a solution. I can calibrate the color camera lens easily enough, but the IR camera is a bit elusive right now. I use an IR floodlight to illuminate a high-accuracy checkerboard and use over 100 images at various angles, even at the edges of the lens image. I put the camera in IR mode to get the 1024x1024 IR image and use that to calibrate. I have even tried fixing some parameters and letting the solver optimize the others, but no matter the combination I still do not get anywhere near a solution that looks right when undistorted. Any insight on this would be appreciated.

Thanks.
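
One thing that may be worth trying, sketched below under the assumption that the detected corners are already in hand: the device's own distortion model is Brown-Conrady with six radial terms, so enabling OpenCV's rational model (instead of the default five-coefficient fit) sometimes behaves better on wide-angle lenses. This is a sketch, not a guaranteed fix.

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

// Hedged sketch: calibrate the wide-FOV IR camera with the rational model.
// objectPoints / imagePoints come from your own corner detection. Note the
// argument order: 3D object (world) points first, then 2D image points.
double calibrate_ir_rational(const std::vector<std::vector<cv::Point3f>> &objectPoints,
                             const std::vector<std::vector<cv::Point2f>> &imagePoints,
                             cv::Size imageSize, cv::Mat &K, cv::Mat &dist)
{
    std::vector<cv::Mat> rvecs, tvecs;
    // CALIB_RATIONAL_MODEL adds k4..k6, which can help with strong
    // wide-angle distortion near the image edges.
    return cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                               K, dist, rvecs, tvecs,
                               cv::CALIB_RATIONAL_MODEL);
}
```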

jasjuang commented 4 years ago

@amonnphillip Are you having difficulties extracting the corners? I remember you were using ChArUco boards, right? The resolution of the IR images is low, and there is still a lot of noise even when lit by an IR floodlight. Did you try using a conventional checkerboard?

I am able to recalibrate and bundle adjust everything but.... it still doesn't align. I am still trying to pinpoint the problem.

amonnphillip commented 4 years ago

@jasjuang Thanks for the response on this. I took another look at my code and realized I had used world points where I should have used image points, and vice versa (oh!). Anyway, now I can get a rectified image that looks good. I wrote some code, based on a paper, to test the depth accuracy via projected points obtained from checkerboard corners in the color image (using a high-accuracy board + a high-accuracy corner detection algorithm). I will run the calibration results through that code tomorrow and let you know if it is any more accurate than the pre-calibrated values from the camera.

amonnphillip commented 4 years ago

@jasjuang So what I see is that calibration of the IR and color cameras does indeed help. I still haven't managed to get the accuracy I would get from something like the RealSense cameras, for example. In my tests the RMS goes from centimeters to ~2 mm after calibration. It should be noted that I can get ~1 mm RMS from the RealSense camera at a resolution of 1280x720.

After going through all that calibration and pose estimation work, I removed my calibration values and did some quick tests with different resolutions (using the factory calibration). I discovered that simply changing the color resolution to something higher improved the RMS by more than 50%. I ran 10 iterations of my test (a low number, so not very scientific, but more of a sanity check indicating it was not an outlier reading). The results showed that the RMS drops for color resolutions of 2048x1536 and 4096x3072. Now you can attribute this to the fact that a higher color resolution gives better corner accuracy and therefore better PnP accuracy, but would it really be that much better than 1280x720?

Then I tried upping the IR resolution to 1024x1024 (WFOV) and the RMS went right up again??

Like I said, these are just quick tests to try to make some sense of what I am seeing. I wonder if someone else could also give this a try: test lower color resolutions against higher ones and see what the RMS is for each.

jasjuang commented 4 years ago

@amonnphillip How is your RMS error calculated? Judging from your units, is it some sort of distance in 3D space?

Here's what I have observed so far. I recalibrate and bundle adjust everything based on the reprojection error of the checkerboard corners. I plotted the reprojected corners on the images and they all look on point, which agrees with the reported numbers, so my calibration code is doing its job correctly. However, the interesting thing is that if I plot the checkerboard corners as 3D point clouds based on the calibration results, and then compare them with point clouds obtained by simply using the depth map provided by the Azure Kinect to lift the checkerboard corners to 3D, they are off by around 1 cm when I view them in MeshLab. I can think of two explanations for this:

  1. The depth provided by the Azure Kinect is not accurate.
  2. Even with bundle adjustment and the constraints imposed by multiple devices, the non-linear optimization is still not constrained enough to arrive at the correct answer, mostly because the distortion parameters introduce too many degrees of freedom.

I am hoping it's case 2, because I am about to try adding the depth map provided by the Azure Kinect into the calibration. I am not sure whether I should keep using reprojection error as the main minimization objective for the non-linear optimization and simply add another penalty term for the difference between the PnP result and the depth provided by the Azure Kinect, or whether I need to make a big change to my calibration code so that the main minimization objective becomes the difference in Euclidean distance in 3D for the corners instead of the good old reprojection error. Regardless of which way I add the depth information into the calibration, a huge challenge is that in order to make this happen, I have to somehow extract the depth engine out of the SDK and include it in the non-linear optimization for calibration.
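
For reference, a hedged sketch of the depth-map side of that comparison: lifting a corner detected in the IR image to a 3D point with the SDK's own calibration and the depth value at that pixel. The corner coordinates and depth lookup are assumed to come from your own code.

```cpp
#include <k4a/k4a.h>

// Hedged sketch: convert a checkerboard corner detected in the depth/IR image,
// plus the depth value read at that pixel, into a 3D point in the depth
// camera's frame (millimeters) using the SDK calibration.
bool corner_to_point3d(const k4a_calibration_t *calibration,
                       float corner_u, float corner_v, float depth_mm,
                       k4a_float3_t *point3d_mm)
{
    k4a_float2_t pixel;
    pixel.xy.x = corner_u;
    pixel.xy.y = corner_v;

    int valid = 0;
    if (K4A_RESULT_SUCCEEDED !=
        k4a_calibration_2d_to_3d(calibration, &pixel, depth_mm,
                                 K4A_CALIBRATION_TYPE_DEPTH,
                                 K4A_CALIBRATION_TYPE_DEPTH,
                                 point3d_mm, &valid))
    {
        return false;
    }
    return valid != 0; // valid == 0 means the pixel could not be converted
}
// The resulting points can then be compared against the corners triangulated
// from the multi-view calibration to quantify the ~1 cm offset described above.
```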

A question for @wes-b and @rabbitdaxi: what inputs does the depth engine require? Besides the intrinsics, is it simply the IR image, or is there something like a ToF buffer that we would have to figure out how to save?

qm13 commented 4 years ago

The depth engine input is multiple ToF frames from the camera.