microsoft / Azure_Kinect_ROS_Driver

A ROS sensor driver for the Azure Kinect Developer Kit.
MIT License

When to use undistorted images and what camera intrinsic parameters to use for 3D projections? #212

Open KamilDre opened 3 years ago

KamilDre commented 3 years ago

Apologies that I have logged this as a feature request - I am very new to git and did not know how to ask a question directly. However, I think that some things could be clarified in the usage.md file to make things clearer for others.

My question is:

In the usage.md file, it is specified that images on the following topics are not undistorted:

My question is, are images on the following topics undistorted, since there is no comment about them:

If so, it would be nice to clarify this in the usage.md file. And if they are not undistorted, should I use the corresponding camera info topic to undistort them using image_proc?

Also, does the intrinsic matrix of the camera change after rectification?

christian-rauch commented 3 years ago

All image_raw topics contain the raw, unrectified (i.e. not undistorted) images. You have to use the camera info to rectify these images. This can be done with the image_proc nodelets. The kinect_rgbd.launch does this conveniently for you by loading all nodelets (driver and rectification) into a single nodelet manager.

The intrinsic matrix is used for the rectification. It does not change unless you override it in the driver with a manual calibration.

KamilDre commented 3 years ago

Thank you for your reply. I understand that the intrinsic matrix does not change on the topic itself. My question is whether it should be the same for a raw and a rectified image, i.e. whether the same relationship between 3D points and pixels holds for a distorted and an undistorted image. Below I have pasted a raw depth image (top) and the corresponding rectified image (bottom), both obtained with the kinect_rgbd.launch file. As you can see, the two images look very different from each other. The rectified image looks as if it was "zoomed in" a bit. This makes me wonder if the same intrinsic matrix is valid for both images?

image_raw
image_rect_raw

christian-rauch commented 3 years ago

You cannot use the calibrated intrinsics to project from the rectified image into 3D.

You either can use the calibrated intrinsic on the raw unrectified images, or you have to use the ideal intrinsics on the rectified images. The ideal intrinsics are computed from the technical specification of the sensor resolution and field-of-view.

KamilDre commented 3 years ago

This clarifies a lot for me, thank you. So when calibrating the extrinsics of an RGB camera using rectified images, I assume that I would also have to use the ideal intrinsics? And is the reason for this that undistorting an image maps it to an equivalent image that would have been taken by an ideal pinhole camera, hence the need for the ideal intrinsics?

The intrinsic matrix consists of 5 parameters: the focal lengths for the x and y directions, $f_x$ and $f_y$, the skew coefficient between the x and y axes, $s$, and the principal point $(c_x, c_y)$. For a resolution $W \times H$ and an angular field of view $\mathrm{AFOV}_x \times \mathrm{AFOV}_y$, would I calculate the intrinsic parameters as

$f_x = \frac{W/2}{\tan(\mathrm{AFOV}_x/2)}$, $f_y = \frac{H/2}{\tan(\mathrm{AFOV}_y/2)}$, $s = 0$, $c_x = \frac{W-1}{2}$, and $c_y = \frac{H-1}{2}$?

To calculate the focal length, I have inverted the equation for calculating the angular field of view from a focal length described in https://www.edmundoptics.co.uk/knowledge-center/application-notes/imaging/understanding-focal-length-and-field-of-view/. As for the principal point, I have assumed that it is at the centre of the image (is this correct?) and that indexing starts from 0 (hence the -1). If this is not correct, I would be grateful if you could correct me or point me to a source where I could learn more about this.
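
For reference, a minimal sketch of that calculation. The resolution and field-of-view values below are placeholders, not the actual Azure Kinect specifications, so substitute the numbers for the camera mode you actually use:

  #include <cmath>
  #include <cstdio>

  int main()
  {
    // Placeholder values: substitute the resolution and angular FoV of the
    // camera mode actually used.
    const double W = 1280.0, H = 720.0;         // image resolution [px]
    const double afov_x = 90.0 * M_PI / 180.0;  // horizontal angular FoV [rad]
    const double afov_y = 59.0 * M_PI / 180.0;  // vertical angular FoV [rad]

    // Inverting AFOV = 2 * atan(extent / (2 * f)), with the extent in pixels:
    const double fx = (W / 2.0) / std::tan(afov_x / 2.0);
    const double fy = (H / 2.0) / std::tan(afov_y / 2.0);

    // Principal point at the image centre, assuming 0-based pixel indexing.
    const double cx = (W - 1.0) / 2.0;
    const double cy = (H - 1.0) / 2.0;

    std::printf("fx=%.2f fy=%.2f cx=%.2f cy=%.2f\n", fx, fy, cx, cy);
    return 0;
  }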

On a side note, I have noticed that the kinect_rgbd.launch file does not create a topic for rectified rgb_to_depth images and hence does not publish them. Is this expected?

christian-rauch commented 3 years ago

For an extrinsic calibration between raw images, I would first do the intrinsic calibration for RGB and IR and then use the rectified images. But you can also do this in one go like for the stereo calibration http://wiki.ros.org/camera_calibration/Tutorials/StereoCalibration.

Your equations look correct. But why do you want to do this manually? You would have to compute the ideal intrinsics for the RGB and IR cameras, and you would have to change them for each of the camera modes.

The kinect_rgbd.launch does not use the rgb_to_depth images at all. It only uses the raw images and the depth_to_rgb registered images.

KamilDre commented 3 years ago

Essentially, I am building up a 3D reconstruction of an object. To do so, I need to integrate RGBD images taken from multiple viewpoints around the object into a Truncated Signed Distance Function (TSDF) volume. I am using rectified RGB images from the /rgb/image_rect_color topic and rectified aligned depth images from the /depth_to_rgb/image_rect_raw topic. Now, the TSDF algorithm also takes as input the camera intrinsics and extrinsics for each of the views. So the extrinsics I was referring to are not the mapping between the RGB and the IR sensor but between the end-effector of a robot and the RGB sensor.

Now, the images have been rectified, so I wondered if I could still use the intrinsics from the /rgb/camera_info topic. However, as you have told me, I should not be using this, and instead, I should use the ideal intrinsics.

As for the extrinsics, I have calibrated a wrist-mounted RGB camera by taking multiple images of an AprilTag from various viewpoints. However, I have found that the calibration is not very good. The most likely reason for this is uncertainty in the AprilTag detection. The algorithm that I am using is from the apriltag_ros package. This package subscribes to the rectified RGB image topic and to the corresponding camera info topic. Hence, I was also wondering if I should use the ideal intrinsics of the RGB sensor here as well.

I am thinking of calculating the intrinsics manually as I predominantly work with one RGB resolution and one depth mode.

christian-rauch commented 3 years ago

Now, the images have been rectified, so I wondered if I could still use the intrinsics from the /rgb/camera_info topic. However, as you have told me, I should not be using this, and instead, I should use the ideal intrinsics.

Either this, or better use the raw images and factory calibration directly. Since your algorithm will use the intrinsic anyway, there is no point in rectifying the images first.

As for the extrinsics, I have calibrated a wrist-mounted RGB camera by taking multiple images of an AprilTag from various viewpoints.

For AprilTags, you have to use the rectified images. See https://github.com/microsoft/Azure_Kinect_ROS_Driver/pull/200.

I am thinking of calculating the intrinsics manually as I predominantly work with one RGB resolution and one depth mode.

If you are using the colour and depth images as part of a 3D pipeline that already uses the intrinsics for projection, I recommend just using the raw images and the factory calibration.

KamilDre commented 3 years ago

Thanks for all your help and all these tips. I will try these things tomorrow.

Also, just to clarify: factory and manual calibration give the intrinsic parameters and distortion coefficients for a camera. The intrinsic parameters capture the relationship between 3D points and pixels in the raw images that were used as part of the calibration. On the other hand, the distortion coefficients can be used to undistort/rectify the raw images, effectively changing them into images that would have been captured by an ideal pinhole camera with the same angular field of view, image resolution and extrinsics. Hence, it is better to use raw images and a factory or manual calibration in 3D pipelines; this way, we avoid loss of data and artefacts introduced by the undistortion process. Is my understanding of all of this correct? If so, what is the reason for rectifying images in the first place, apart from making them look more natural? Is there any source you could recommend where I could learn a bit more about all of this?

Also, from your personal experience, would you consider the factory calibration of the Kinect to be accurate?

christian-rauch commented 3 years ago

You calibrate and rectify images to get the correct projection and the ideal view. Manufacturing inaccuracies and the optics change how the image is projected compared to the ideal projection. For a 3D projection, you can either use the raw images and the "corrected" projection matrix, or the "corrected" images and the ideal projection matrix. This should give you the same results. If you want to read about it, I can recommend "Computer Vision: A Modern Approach: International Edition" by David A. Forsyth and Jean Ponce.

Rectifying images directly makes sense if you need the ideal representation in 2D, for example if you want to match a template or detect parallel and straight lines. https://github.com/microsoft/Azure_Kinect_ROS_Driver/pull/200#issuecomment-883406304 gives you a good example of the outcome of rectification. The edges of the light on the ceiling in the raw image are not straight; the rectified images have much straighter lines. Hence the term "rectification".

The example also shows that the factory calibration gives "straighter lines" than my manual calibration with a small A3 chessboard pattern. You might get a better manual calibration with a larger calibration pattern.

KamilDre commented 3 years ago

Thanks for the source! From #200 it is very clear that rectification makes lines straighter. Since AprilTag pose estimation relies on template matching, that explains why it requires rectified images. But in this case, I would guess that it also requires the ideal projection matrix to be very accurate.

For the 3D projection with raw images, when you wrote the "corrected" projection matrix, are you implying that the intrinsic matrix from the factory calibration must be "corrected" somehow to account for the fact that the Kinect is not an ideal pinhole camera? If so, how is this correction carried out? I am asking because I wonder if the intrinsic matrix that is the output of a standard calibration procedure can be used for 3D projection of an image that was not generated by an ideal pinhole camera, as 3D projection assumes this camera model.

christian-rauch commented 3 years ago

For the 3D projection with raw images, when you wrote the "corrected" projection matrix, are you implying that the intrinsic matrix from the factory calibration must be "corrected" somehow to account for the fact that the Kinect is not an ideal pinhole camera?

No, the calibrated intrinsics are already "corrected". This is easily visible by comparing the calibrated image centre with the image size.

I wonder if the intrinsic matrix that is the output of a standard calibration procedure can be used for 3D projection of an image that was not generated by an ideal pinhole camera, as 3D projection assumes this camera model.

Yes, that is what you do if you just use the factory calibration on the raw image.

KamilDre commented 3 years ago

Thanks a lot! This all makes sense now! I will now rename this question to reflect how our conversation has unfolded so that others can also learn from your answers.

KamilDre commented 3 years ago

Over the weekend, I have been thinking about this problem more and more. From my understanding, the intrinsic matrix does not account for the image distortion caused by the camera lens, as the pinhole model assumes that it is not present. If this is true, would it not be more accurate to use the ideal intrinsic matrix and a rectified image, rather than a raw image and the factory calibration, to do a 3D projection of a depth image? I think this should hold, as rectification accounts for the image distortion.

Edit: According to slide 4 of https://www.uio.no/studier/emner/matnat/its/nedlagte-emner/UNIK4690/v17/forelesninger/lecture_5_3_camera_calibration.pdf, the intrinsic matrix describes the correspondence between points in the world and points in an undistorted image. This is also stated in https://www.imatest.com/support/docs/pre-5-2/geometric-calibration-deprecated/projective-camera/ . This appears to contradict using the ideal intrinsic matrix with rectified images?

christian-rauch commented 3 years ago

Sorry, I mixed some things together here. The camera meta data stored in sensor_msgs/CameraInfo contains the intrinsics (K) for the projection and the distortion parameters (distortion_model + D). If you only need a rectified 2D image, you have to use the distortion parameters and the intrinsics. E.g., if you use a lens that changes the field-of-view, you definitely have to update the intrinsic matrix K to account for the new 3D projection. However, you will not be able to apply a non-linear (e.g. polynomial) mapping to undistort an image just by using the intrinsic matrix. I think P is only relevant in the stereo case, and at least on /depth/camera_info it is identical to K.

The RectifyNodelet and the PinholeCameraModel class give you a good overview of how the CameraInfo message is used with the corresponding OpenCV API. The pure image undistortion is done in PinholeCameraModel::rectifyImage with a 2D mapping between the raw and the undistorted image, which is created by cv::initUndistortRectifyMap. This uses K and D to create the 2D mapping.
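
For illustration, a minimal OpenCV sketch of such a rectification step (this is not the exact PinholeCameraModel code; it assumes K is reused as the new camera matrix, i.e. K' == K, as discussed later in this thread):

  #include <opencv2/calib3d.hpp>
  #include <opencv2/imgproc.hpp>

  // raw: distorted input image; K: 3x3 intrinsics; D: distortion coefficients,
  // both as published in the CameraInfo message.
  cv::Mat rectify(const cv::Mat& raw, const cv::Matx33d& K, const cv::Mat& D)
  {
    // Build the 2D pixel mapping between the distorted and undistorted image.
    // R is empty (identity) and K is reused as the "new" camera matrix.
    cv::Mat map1, map2;
    cv::initUndistortRectifyMap(K, D, cv::Mat(), K, raw.size(),
                                CV_16SC2, map1, map2);

    // Apply the mapping to obtain the rectified (undistorted) image.
    cv::Mat rect;
    cv::remap(raw, rect, map1, map2, cv::INTER_LINEAR);
    return rect;
  }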

francescomilano172 commented 2 years ago

Hi @christian-rauch, a follow-up on the question of which image the K matrix should be applied to. Backtracking from the kinect_rgbd.launch, in particular for the case when depth_registered_processing is True (hence the /depth_to_rgb topic provided by the drivers is used), I got convinced that rgbd_launch actually applies K to the rectified images. In particular:

Could you confirm that my interpretation is correct? On a side note, this seems not to be in line with the "Derivation diagram" shown in the image_pipeline package, which seems to suggest that K should be applied on the raw image, as you were mentioning.

christian-rauch commented 2 years ago

If you have a distorted image, e.g. when using a large FoV or due to other lens imperfections, you have to rectify the image first using the distortion coefficients in D and the projection in K. In most cases, this will be a non-linear mapping from the distorted "real" to the undistorted "normal" image. After this, you can apply the projection using K. If your distortion is linear, you can just incorporate it into your projection matrix. If you have non-linear distortions, you cannot just apply the K from the ROS message (I updated my previous comment to avoid confusion).

For the details on the ROS side, start with the PinholeCameraModel. It has all the methods defined. See my comment above on how the corresponding OpenCV functions are used.

The Derivation diagram may be confusing since it shows the image formation process, which involves the 3D to 2D projection (see also the "Projection onto the original image"). There you first do the ray casting from a 3D point to the camera, and then the "ray" is distorted by the lens or other effects. If you want to do the 2D to 3D projection, you have to reverse that process, of course.

francescomilano172 commented 2 years ago

Thank you for your prompt reply. I am familiar with the OpenCV pipeline for undistortion and I am already using image_geometry.PinholeCameraModel in my ROS interfaces. These two things give me what seem to be "correct" point clouds (up to color leak on the object boundaries, but this is likely another issue from what I've read). My question comes from the fact that by reading this thread (e.g., this answer) I understood that your point was that the intrinsic matrix K from the /rgb/camera_info topic should not be applied to the rectified images, which is instead what happens in rgbd_launch (and I also agree that should be the case). In short, I would like to make sure that the right procedure to obtain a point cloud from /rgb, /depth_to_rgb and /rgb/camera_info is the following:

  1. Rectify /rgb and /depth_to_rgb by using the D coefficients in /rgb/camera_info (e.g., via OpenCV).
  2. Use the (ideal) pinhole camera equation on the rectified RGB and depth images from step 1 to obtain the point cloud. In particular, use the K matrix from /rgb/camera_info as the intrinsic matrix in the pinhole camera equation.
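
A minimal sketch of step 2, assuming a 16-bit depth image in millimetres registered to the RGB frame, with fx/fy/cx/cy taken from the K matrix of /rgb/camera_info (the function name is just illustrative):

  #include <cstdint>
  #include <vector>
  #include <opencv2/core.hpp>

  // depth: rectified 16UC1 depth image (in mm) registered to the RGB frame.
  // fx, fy, cx, cy: taken from the K matrix of /rgb/camera_info.
  std::vector<cv::Point3f> backProject(const cv::Mat& depth,
                                       double fx, double fy,
                                       double cx, double cy)
  {
    std::vector<cv::Point3f> points;
    for (int v = 0; v < depth.rows; ++v)
      for (int u = 0; u < depth.cols; ++u)
      {
        const uint16_t d = depth.at<uint16_t>(v, u);
        if (d == 0)
          continue;                    // 0 marks invalid depth
        const double z = d / 1000.0;   // mm -> m
        // Inverse pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
        points.emplace_back((u - cx) * z / fx, (v - cy) * z / fy, z);
      }
    return points;
  }
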
christian-rauch commented 2 years ago

Previously, because the Azure Kinect is using a factory calibration, I incorrectly assumed that the published images are already calibrated. This is not the case. The raw driver publishes the undistorted images. Therefore you have to rectify the images using either the factory or a manual calibration. Only after this can you use the K from the CameraInfo message.

Your procedure is correct. You can do these steps manually or use the PinholeCameraModel class for rectification and projection. Though, the projection methods only work with individual pixels.

francescomilano172 commented 2 years ago

Perfect, thank you.

bmegli commented 2 years ago

@KamilDre wrote:

The rectified image looks as if it was "zoomed in" a bit. This makes me wonder if the same intrinsic matrix is valid for both images?

Reusing the CameraInfo matrix K for the P submatrix K' is valid, but it will result in a "semi-random" viewpoint as a consequence of the distortion coefficients.

See http://wiki.ros.org/image_pipeline/CameraInfo to understand how undistortion is performed.

bmegli commented 2 years ago

@christian-rauch wrote

Previously, because the Azure Kinect is using a factory calibration, I incorrectly assumed that the published images are already calibrated. This is not the case. The raw driver publishes the undistorted images. Therefore you have to rectify the images using either the factory or a manual calibration. Only after this can you use the K from the CameraInfo message.

Not exactly.

"The raw driver publishes the undistorted images." should be "The raw driver publishes the distorted images".


"Therefore you have to rectify the images using either the factory or a manual calibration. Only after this can you use the K from the CameraInfo message.

True for the current implementation, but not in general.

After rectifying images (e.g. through image_proc) you should be using the P matrix, or really its K' submatrix (because the translation subvector is 0 and the rotation matrix R is the identity).

Using K works with the current implementation just because this is what it publishes as the K' submatrix of P in CameraInfo and what is used for undistortion.


But the current implementation (reusing the factory calibration K after undistortion) will also have the side effect that @KamilDre mentioned.

The "semi-random" viewpoint of undistorted image that looks "zoomed in"

bmegli commented 2 years ago

Can we make it better?

Yes.

From http://wiki.ros.org/image_pipeline/CameraInfo 4.1

[...] since ROS Electric, the camera_calibration package does monocular calibration to get K' using OpenCV's getOptimalNewCameraMatrix() function with argument 'alpha'=0.0 that causes K' ≠ K.

We may have uniform behavior for factory calibration and ROS camera_calibration pipeline.

To do it, the factory calibration should use OpenCV's getOptimalNewCameraMatrix() to determine K' from the intrinsics K and the distortion coefficients D. K' should be used to populate P.

bmegli commented 2 years ago

Azure Kinect factory calibration ROS-like behavior

We may have uniform behavior for factory calibration and ROS camera_calibration pipeline.

To do it, the factory calibration should use OpenCV's getOptimalNewCameraMatrix() to determine K' from the intrinsics K and the distortion coefficients D. K' should be used to populate P.

The results would be:

RGB

Notice the semi-random viewpoint of the current undistorted image border, which is the result of reusing the intrinsics K as K' together with the distortion vector D.

By semi-random I mean that it is neither of:

  • image_raw (distorted): rgb_image_raw_distorted
  • image_rect_color (undistorted old): rgb_undistorted
  • image_rect_color (undistorted new): rgb_undistorted_optimal

Depth NFOV

Notice the semi-random viewpoint of the current undistorted image, which is the result of reusing the intrinsics K as K' together with the distortion vector D.

  • image_raw (distorted): depth_nfov_distorted
  • image_rect_raw (undistorted old): depth_nfov_undistorted
  • image_rect_raw (undistorted new): depth_nfov_undistorted_optimal

Depth WFOV binned 2x2

Notice the semi-random viewpoint of the current undistorted image, which is the result of reusing the intrinsics K as K' together with the distortion vector D.

Now in WFOV my desk is also visible (the straight line at the bottom after OpenCV's "optimal undistortion").

  • image_raw (distorted): depth_wfov_distorted
  • image_rect_raw (undistorted old): depth_wfov_undistorted
  • image_rect_raw (undistorted new): depth_wfov_undistorted_optimal

bmegli commented 2 years ago

We are already using the unified factory calibration and ROS camera_calibration behavior at Extend Robotics.

If this is something beneficial for the community, we may open a pull request.

christian-rauch commented 2 years ago

@christian-rauch wrote

"The raw driver publishes the undistorted images." should be "The raw driver publishes the distorted images".

Oops, I guess I meant un-rectified.

After rectifying images (e.g. through image_proc) you should be using the P matrix, or really its K' submatrix (because the translation subvector is 0 and the rotation matrix R is the identity).

Using K works with the current implementation just because this is what it publishes as the K' submatrix of P in CameraInfo and what is used for undistortion.

You are right. According to the CameraInfo documentation, K:

# Projects 3D points in the camera coordinate frame to 2D pixel
# coordinates using the focal lengths (fx, fy) and principal point
# (cx, cy).

and P:

# By convention, this matrix specifies the intrinsic (camera) matrix
#  of the processed (rectified) image. That is, the left 3x3 portion
#  is the normal camera intrinsic matrix for the rectified image.

So we have to use K on the raw image and P on the rectified (undistorted) image.
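
A small sketch of that rule against a ROS 1 sensor_msgs/CameraInfo message (the helper names are just illustrative; K has 9 and P has 12 row-major entries):

  #include <opencv2/core.hpp>
  #include <sensor_msgs/CameraInfo.h>

  // For raw (distorted) images: project with K (row-major, 9 elements).
  cv::Matx33d intrinsicsForRaw(const sensor_msgs::CameraInfo& info)
  {
    return cv::Matx33d(info.K.data());
  }

  // For rectified images: project with K', the left 3x3 block of the 3x4 P.
  cv::Matx33d intrinsicsForRectified(const sensor_msgs::CameraInfo& info)
  {
    return cv::Matx33d(info.P[0], info.P[1], info.P[2],
                       info.P[4], info.P[5], info.P[6],
                       info.P[8], info.P[9], info.P[10]);
  }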

But the current implementation (reusing the factory calibration K after undistortion) will also have the side effect that @KamilDre mentioned.

Can you point me to where those side effects were mentioned? I cannot recall which side effects were caused by undistortion/rectification.

The "semi-random" viewpoint of undistorted image that looks "zoomed in"

What do you mean by that? Do you mean by "zoomed in" that it will not show the black border?

christian-rauch commented 2 years ago

Thanks for the comparison. It is right that the factory calibration produces an image with black borders. But this does not matter for the projection to 3D; both types of calibration should project correctly into 3D space. However, if you rely on a "perfect" colour image without artefacts, e.g. if you process those images for edge or keypoint detection, then you should use calibration values that do not create those artefacts, e.g. like those obtained with a manual calibration. It makes no difference whether K or P (or K') is used; this behaviour is the same for both.

We are already using the unified factory calibration and ROS camera_calibration behaviour at Extend Robotics.

What do you mean by the "unified factory calibration"? As far as I know, you cannot change the factory calibration on the device. The only way to "override" it is to recalibrate the camera using the new camera info manager and then publish those intrinsics instead.

bmegli commented 2 years ago

But the current implementation (reusing the factory calibration K after undistortion) will also have the side effect that @KamilDre mentioned.

Can you point me to where those side effects were mentioned? I cannot recall which side effects were caused by undistortion/rectification.

He was surprised by the visible area of rectified NFOV depth.

The choice of the new camera matrix when undistorting is arbitrary, e.g.:

  1. Azure-Kinect-Sensor-SDK undistort example
  2. ROS camera_calibration package for monocular cameras uses OpenCV's getOptimalNewCameraMatrix with alpha == 0
bmegli commented 2 years ago

K == K'

Choosing K' = K is one such arbitrary choice.

In that case, what is covered by the undistorted image is determined by the distortion coefficients.

The "semi-random" viewpoint of undistorted image that looks "zoomed in"

What do you mean by that? Do you mean by "zoomed in" that it will not show the black border?

This is what I call "semi-random".

When keeping K' == K and the same undistorted image resolution:

The result varies, as may be seen in the images in my post above:

bmegli commented 2 years ago

Other choices for K'

There are two optimal choices (depending on what criteria you optimize):

  1. All pixels in undistorted image should be valid
  2. All pixels from original image should be covered in undistorted image

The first corresponds to OpenCV's getOptimalNewCameraMatrix with alpha == 0, the second with alpha == 1.
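
For illustration, the two choices differ only in the alpha argument passed to getOptimalNewCameraMatrix (a sketch; the names are placeholders):

  #include <opencv2/calib3d.hpp>

  // K: calibrated 3x3 intrinsics, D: distortion coefficients, size: image size.
  void newCameraMatrices(const cv::Matx33d& K, const cv::Mat& D, cv::Size size)
  {
    // alpha = 0: K' is chosen so that every pixel of the undistorted image is
    // valid (no black border); parts of the original image may be cropped.
    cv::Mat Kp_all_valid =
        cv::getOptimalNewCameraMatrix(K, D, size, /*alpha=*/0.0);

    // alpha = 1: K' is chosen so that every pixel of the original image is
    // kept; the undistorted image will contain invalid (black) regions.
    cv::Mat Kp_all_kept =
        cv::getOptimalNewCameraMatrix(K, D, size, /*alpha=*/1.0);

    (void)Kp_all_valid;  // e.g. use one of these to populate CameraInfo P
    (void)Kp_all_kept;
  }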

The ROS camera_calibration package chooses approach 1 when calibrating a monocular camera.

The Azure-Kinect-Sensor-SDK undistort example seems to be using approach 2.


The images I provided:

bmegli commented 2 years ago

We are already using the unified factory calibration and ROS camera_calibration behaviour at Extend Robotics.

What do you mean by the "unified factory calibration"? As far as I know, you cannot change the factory calibration on the device. The only way to "override" it is to recalibrate the camera using the new camera info manager and then publish those intrinsics instead.

I only mean that the CameraInfo published from the factory calibration may behave the same way as if it were calibrated with ROS camera_calibration.

So instead of doing this:

  camera_info.P = {parameters->param.fx,  0.0f,                   parameters->param.cx,   0.0f,
                   0.0f,                  parameters->param.fy,   parameters->param.cy,   0.0f,
                   0.0f,                  0.0,                    1.0f,                   0.0f};

This may be done:

  cv::Matx33d intrinsics = cv::Matx33d(camera_info.K.data());
  cv::Size imageSize(camera_info.width, camera_info.height);

  cv::Matx33d Kp = cv::getOptimalNewCameraMatrix(intrinsics, camera_info.D, imageSize, 0.0);

  camera_info.P = {Kp(0, 0), Kp(0, 1), Kp(0, 2), 0.0,
                   Kp(1, 0), Kp(1, 1), Kp(1, 2), 0.0,
                   Kp(2, 0), Kp(2, 1), Kp(2, 2), 0.0};
bmegli commented 2 years ago

The "semi-random" viewpoint of undistorted image that looks "zoomed in"

What do you mean by that? Do you mean by "zoomed in" that it will not show the black border?

This is what I call "semi-random".

When keeping K' == K and the same undistorted image resolution:

  • the area of the original image covered by the undistorted image is determined by the distortion

The result varies, as may be seen in the images in my post above:

This is best seen for WFOV in 3D:

# driver.launch for comparison
roslaunch azure_kinect_ros_driver driver.launch depth_mode:="WFOV_2X2BINNED" rgb_point_cloud:="false"

point_cloud_driver_launch_WFOV

# from current ROS melodic branch
# notice the amount of discarded "viewpoint"
roslaunch azure_kinect_ros_driver kinect_rgbd.launch color_enabled:="false" depth_mode:="WFOV_2X2BINNED"

point_cloud_rgbd_launch_wfov

# with getOptimalNewCameraMatrix, alpha == 0
# "viewpoint" kept
roslaunch azure_kinect_ros_driver kinect_rgbd.launch color_enabled:="false" depth_mode:="WFOV_2X2BINNED"

point_cloud_rgbd_launch_wfov_optimal_matrix


Interestingly, driver.launch looks to be of slightly better quality than rgbd_launch in both cases.

christian-rauch commented 2 years ago

I will just comment briefly here, since there are too many distinct posts and I don't have access to the Azure Kinect (for now).

I think the comparison you do above is from a registered and unregistered point cloud. Since the colour and infrared camera centres are shifted, the projected point cloud will also shift depending on which optical frame the data is registered to.

For the "black-border vs. no-black-border"... I am sure this does not matter if you are using the coloured points only. Ideally, you want to have as many points as possible and an as large as possible field-of-view. In this case you should use the colour image with the black border as it will contain all of the colour pixels. In any case, all modes should correctly project into 3D.

bmegli commented 2 years ago

I think the comparison you do above is from a registered and unregistered point cloud

It is not.

It is:

This is just to highlight the visible-area difference (in 3D) depending on the chosen K' for the rectified data. The same is also visible in the image comparison.

I think the comparison you do above is from a registered and unregistered point cloud. Since the colour and infrared camera centres are shifted, the projected point cloud will also shift depending on which optical frame the data is registered to.

Again, this is not registered. This is the effect of a different strategy for creating the undistorted CameraInfo.P matrix.

Ideally, you want to have as many points as possible and as large a field-of-view as possible

Then the CameraInfo->P matrix should be formed from cv::getOptimalNewCameraMatrix(K, D, alpha). Using K for it limits the visible area severely (for WFOV) when working with rectified data. This only matters for the kinect_rgbd.launch pipeline (based on rgbd.launch, based on depth_image_proc).

In any case, all modes should correctly project into 3D

All do.

It is the visible area after rectification that suffers, especially for WFOV.