KamilDre opened this issue 3 years ago
All image_raw topics contain the raw, unrectified (distorted) images. You have to use the camera info to rectify these images. This can be done with the image_proc nodelets. The kinect_rgbd.launch file does this conveniently for you by loading all nodelets (driver and rectification) into a single nodelet manager.
The intrinsic matrix is used for the rectification. It does not change unless you override it in the driver with a manual calibration.
Thank you for your reply. I understand that the intrinsic matrix does not change on the topic itself. My question is whether it should be the same for a raw and a rectified image, i.e. whether it makes sense that the same relationship between 3D points and pixels holds for a distorted and an undistorted image. Below I have pasted a raw depth image (top) and the corresponding rectified image (bottom), both obtained with the kinect_rgbd.launch file. As you can see, the two images look very different from each other. The rectified image looks as if it was "zoomed in" a bit. This makes me wonder if the same intrinsic matrix is valid for both images?
You cannot use the calibrated intrinsics to project from the rectified image into 3D.
You either can use the calibrated intrinsic on the raw unrectified images, or you have to use the ideal intrinsics on the rectified images. The ideal intrinsics are computed from the technical specification of the sensor resolution and field-of-view.
This clarifies a lot for me, thank you. So when calibrating the extrinsics of an RGB camera using rectified images, I would assume that I would also have to use the ideal intrinsics? And is the reason for this that undistorting an image maps it to an equivalent image that would have been taken by an ideal pinhole camera, and hence the need for the ideal intrinsics?
The intrinsic matrix consists of 5 parameters: the focal lengths for the x and y direction, fx and fy, the skew coefficient between the x and y axis, s, and the principal point (cx, cy). For a resolution W x H and an angular field of view AFOVx x AFOVy, would I calculate the intrinsic parameters as
fx = W / (2 tan(AFOVx / 2)), fy = H / (2 tan(AFOVy / 2)), cx = (W - 1) / 2 and cy = (H - 1) / 2, with s = 0?
To calculate the focal length, I have inverted the equation for calculating the angular field of view from a focal length described in https://www.edmundoptics.co.uk/knowledge-center/application-notes/imaging/understanding-focal-length-and-field-of-view/. As for the principal point, I have assumed that it is at the centre of the image (is this correct?) and that indexing starts from 0 (hence the -1). If this is not correct, I would be grateful if you could correct me or point me to a source where I could learn more about this.
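In code form, this is the calculation I have in mind (just a sketch; the field-of-view values below are placeholders and would have to be replaced with the actual specification of the selected camera mode):

```python
import numpy as np

def ideal_intrinsics(width, height, hfov_deg, vfov_deg):
    """Ideal pinhole intrinsics from image resolution and angular field of view.

    Assumes zero skew and a principal point at the image centre
    (0-based pixel indexing, hence the -1).
    """
    fx = (width / 2.0) / np.tan(np.radians(hfov_deg) / 2.0)
    fy = (height / 2.0) / np.tan(np.radians(vfov_deg) / 2.0)
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

# Placeholder values: 1280x720 with a 90 x 59 degree field of view
K_ideal = ideal_intrinsics(1280, 720, 90.0, 59.0)
```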
On a side note, I have noticed that the kinect_rgbd.launch file does not create a topic for rectified rgb_to_depth images and hence does not publish them. Is this expected?
For an extrinsic calibration between raw images, I would first do the intrinsic calibration for RGB and IR and then use the rectified images. But you can also do this in one go like for the stereo calibration http://wiki.ros.org/camera_calibration/Tutorials/StereoCalibration.
Your equations look correct. But why do you want to do this manually? You have to compute the ideal intrinsics for the RGB and IR camera, and you have to change this for each of the camera modes.
The kinect_rgbd.launch does not use the rgb_to_depth images at all. It only uses the raw images and the depth_to_rgb registered images.
Essentially, I am building up a 3D reconstruction of an object. To do so, I need to integrate RGBD images taken from multiple viewpoints around the object into a Truncated Signed Distance Function (TSDF) volume. I am using rectified RGB images from the /rgb/image_rect_color topic and rectified aligned depth images from the /depth_to_rgb/image_rect_raw topic. Now, the TSDF algorithm also takes as input the camera intrinsics and extrinsics for each of the views. So the extrinsics I was referring to is not the mapping between the RGB and the IR sensor but between the end-effector of a robot and the RGB sensor.
Now, the images have been rectified, so I wondered if I could still use the intrinsics from the /rgb/camera_info topic. However, as you have told me, I should not be using this, and instead, I should use the ideal intrinsics.
As for the extrinsics, I have calibrated a wrist-mounted RGB camera by taking multiple images of an AprilTag from various viewpoints. However, I have found that the calibration is not very good. The most likely reason for this is uncertainty in the AprilTag detection. The algorithm that I am using is from the apriltag_ros package. This package subscribes to the rectified RGB image topic and to the corresponding camera info topic. Hence, I was also wondering if I should use the ideal intrinsics of the RGB sensor here as well.
I am thinking of calculating the intrinsics manually as I predominantly work with 1 RGB resolution and one depth mode.
Now, the images have been rectified, so I wondered if I could still use the intrinsics from the /rgb/camera_info topic. However, as you have told me, I should not be using this, and instead, I should use the ideal intrinsics.
Either this, or better use the raw images and factory calibration directly. Since your algorithm will use the intrinsic anyway, there is no point in rectifying the images first.
As for the extrinsics, I have calibrated a wrist-mounted RGB camera by taking multiple images of an AprilTag from various viewpoints.
For AprilTags, you have to use the rectified images. See https://github.com/microsoft/Azure_Kinect_ROS_Driver/pull/200.
I am thinking of calculating the intrinsics manually as I predominantly work with 1 RGB resolution and one depth mode.
If you are using the colour and depth as part of a 3D pipeline that already uses the intrinsics for projection, I recommend just using the raw images and the factory calibration.
Thanks for all your help and all these tips. I will try these things tomorrow.
Also, just to clarify, factory and manual calibration give the intrinsic parameters and distortion coefficient for a camera. The intrinsic parameters capture the relationship between pixels in the raw images that were used as part of the calibration and 3D points. On the other hand, the distortion coefficients can be used to undistort/rectify the raw images, effectively changing them to images that would have been captured by an ideal pinhole camera with the same angular field of view, image resolution and extrinsics. Hence, it is better to use raw images and factory or manual calibration in 3D pipelines. This way, we avoid loss of data and artefacts being introduced by the undistortion process. Is my understanding of all of this correct? If so, what is the reason for rectifying images in the first place, apart from making them look more natural? Is there any source that you could recommend where I could learn a bit more about all of this?
Also, from your personal experience, would you consider the factory calibration of the Kinect to be accurate?
You calibrate and rectify images to get the correct projection and the ideal view. Manufacturing inaccuracies and optics change how the image is projected compared to the ideal projection. For a 3D projection, you can either use the raw images and the "corrected" projection matrix, or the "corrected" images and the ideal projection matrix. This should give you the same results. If you want to read about it, I can recommend "Computer Vision: A Modern Approach: International Edition" by David A. Forsyth and Jean Ponce.
Rectifying images directly makes sense if you need the ideal representation in 2D. For example, if you want to match a template or detect parallel and straight lines. https://github.com/microsoft/Azure_Kinect_ROS_Driver/pull/200#issuecomment-883406304 gives you a good example of the outcome of rectification. The edges of the light on the ceiling in the raw image are not straight. The rectified images have much straighter lines. Hence the term "rectification".
The example will also show you that the factory calibration gives "straighter lines" than my manual calibration with a small A3 chessboard pattern. You might get better manual calibration with a larger calibration pattern.
Thanks for the source! From #200 it is very clear that rectification makes lines straighter. Since AprilTag pose estimation relies on template matching, that explains why it requires rectified images. But in this case, I would guess that it also requires the ideal projection matrix to be very accurate.
For the 3D projection with raw images, when you wrote the "corrected" projection matrix, are you implying that the intrinsic matrix from the factory calibration must be "corrected" somehow to account for the fact that the Kinect is not an ideal pinhole camera? If so, how is this correction carried out? I am asking because I wonder if the intrinsic matrix that is the output of a standard calibration procedure can be used for 3D projection of an image that was not generated by an ideal pinhole camera, as 3D projection assumes this camera model.
For the 3D projection with raw images, when you wrote the "corrected" projection matrix, are you implying that the intrinsic matrix from the factory calibration must be "corrected" somehow to account for the fact that the Kinect is not an ideal pinhole camera?
No, the calibrated intrinsics are already "corrected". This is easily visible by comparing the calibrated image centre with the image size.
I wonder if the intrinsic matrix that is the output of a standard calibration procedure can be used for 3D projection of an image that was not generated by an ideal pinhole camera, as 3D projection assumes this camera model.
Yes, that is what you do if you just use the factory calibration on the raw image.
Thanks a lot! This all makes sense now! I will now rename this question to reflect how our conversation has unfolded so that others can also learn from your answers.
Over the weekend, I have been thinking about this problem more and more. From my understanding, the intrinsic matrix does not account for image distortions caused by the camera lens, as a pinhole model assumes that distortion is not present. If this is true, would it not be more accurate to use the ideal intrinsic matrix and a rectified image than a raw image and factory calibration to do a 3D projection of a depth image? I think this can be true as rectification accounts for image distortion.
Edit: According to slide 4 of https://www.uio.no/studier/emner/matnat/its/nedlagte-emner/UNIK4690/v17/forelesninger/lecture_5_3_camera_calibration.pdf, the intrinsic matrix describes the correspondence between points in the world and points in an undistorted image. This is also stated in https://www.imatest.com/support/docs/pre-5-2/geometric-calibration-deprecated/projective-camera/ . This appears to contradict using the ideal intrinsic matrix with rectified images?
Sorry, I mixed some things together here. The camera metadata stored in sensor_msgs/CameraInfo contains the intrinsics (K) for the projection and the distortion parameters (distortion_model + D). If you only need a rectified 2D image you have to use the distortion parameters and the intrinsics. E.g., if you use a lens that changes the field-of-view, you definitely have to update the intrinsic matrix K to account for the new 3D projection. However, you will not be able to apply a non-linear (e.g. polynomial) mapping to undistort an image just by using the intrinsic matrix. I think P is only relevant in the stereo case, and at least on /depth/camera_info it is identical to K.

The RectifyNodelet and the PinholeCameraModel class give you a good overview of how the CameraInfo message is used in the corresponding OpenCV API. The pure image undistortion is done in PinholeCameraModel::rectifyImage with a 2D mapping between the raw and the undistorted image, which is created by cv::initUndistortRectifyMap. This uses K and D to create the 2D mapping.
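Roughly, the same 2D mapping in plain OpenCV (a sketch only, not the actual nodelet code; it assumes the usual plumb_bob distortion model and reuses K as the new camera matrix, which is what the current pipeline effectively does):

```python
import cv2
import numpy as np

def rectify(image, camera_info):
    """Undistort a raw image using K and D from a sensor_msgs/CameraInfo message."""
    K = np.array(camera_info.K, dtype=np.float64).reshape(3, 3)
    D = np.array(camera_info.D, dtype=np.float64)
    size = (camera_info.width, camera_info.height)
    # R is identity for the monocular case; K is reused as the "new" camera matrix
    map_x, map_y = cv2.initUndistortRectifyMap(K, D, np.eye(3), K, size, cv2.CV_32FC1)
    return cv2.remap(image, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```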
Hi @christian-rauch, a follow-up on the question of which image the K matrix should be applied to. Backtracking from the kinect_rgbd.launch, in particular for the case when depth_registered_processing is True (hence the /depth_to_rgb topic provided by the driver is used), I got convinced that rgbd_launch actually applies K to the rectified images. In particular:
- depth_registered.launch.xml is included by processing.launch.xml here.
- depth_registered.launch.xml launches an image_proc/rectify node that publishes the rectified version of /depth_to_rgb. To my understanding, this rectification uses the CameraInfo from /depth_to_rgb, which can be checked here and here. This CameraInfo coincides with /rgb/camera_info (as can be checked by echoing the ROS topics), which makes sense given that /depth_to_rgb is the depth image "as seen by the RGB camera".
- depth_registered.launch.xml also launches a depth_image_proc/point_cloud_xyzrgb node, which subscribes to the rectified versions of /rgb and /depth_to_rgb and forms the point cloud. This point cloud seems to be obtained by simply using the K from /rgb/camera_info and backprojecting the rectified /rgb and /depth_to_rgb, as can be seen here, here, here and here.

Could you confirm that my interpretation is correct? On a side note, this seems not to be in line with the "Derivation diagram" shown in the image_pipeline package, which seems to suggest that K should be applied on the raw image, as you were mentioning.
If you have a distorted image, e.g. when using a large FoV or due to other lens impurities, you have to rectify the image first using the distortion coefficients in D and the projection in K. In most cases, this will be a non-linear mapping from the distorted "real" to the undistorted "normal" image. After this, you can apply the projection using K. If your distortion is linear, you can just incorporate this into your projection matrix. If you have non-linear distortions, you cannot just apply the K from the ROS message (I updated my previous comment to avoid confusion).

For the details on the ROS side, start with the PinholeCameraModel. It has all the methods defined. See my comment above on how the corresponding OpenCV functions are used.

The Derivation diagram may be confusing since it shows the image formation process, which involves the 3D to 2D projection (see also the "Projection onto the original image"). There you first do the ray casting from a 3D point to the camera, and then the "ray" is distorted by the lens or other effects. If you want to do the 2D to 3D projection, you have to reverse that process of course.
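For the 2D to 3D direction, a minimal sketch of that reversal (assuming the plumb_bob model and plain OpenCV; K and D are the values from the camera info, and u, v are pixel coordinates in the raw image):

```python
import cv2
import numpy as np

def raw_pixel_to_ray(u, v, K, D):
    """Map a pixel of the raw (distorted) image to a normalized viewing ray.

    cv2.undistortPoints removes the lens distortion and the projection by K,
    returning normalized image coordinates (x, y); the ray is (x, y, 1).
    """
    pixel = np.array([[[float(u), float(v)]]], dtype=np.float64)
    x, y = cv2.undistortPoints(pixel, K, D)[0, 0]
    return np.array([x, y, 1.0])

# A measurement at depth z (along the optical axis) lies at z * raw_pixel_to_ray(u, v, K, D)
```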
Thank you for your prompt reply. I am familiar with the OpenCV pipeline for undistortion and I am already using image_geometry.PinholeCameraModel in my ROS interfaces. These two things give me what seem to be "correct" point clouds (up to color leak on the object boundaries, but this is likely another issue from what I've read). My question comes from the fact that by reading this thread (e.g., this answer) I understood that your point was that the intrinsic matrix K from the /rgb/camera_info topic should not be applied to the rectified images, which is instead what happens in rgbd_launch (and I also agree that should be the case). In short, I would like to make sure that the right procedure to obtain a point cloud from /rgb, /depth_to_rgb and /rgb/camera_info is the following:
1. Rectify (undistort) /rgb and /depth_to_rgb by using the D coefficients in /rgb/camera_info (e.g., via OpenCV).
2. Use the K matrix from /rgb/camera_info as the intrinsic matrix in the pinhole camera equation.

Previously, because the Azure Kinect is using a factory calibration, I incorrectly assumed that the published images are already calibrated. This is not the case. The raw driver publishes the undistorted images. Therefore you have to rectify the images using either the factory or a manual calibration. Only after this can you use the K from the CameraInfo message.
Your procedure is correct. You can do these steps manually or use the PinholeCameraModel class for rectification and projection. Though, the projection methods only work with individual pixels.
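For reference, a small sketch of step 2 as a dense backprojection (assuming the registered depth image is 16-bit in millimetres, as on the image_rect_raw depth topics, and K comes from /rgb/camera_info):

```python
import numpy as np

def depth_to_points(depth_mm, K):
    """Backproject a rectified, colour-registered depth image into 3D points.

    depth_mm: HxW uint16 depth in millimetres; K: 3x3 intrinsic matrix.
    Returns an HxWx3 array of points (metres) in the colour optical frame.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float64) / 1000.0  # mm -> m; invalid pixels stay at 0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.dstack((x, y, z))
```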
Perfect, thank you.
@KamilDre wrote:
The rectified image looks as if it was "zoomed in" a bit. This makes me wonder if the same intrinsic matrix is valid for both images?
Reusing the CameraInfo matrix K for the P submatrix K' is valid, but will result in a "semi-random" viewpoint as a consequence of the distortion coefficients.
See http://wiki.ros.org/image_pipeline/CameraInfo to understand how undistortion is performed.
@christian-rauch wrote
Previously, because the Azure Kinect is using a factory calibration, I incorrectly assumed that the published images are already calibrated. This is not the case. The raw driver publishes the undistorted images. Therefore you have to rectify the images using either the factory or a manual calibration. Only after this can you use the K from the CameraInfo message.
Not exactly.
"The raw driver publishes the undistorted images." should be "The raw driver publishes the distorted images".
"Therefore you have to rectify the images using either the factory or a manual calibration. Only after this can you use the K from the CameraInfo message.
True for current implementation but not in general.
After rectifying images (e.g. through image_proc) you should be using the P matrix, or really its K' submatrix (because the translation subvector is 0 and the rotation matrix R is identity).
Using K works with the current implementation just because this is what it publishes as the K' submatrix of P in CameraInfo and what is used for undistortion.
But the current implementation's factory calibration after undistortion will also have the side effect that @KamilDre mentioned:
The "semi-random" viewpoint of the undistorted image that looks "zoomed in".
Can we make it better?
Yes.
From http://wiki.ros.org/image_pipeline/CameraInfo 4.1
[...] since ROS Electric, the camera_calibration package does monocular calibration to get K' using OpenCV's getOptimalNewCameraMatrix() function with argument 'alpha'=0.0 that cause K' ≠ K.
We may have uniform behavior for factory calibration and ROS camera_calibration pipeline.
To do it, factory calibration should use OpenCV getOptimalNewCameraMatrix()
to determine K' from the intrinsics K and the distortion coefficients D. K' should be used to populate P.
We may have uniform behavior for factory calibration and ROS camera_calibration pipeline.
To do it, factory calibration should use OpenCV getOptimalNewCameraMatrix() to determine K' from the intrinsics K and the distortion coefficients D. K' should be used to populate P.
The results would be...
Notice the semi-random-like viewpoint at the border of the current undistorted image, which is a result of reusing the intrinsics K as K' together with the distortion vector D. By semi-random I mean that it is neither of the two "optimal" choices described further down (alpha == 0 or alpha == 1).
[Image comparison, RGB: image_raw (distorted) vs. image_rect_color (undistorted, old) vs. image_rect_color (undistorted, new)]
Notice the semi-random-like viewpoint of the current undistorted image, which is a result of reusing the intrinsics K as K' together with the distortion vector D.
[Image comparison, Depth NFOV: image_raw (distorted) vs. image_rect_raw (undistorted, old) vs. image_rect_raw (undistorted, new)]
Notice the semi-random-like viewpoint of the current undistorted image, which is a result of reusing the intrinsics K as K' together with the distortion vector D.
Now in WFOV my desk is also visible (the straight line at the bottom, after OpenCV's "optimal undistortion").
[Image comparison, Depth WFOV binned 2x2: image_raw (distorted) vs. image_rect_raw (undistorted, old) vs. image_rect_raw (undistorted, new)]
We are already using the unified factory calibration and ROS camera_calibration behavior at Extend Robotics.
If this is something beneficial for the community, we may open a pull request.
@christian-rauch wrote
"The raw driver publishes the undistorted images." should be "The raw driver publishes the distorted images".
Oops, I guess I meant un-rectified.
After rectifying images (e.g. through image_proc) you should be using the P matrix, or really its K' submatrix (because the translation subvector is 0 and the rotation matrix R is identity). Using K works with the current implementation just because this is what it publishes as the K' submatrix of P in CameraInfo and what is used for undistortion.
You are right. According to the CameraInfo documentation, K:
# Projects 3D points in the camera coordinate frame to 2D pixel
# coordinates using the focal lengths (fx, fy) and principal point
# (cx, cy).
and P:
# By convention, this matrix specifies the intrinsic (camera) matrix
# of the processed (rectified) image. That is, the left 3x3 portion
# is the normal camera intrinsic matrix for the rectified image.
So one has to use K on the raw image and P on the rectified (undistorted) image.
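A small sketch of that rule, following the CameraInfo conventions quoted above (K applies to the raw image, the left 3x3 block of P to the rectified one):

```python
import numpy as np

def intrinsics_from_camera_info(info):
    """Return (K, K_prime) from a sensor_msgs/CameraInfo message.

    K is valid for the raw image; K_prime (the left 3x3 block of P)
    is valid for the rectified image.
    """
    K = np.array(info.K, dtype=np.float64).reshape(3, 3)
    K_prime = np.array(info.P, dtype=np.float64).reshape(3, 4)[:, :3]
    return K, K_prime

def project(point_xyz, intrinsics):
    """Project a 3D point in the camera optical frame to pixel coordinates."""
    uvw = intrinsics @ np.asarray(point_xyz, dtype=np.float64)
    return uvw[:2] / uvw[2]
```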
But the current implementation's factory calibration after undistortion will also have the side effect that @KamilDre mentioned.
Can you point me to where those side effects were mentioned? I cannot recall which side effects were caused by undistortion/rectification.
The "semi-random" viewpoint of the undistorted image that looks "zoomed in"
What do you mean by that? Do you mean by "zoomed in" that it will not show the black border?
Thanks for the comparison. It is right that the factory calibration produces an image with black borders. But this does not matter for the projection to 3D. Both types of calibration should project correctly into the 3D space. However, if you rely on a "perfect" colour image without artefacts, e.g. if you process those images for edge or keypoint detection, then you should use calibration values that do not create those artefacts, e.g. like those obtained with a manual calibration. It makes no difference if K or P (or K') is used. This behaviour is the same for both.
We are already using the unified factory calibration and ROS camera_calibration behaviour at Extend Robotics.
What do you mean with the "unified factory calibration"? As far as I know, you cannot change the factory calibration on the device. The only way to "override" it is to recalibrate the camera using the new camera info manager and then publish those intrinsics instead.
But the current implementation's factory calibration after undistortion will also have the side effect that @KamilDre mentioned.
Can you point me to where those side effects were mentioned? I cannot recall which side effects were caused by undistortion/rectification.
He was surprised by the visible area of rectified NFOV depth.
The choice of the new camera matrix when undistorting is arbitrary, e.g.:
- getOptimalNewCameraMatrix with alpha == 1 and centerPrincipalPoint == false
- getOptimalNewCameraMatrix with alpha == 0
since ROS Electric, the camera_calibration package does monocular calibration to get K' using OpenCV's getOptimalNewCameraMatrix() function with argument 'alpha'=0.0 that cause K' ≠ K.
Choosing K' = K is one of such arbitrary choices. In that case, what is covered by the undistorted image is determined by the distortion coefficients.
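For illustration, the two choices side by side (a sketch using OpenCV's Python API; K, D, width and height stand for the values from the CameraInfo message):

```python
import cv2
import numpy as np

def candidate_new_camera_matrices(K, D, width, height):
    """Compute the two "optimal" choices for the rectified camera matrix K'.

    alpha = 0: only valid source pixels end up in the rectified image (no black border).
    alpha = 1: all source pixels are kept, at the cost of black border regions.
    """
    size = (width, height)
    K0, roi0 = cv2.getOptimalNewCameraMatrix(K, D, size, 0.0)
    K1, roi1 = cv2.getOptimalNewCameraMatrix(K, D, size, 1.0)  # centerPrincipalPoint defaults to False
    return (K0, roi0), (K1, roi1)
```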
The "semi-random" viewpoint of undistorted image that looks "zoomed in"
What do you mean by that? Do you mean by "zoomed in" that it will not show the black border?
This is what I call "semi-random". When keeping K' == K and the same undistorted image resolution, the area of the original image covered by the undistorted image is determined by the distortion coefficients. The result varies, as may be seen in the images in my post above.
There are two "optimal" choices (depending on what criteria you optimize): keep only valid pixels (no black border), or keep all pixels of the original image (black border included). The first represents OpenCV's getOptimalNewCameraMatrix with alpha == 0, the second with alpha == 1.
The ROS camera_calibration package chooses approach 1 when calibrating a monocular camera. The Azure-Kinect-Sensor-SDK undistort example seems to be using approach 2.
The images I provided use the getOptimalNewCameraMatrix with alpha == 0 strategy.
strategyWe are already using the unified factory calibration and ROS camera_calibration behaviour at Extend Robotics.
What do you mean with the "unified factory calibration"? As far as I know, you cannot change the factory calibration on the device. The only way to "override" it is to recalibrate the camera using the new camera info manager and then publish those intrinsics instead.
I only mean that the CameraInfo published from the factory calibration may behave the same way as if it were calibrated with ROS camera_calibration (alpha == 0).
So instead of doing this:
```cpp
camera_info.P = {parameters->param.fx, 0.0f, parameters->param.cx, 0.0f,
                 0.0f, parameters->param.fy, parameters->param.cy, 0.0f,
                 0.0f, 0.0, 1.0f, 0.0f};
```
This may be done:
```cpp
cv::Matx33d intrinsics = cv::Matx33d(camera_info.K.data());
cv::Size imageSize(camera_info.width, camera_info.height);
cv::Matx33d Kp = cv::getOptimalNewCameraMatrix(intrinsics, camera_info.D, imageSize, 0.0);

camera_info.P = {Kp(0, 0), Kp(0, 1), Kp(0, 2), 0.0,
                 Kp(1, 0), Kp(1, 1), Kp(1, 2), 0.0,
                 Kp(2, 0), Kp(2, 1), Kp(2, 2), 0.0};
```
The "semi-random" viewpoint of undistorted image that looks "zoomed in"
What do you mean by that? Do you mean by "zoomed in" that it will not show the black border?
This is what I call "semi-random". When keeping K' == K and the same undistorted image resolution:
- the area of the original image covered by the undistorted image is determined by the distortion coefficients
The result varies, as may be seen in the images in my post above (https://github.com/microsoft/Azure_Kinect_ROS_Driver/issues/212#issuecomment-1180714989):
- reasonable for RGB, but with an unnecessary extra black border (extra, as in always carrying black pixels)
- somewhat acceptable for NFOV depth
- huge loss of information for WFOV depth
This is best seen for WFOV in 3D:

```bash
# driver.launch for comparison
roslaunch azure_kinect_ros_driver driver.launch depth_mode:="WFOV_2X2BINNED" rgb_point_cloud:="false"

# from current ROS melodic branch
# notice the amount of discarded "viewpoint"
roslaunch azure_kinect_ros_driver kinect_rgbd.launch color_enabled:="false" depth_mode:="WFOV_2X2BINNED"

# with getOptimalNewCameraMatrix, alpha == 0
# "viewpoint" kept
roslaunch azure_kinect_ros_driver kinect_rgbd.launch color_enabled:="false" depth_mode:="WFOV_2X2BINNED"
```
Interestingly, driver.launch looks to be of a bit better quality than rgbd_launch in both cases.
I will just comment briefly here, since there are too many distinct posts and I don't have access to the Azure Kinect (for now).
I think the comparison you do above is from a registered and an unregistered point cloud. Since the colour and infrared camera centres are shifted, the projected point cloud will also shift depending on which optical frame the data is registered to.
For the "black-border vs. no-black-border"... I am sure this does not matter if you are using the coloured points only. Ideally, you want to have as many points as possible and an as large as possible field-of-view. In this case you should use the colour image with the black border as it will contain all of the colour pixels. In any case, all modes should correctly project into 3D.
I think the comparison you do above is from a registered and unregistered point cloud
It is not.
It is:
- driver.launch
- kinect_rgbd.launch from the melodic master branch ("K == K'"), which uses rgbd.launch, which in turn uses depth_image_proc on the /depth/image_rect_raw topic
- kinect_rgbd.launch with "K' == getOptimalNewCameraMatrix(K, D, alpha == 0)"
This is just to highlight visible area difference (in 3D) depending on chosen K' for rectified data. The same is also visible in the images comparison.
I think the comparison you do above is from a registered and unregistered point cloud. Since the colour and infrared camera centres are shifted, the projected point cloud will also shift depending to which optical frame the data is registered.
Again, this is not registered. This is the effect of different CameraInfo.P creation strategies for the undistorted image:
- with K' == K, what ends up in the visible area is determined by the distortion coefficients
- with cv::getOptimalNewCameraMatrix(K, D, alpha), we control what is in the visible area

Ideally, you want to have as many points as possible and an as large as possible field-of-view

Then the CameraInfo P matrix should be formed from cv::getOptimalNewCameraMatrix(K, D, alpha). Using K for it limits the visible area severely (for WFOV) when working with rectified data.
This only matters for the kinect_rgbd.launch pipeline (based on rgbd.launch, based on depth_image_proc).
In any case, all modes should correctly project into 3D
All do.
It is the visible area after rectification that suffers. Especially for WFOV.
Apologies that I have logged this as a feature request - I am very new to git and did not know how to ask a question directly. However, I think that some things could be clarified in the usage.md file to make things clearer for others.
My question is:
In the usage.md file, it is specified that images on the following topics are not undistorted:
My question is, are images on the following topics undistorted, since there is no comment about them:
If so, it would be nice to clarify this in the usage.md file. And if they are not undistorted, should I use the corresponding camera info topic to undistort them using image_proc?
Also, does the intrinsic matrix of the camera change after rectification?