microsoft / Azure-Kinect-Sensor-SDK

A cross platform (Linux and Windows) user mode SDK to read data from your Azure Kinect device.
https://Azure.com/Kinect
MIT License

[Calibration] Overlay point cloud from two cameras #803

Open guanming001 opened 4 years ago

guanming001 commented 4 years ago

Hi

I have tried to modify the green_screen example to display the point clouds obtained from two synchronized cameras, but I couldn't get the point clouds to overlay correctly.

Is there any sample code on overlaying the point clouds from two cameras?

Below is the modified main.cpp for reference; util.h is adapted from AzureKinectSample.

Modified green screen code

``` // Copyright (c) Microsoft Corporation. All rights reserved. // Licensed under the MIT License. #include #include #include #include #include #include #include #include #include #include #include #include #include "util.h" // Adapted from AzureKinectSample using std::cerr; using std::cout; using std::endl; using std::vector; #include "transformation.h" #include "MultiDeviceCapturer.h" // Allowing at least 160 microseconds between depth cameras should ensure they do not interfere with one another. constexpr uint32_t MIN_TIME_BETWEEN_DEPTH_CAMERA_PICTURES_USEC = 160; // ideally, we could generalize this to many OpenCV types static cv::Mat color_to_opencv(const k4a::image &im); static cv::Mat depth_to_opencv(const k4a::image &im); static cv::Matx33f calibration_to_color_camera_matrix(const k4a::calibration &cal); static Transformation get_depth_to_color_transformation_from_calibration(const k4a::calibration &cal); static k4a::calibration construct_device_to_device_calibration(const k4a::calibration &main_cal, const k4a::calibration &secondary_cal, const Transformation &secondary_to_main); static vector calibration_to_color_camera_dist_coeffs(const k4a::calibration &cal); static bool find_chessboard_corners_helper(const cv::Mat &main_color_image, const cv::Mat &secondary_color_image, const cv::Size &chessboard_pattern, vector &main_chessboard_corners, vector &secondary_chessboard_corners); static Transformation stereo_calibration(const k4a::calibration &main_calib, const k4a::calibration &secondary_calib, const vector> &main_chessboard_corners_list, const vector> &secondary_chessboard_corners_list, const cv::Size &image_size, const cv::Size &chessboard_pattern, float chessboard_square_length); static k4a_device_configuration_t get_master_config(); static k4a_device_configuration_t get_subordinate_config(); static Transformation calibrate_devices(MultiDeviceCapturer &capturer, const k4a_device_configuration_t &main_config, const k4a_device_configuration_t &secondary_config, const cv::Size &chessboard_pattern, float chessboard_square_length, double calibration_timeout); static k4a::image create_depth_image_like(const k4a::image &im); // For plotting point cloud k4a::image main_xyz_image; k4a::image secondary_xyz_image; cv::Mat main_xyz; cv::Mat secondary_xyz; // Viewer cv::viz::Viz3d viewer; int main(int argc, char **argv) { float chessboard_square_length = 0.; // must be included in the input params int32_t color_exposure_usec = 8000; // somewhat reasonable default exposure time int32_t powerline_freq = 2; // default to a 60 Hz powerline cv::Size chessboard_pattern(0, 0); // height, width. Both need to be set. uint16_t depth_threshold = 1000; // default to 1 meter size_t num_devices = 0; double calibration_timeout = 60.0; // default to timing out after 60s of trying to get calibrated double greenscreen_duration = std::numeric_limits::max(); // run forever vector device_indices{ 0 }; // Set up a MultiDeviceCapturer to handle getting many synchronous captures // Note that the order of indices in device_indices is not necessarily // preserved because MultiDeviceCapturer tries to find the master device based // on which one has sync out plugged in. 
Start with just { 0 }, and add // another if needed if (argc < 5) { cout << "Usage: green_screen " "[depth-threshold-mm (default 1000)] [color-exposure-time-usec (default 8000)] " "[powerline-frequency-mode (default 2 for 60 Hz)] [calibration-timeout-sec (default 60)]" "[greenscreen-duration-sec (default infinity- run forever)]" << endl; cerr << "Not enough arguments!\n"; exit(1); } else { num_devices = static_cast(atoi(argv[1])); if (num_devices > k4a::device::get_installed_count()) { cerr << "Not enough cameras plugged in!\n"; exit(1); } chessboard_pattern.height = atoi(argv[2]); chessboard_pattern.width = atoi(argv[3]); chessboard_square_length = static_cast(atof(argv[4])); if (argc > 5) { depth_threshold = static_cast(atoi(argv[5])); if (argc > 6) { color_exposure_usec = atoi(argv[6]); if (argc > 7) { powerline_freq = atoi(argv[7]); if (argc > 8) { calibration_timeout = atof(argv[8]); if (argc > 9) { greenscreen_duration = atof(argv[9]); } } } } } } if (num_devices != 2 && num_devices != 1) { cerr << "Invalid choice for number of devices!\n"; exit(1); } else if (num_devices == 2) { device_indices.emplace_back(1); // now device indices are { 0, 1 } } if (chessboard_pattern.height == 0) { cerr << "Chessboard height is not properly set!\n"; exit(1); } if (chessboard_pattern.width == 0) { cerr << "Chessboard height is not properly set!\n"; exit(1); } if (chessboard_square_length == 0.) { cerr << "Chessboard square size is not properly set!\n"; exit(1); } cout << "Chessboard height: " << chessboard_pattern.height << ". Chessboard width: " << chessboard_pattern.width << ". Chessboard square length: " << chessboard_square_length << endl; cout << "Depth threshold: : " << depth_threshold << ". Color exposure time: " << color_exposure_usec << ". Powerline frequency mode: " << powerline_freq << endl; MultiDeviceCapturer capturer(device_indices, color_exposure_usec, powerline_freq); // Create configurations for devices k4a_device_configuration_t main_config = get_master_config(); if (num_devices == 1) // no need to have a master cable if it's standalone { main_config.wired_sync_mode = K4A_WIRED_SYNC_MODE_STANDALONE; } k4a_device_configuration_t secondary_config = get_subordinate_config(); // Construct all the things that we'll need whether or not we are running with 1 or 2 cameras k4a::calibration main_calibration = capturer.get_master_device().get_calibration(main_config.depth_mode, main_config.color_resolution); // Set up a transformation. DO THIS OUTSIDE OF YOUR MAIN LOOP! Constructing transformations involves time-intensive // hardware setup and should not change once you have a rigid setup, so only call it once or it will run very // slowly. 
k4a::transformation main_depth_to_main_color(main_calibration); capturer.start_devices(main_config, secondary_config); // get an image to be the background vector background_captures = capturer.get_synchronized_captures(secondary_config); cv::Mat background_image = color_to_opencv(background_captures[0].get_color_image()); cv::Mat output_image = background_image.clone(); // allocated outside the loop to avoid re-creating every time if (num_devices == 1) { std::chrono::time_point start_time = std::chrono::system_clock::now(); while (std::chrono::duration(std::chrono::system_clock::now() - start_time).count() < greenscreen_duration) { vector captures; // secondary_config isn't actually used here because there's no secondary device but the function needs it captures = capturer.get_synchronized_captures(secondary_config, true); k4a::image main_color_image = captures[0].get_color_image(); k4a::image main_depth_image = captures[0].get_depth_image(); // let's green screen out things that are far away. // first: let's get the main depth image into the color camera space k4a::image main_depth_in_main_color = create_depth_image_like(main_color_image); main_depth_to_main_color.depth_image_to_color_camera(main_depth_image, &main_depth_in_main_color); cv::Mat cv_main_depth_in_main_color = depth_to_opencv(main_depth_in_main_color); cv::Mat cv_main_color_image = color_to_opencv(main_color_image); // single-camera case cv::Mat within_threshold_range = (cv_main_depth_in_main_color != 0) & (cv_main_depth_in_main_color < depth_threshold); // show the close details cv_main_color_image.copyTo(output_image, within_threshold_range); // hide the rest with the background image background_image.copyTo(output_image, ~within_threshold_range); cv::imshow("Green Screen", output_image); cv::waitKey(1); } } else if (num_devices == 2) { // This wraps all the device-to-device details Transformation tr_secondary_color_to_main_color = calibrate_devices(capturer, main_config, secondary_config, chessboard_pattern, chessboard_square_length, calibration_timeout); k4a::calibration secondary_calibration = capturer.get_subordinate_device_by_index(0).get_calibration(secondary_config.depth_mode, secondary_config.color_resolution); // Get the transformation from secondary depth to secondary color using its calibration object Transformation tr_secondary_depth_to_secondary_color = get_depth_to_color_transformation_from_calibration( secondary_calibration); // We now have the secondary depth to secondary color transform. We also have the transformation from the // secondary color perspective to the main color perspective from the calibration earlier. 
Now let's compose the // depth secondary -> color secondary, color secondary -> color main into depth secondary -> color main Transformation tr_secondary_depth_to_main_color = tr_secondary_depth_to_secondary_color.compose_with( tr_secondary_color_to_main_color); // Construct a new calibration object to transform from the secondary depth camera to the main color camera k4a::calibration secondary_depth_to_main_color_cal = construct_device_to_device_calibration(main_calibration, secondary_calibration, tr_secondary_depth_to_main_color); k4a::transformation secondary_depth_to_main_color(secondary_depth_to_main_color_cal); // Pointcloud viewer const cv::String window_name = "point cloud"; viewer = cv::viz::Viz3d(window_name); // Show Coordinate System Origin constexpr double scale = 100.0; viewer.showWidget("origin", cv::viz::WCameraPosition(scale)); std::chrono::time_point start_time = std::chrono::system_clock::now(); while (std::chrono::duration(std::chrono::system_clock::now() - start_time).count() < greenscreen_duration) { vector captures; captures = capturer.get_synchronized_captures(secondary_config, true); k4a::image main_color_image = captures[0].get_color_image(); k4a::image main_depth_image = captures[0].get_depth_image(); // let's green screen out things that are far away. // first: let's get the main depth image into the color camera space k4a::image main_depth_in_main_color = create_depth_image_like(main_color_image); main_depth_to_main_color.depth_image_to_color_camera(main_depth_image, &main_depth_in_main_color); cv::Mat cv_main_depth_in_main_color = depth_to_opencv(main_depth_in_main_color); cv::Mat cv_main_color_image = color_to_opencv(main_color_image); k4a::image secondary_depth_image = captures[1].get_depth_image(); k4a::image secondary_color_image = captures[1].get_color_image(); // Get the depth image in the main color perspective k4a::image secondary_depth_in_main_color = create_depth_image_like(main_color_image); secondary_depth_to_main_color.depth_image_to_color_camera(secondary_depth_image, &secondary_depth_in_main_color); cv::Mat cv_secondary_depth_in_main_color = depth_to_opencv(secondary_depth_in_main_color); cv::Mat cv_secondary_color_image = color_to_opencv(secondary_color_image); // Transform Depth Image to Point Cloud main_xyz_image = main_depth_to_main_color.depth_image_to_point_cloud( main_depth_in_main_color, K4A_CALIBRATION_TYPE_COLOR ); secondary_xyz_image = secondary_depth_to_main_color.depth_image_to_point_cloud(secondary_depth_in_main_color, K4A_CALIBRATION_TYPE_COLOR); // Get cv::Mat from k4a::image main_xyz = k4a::get_mat(main_xyz_image); secondary_xyz = k4a::get_mat(secondary_xyz_image); // Create Point Cloud Widget //cv::viz::WCloud cloud = cv::viz::WCloud(main_xyz, cv_main_color_image); cv::viz::WCloud cloud = cv::viz::WCloud(main_xyz, cv::viz::Color::blue()); cv::viz::WCloud cloud2 = cv::viz::WCloud(secondary_xyz, cv::viz::Color::red()); // Show Widget viewer.showWidget("cloud", cloud); viewer.showWidget("cloud2", cloud2); viewer.spinOnce(); //// Now it's time to actually construct the green screen. Where the depth is 0, the camera doesn't know how //// far away the object is because it didn't get a response at that point. That's where we'll try to fill in //// the gaps with the other camera. //cv::Mat main_valid_mask = cv_main_depth_in_main_color != 0; //cv::Mat secondary_valid_mask = cv_secondary_depth_in_main_color != 0; //// build depth mask. 
If the main camera depth for a pixel is valid and the depth is within the threshold, //// then set the mask to display that pixel. If the main camera depth for a pixel is invalid but the //// secondary depth for a pixel is valid and within the threshold, then set the mask to display that pixel. //cv::Mat within_threshold_range = (main_valid_mask & (cv_main_depth_in_main_color < depth_threshold)) | // (~main_valid_mask & secondary_valid_mask & // (cv_secondary_depth_in_main_color < depth_threshold)); //// copy main color image to output image only where the mask within_threshold_range is true //cv_main_color_image.copyTo(output_image, within_threshold_range); //// fill the rest with the background image //background_image.copyTo(output_image, ~within_threshold_range); //cv::imshow("Green Screen", output_image); cv::waitKey(1); } // Close Viewer viewer.close(); } else { cerr << "Invalid number of devices!" << endl; exit(1); } return 0; } static cv::Mat color_to_opencv(const k4a::image &im) { cv::Mat cv_image_with_alpha(im.get_height_pixels(), im.get_width_pixels(), CV_8UC4, (void *)im.get_buffer()); cv::Mat cv_image_no_alpha; cv::cvtColor(cv_image_with_alpha, cv_image_no_alpha, cv::COLOR_BGRA2BGR); return cv_image_no_alpha; } static cv::Mat depth_to_opencv(const k4a::image &im) { return cv::Mat(im.get_height_pixels(), im.get_width_pixels(), CV_16U, (void *)im.get_buffer(), static_cast(im.get_stride_bytes())); } static cv::Matx33f calibration_to_color_camera_matrix(const k4a::calibration &cal) { const k4a_calibration_intrinsic_parameters_t::_param &i = cal.color_camera_calibration.intrinsics.parameters.param; cv::Matx33f camera_matrix = cv::Matx33f::eye(); camera_matrix(0, 0) = i.fx; camera_matrix(1, 1) = i.fy; camera_matrix(0, 2) = i.cx; camera_matrix(1, 2) = i.cy; return camera_matrix; } static Transformation get_depth_to_color_transformation_from_calibration(const k4a::calibration &cal) { const k4a_calibration_extrinsics_t &ex = cal.extrinsics[K4A_CALIBRATION_TYPE_DEPTH][K4A_CALIBRATION_TYPE_COLOR]; Transformation tr; for (int i = 0; i < 3; ++i) { for (int j = 0; j < 3; ++j) { tr.R(i, j) = ex.rotation[i * 3 + j]; } } tr.t = cv::Vec3d(ex.translation[0], ex.translation[1], ex.translation[2]); return tr; } // This function constructs a calibration that operates as a transformation between the secondary device's depth camera // and the main camera's color camera. IT WILL NOT GENERALIZE TO OTHER TRANSFORMS. Under the hood, the transformation // depth_image_to_color_camera method can be thought of as converting each depth pixel to a 3d point using the // intrinsics of the depth camera, then using the calibration's extrinsics to convert between depth and color, then // using the color intrinsics to produce the point in the color camera perspective. 
static k4a::calibration construct_device_to_device_calibration(const k4a::calibration &main_cal, const k4a::calibration &secondary_cal, const Transformation &secondary_to_main) { k4a::calibration cal = secondary_cal; k4a_calibration_extrinsics_t &ex = cal.extrinsics[K4A_CALIBRATION_TYPE_DEPTH][K4A_CALIBRATION_TYPE_COLOR]; for (int i = 0; i < 3; ++i) { for (int j = 0; j < 3; ++j) { ex.rotation[i * 3 + j] = static_cast(secondary_to_main.R(i, j)); } } for (int i = 0; i < 3; ++i) { ex.translation[i] = static_cast(secondary_to_main.t[i]); } cal.color_camera_calibration = main_cal.color_camera_calibration; return cal; } static vector calibration_to_color_camera_dist_coeffs(const k4a::calibration &cal) { const k4a_calibration_intrinsic_parameters_t::_param &i = cal.color_camera_calibration.intrinsics.parameters.param; return { i.k1, i.k2, i.p1, i.p2, i.k3, i.k4, i.k5, i.k6 }; } bool find_chessboard_corners_helper(const cv::Mat &main_color_image, const cv::Mat &secondary_color_image, const cv::Size &chessboard_pattern, vector &main_chessboard_corners, vector &secondary_chessboard_corners) { bool found_chessboard_main = cv::findChessboardCorners(main_color_image, chessboard_pattern, main_chessboard_corners); bool found_chessboard_secondary = cv::findChessboardCorners(secondary_color_image, chessboard_pattern, secondary_chessboard_corners); // Cover the failure cases where chessboards were not found in one or both images. if (!found_chessboard_main || !found_chessboard_secondary) { if (found_chessboard_main) { cout << "Could not find the chessboard corners in the secondary image. Trying again...\n"; } // Likewise, if the chessboard was found in the secondary image, it was not found in the main image. else if (found_chessboard_secondary) { cout << "Could not find the chessboard corners in the main image. Trying again...\n"; } // The only remaining case is the corners were in neither image. else { cout << "Could not find the chessboard corners in either image. Trying again...\n"; } return false; } // Before we go on, there's a quick problem with calibration to address. Because the chessboard looks the same when // rotated 180 degrees, it is possible that the chessboard corner finder may find the correct points, but in the // wrong order. // A visual: // Image 1 Image 2 // ..................... ..................... // ..................... ..................... // .........xxxxx2...... .....xxxxx1.......... // .........xxxxxx...... .....xxxxxx.......... // .........xxxxxx...... .....xxxxxx.......... // .........1xxxxx...... .....2xxxxx.......... // ..................... ..................... // ..................... ..................... // The problem occurs when this case happens: the find_chessboard() function correctly identifies the points on the // chessboard (shown as 'x's) but the order of those points differs between images taken by the two cameras. // Specifically, the first point in the list of points found for the first image (1) is the *last* point in the list // of points found for the second image (2), though they correspond to the same physical point on the chessboard. // To avoid this problem, we can make the assumption that both of the cameras will be oriented in a similar manner // (e.g. turning one of the cameras upside down will break this assumption) and enforce that the vector between the // first and last points found in pixel space (which will be at opposite ends of the chessboard) are pointing the // same direction- so, the dot product of the two vectors is positive. 
cv::Vec2f main_image_corners_vec = main_chessboard_corners.back() - main_chessboard_corners.front(); cv::Vec2f secondary_image_corners_vec = secondary_chessboard_corners.back() - secondary_chessboard_corners.front(); if (main_image_corners_vec.dot(secondary_image_corners_vec) <= 0.0) { std::reverse(secondary_chessboard_corners.begin(), secondary_chessboard_corners.end()); } return true; } Transformation stereo_calibration(const k4a::calibration &main_calib, const k4a::calibration &secondary_calib, const vector> &main_chessboard_corners_list, const vector> &secondary_chessboard_corners_list, const cv::Size &image_size, const cv::Size &chessboard_pattern, float chessboard_square_length) { // We have points in each image that correspond to the corners that the findChessboardCorners function found. // However, we still need the points in 3 dimensions that these points correspond to. Because we are ultimately only // interested in find a transformation between two cameras, these points don't have to correspond to an external // "origin" point. The only important thing is that the relative distances between points are accurate. As a result, // we can simply make the first corresponding point (0, 0) and construct the remaining points based on that one. The // order of points inserted into the vector here matches the ordering of findChessboardCorners. The units of these // points are in millimeters, mostly because the depth provided by the depth cameras is also provided in // millimeters, which makes for easy comparison. vector chessboard_corners_world; for (int h = 0; h < chessboard_pattern.height; ++h) { for (int w = 0; w < chessboard_pattern.width; ++w) { chessboard_corners_world.emplace_back( cv::Point3f{ w * chessboard_square_length, h * chessboard_square_length, 0.0 }); } } // Calibrating the cameras requires a lot of data. OpenCV's stereoCalibrate function requires: // - a list of points in real 3d space that will be used to calibrate* // - a corresponding list of pixel coordinates as seen by the first camera* // - a corresponding list of pixel coordinates as seen by the second camera* // - the camera matrix of the first camera // - the distortion coefficients of the first camera // - the camera matrix of the second camera // - the distortion coefficients of the second camera // - the size (in pixels) of the images // - R: stereoCalibrate stores the rotation matrix from the first camera to the second here // - t: stereoCalibrate stores the translation vector from the first camera to the second here // - E: stereoCalibrate stores the essential matrix here (we don't use this) // - F: stereoCalibrate stores the fundamental matrix here (we don't use this) // // * note: OpenCV's stereoCalibrate actually requires as input an array of arrays of points for these arguments, // allowing a caller to provide multiple frames from the same camera with corresponding points. For example, if // extremely high precision was required, many images could be taken with each camera, and findChessboardCorners // applied to each of those images, and OpenCV can jointly solve for all of the pairs of corresponding images. // However, to keep things simple, we use only one image from each device to calibrate. This is also why each of // the vectors of corners is placed into another vector. // // A function in OpenCV's calibration function also requires that these points be F32 types, so we use those. // However, OpenCV still provides doubles as output, strangely enough. 
vector> chessboard_corners_world_nested_for_cv(main_chessboard_corners_list.size(), chessboard_corners_world); cv::Matx33f main_camera_matrix = calibration_to_color_camera_matrix(main_calib); cv::Matx33f secondary_camera_matrix = calibration_to_color_camera_matrix(secondary_calib); vector main_dist_coeff = calibration_to_color_camera_dist_coeffs(main_calib); vector secondary_dist_coeff = calibration_to_color_camera_dist_coeffs(secondary_calib); // Finally, we'll actually calibrate the cameras. // Pass secondary first, then main, because we want a transform from secondary to main. Transformation tr; double error = cv::stereoCalibrate(chessboard_corners_world_nested_for_cv, secondary_chessboard_corners_list, main_chessboard_corners_list, secondary_camera_matrix, secondary_dist_coeff, main_camera_matrix, main_dist_coeff, image_size, tr.R, // output tr.t, // output cv::noArray(), cv::noArray(), cv::CALIB_FIX_INTRINSIC | cv::CALIB_RATIONAL_MODEL | cv::CALIB_CB_FAST_CHECK); cout << "Finished calibrating!\n"; cout << "Got error of " << error << "\n"; return tr; } // The following functions provide the configurations that should be used for each camera. // NOTE: For best results both cameras should have the same configuration (framerate, resolution, color and depth // modes). Additionally the both master and subordinate should have the same exposure and power line settings. Exposure // settings can be different but the subordinate must have a longer exposure from master. To synchronize a master and // subordinate with different exposures the user should set `subordinate_delay_off_master_usec = ((subordinate exposure // time) - (master exposure time))/2`. // static k4a_device_configuration_t get_default_config() { k4a_device_configuration_t camera_config = K4A_DEVICE_CONFIG_INIT_DISABLE_ALL; camera_config.color_format = K4A_IMAGE_FORMAT_COLOR_BGRA32; camera_config.color_resolution = K4A_COLOR_RESOLUTION_720P; camera_config.depth_mode = K4A_DEPTH_MODE_WFOV_UNBINNED; // No need for depth during calibration camera_config.camera_fps = K4A_FRAMES_PER_SECOND_15; // Don't use all USB bandwidth camera_config.subordinate_delay_off_master_usec = 0; // Must be zero for master camera_config.synchronized_images_only = true; return camera_config; } // Master customizable settings static k4a_device_configuration_t get_master_config() { k4a_device_configuration_t camera_config = get_default_config(); camera_config.wired_sync_mode = K4A_WIRED_SYNC_MODE_MASTER; // Two depth images should be seperated by MIN_TIME_BETWEEN_DEPTH_CAMERA_PICTURES_USEC to ensure the depth imaging // sensor doesn't interfere with the other. To accomplish this the master depth image captures // (MIN_TIME_BETWEEN_DEPTH_CAMERA_PICTURES_USEC / 2) before the color image, and the subordinate camera captures its // depth image (MIN_TIME_BETWEEN_DEPTH_CAMERA_PICTURES_USEC / 2) after the color image. This gives us two depth // images centered around the color image as closely as possible. camera_config.depth_delay_off_color_usec = -static_cast(MIN_TIME_BETWEEN_DEPTH_CAMERA_PICTURES_USEC / 2); camera_config.synchronized_images_only = true; return camera_config; } // Subordinate customizable settings static k4a_device_configuration_t get_subordinate_config() { k4a_device_configuration_t camera_config = get_default_config(); camera_config.wired_sync_mode = K4A_WIRED_SYNC_MODE_SUBORDINATE; // Two depth images should be seperated by MIN_TIME_BETWEEN_DEPTH_CAMERA_PICTURES_USEC to ensure the depth imaging // sensor doesn't interfere with the other. 
To accomplish this the master depth image captures // (MIN_TIME_BETWEEN_DEPTH_CAMERA_PICTURES_USEC / 2) before the color image, and the subordinate camera captures its // depth image (MIN_TIME_BETWEEN_DEPTH_CAMERA_PICTURES_USEC / 2) after the color image. This gives us two depth // images centered around the color image as closely as possible. camera_config.depth_delay_off_color_usec = MIN_TIME_BETWEEN_DEPTH_CAMERA_PICTURES_USEC / 2; return camera_config; } static Transformation calibrate_devices(MultiDeviceCapturer &capturer, const k4a_device_configuration_t &main_config, const k4a_device_configuration_t &secondary_config, const cv::Size &chessboard_pattern, float chessboard_square_length, double calibration_timeout) { k4a::calibration main_calibration = capturer.get_master_device().get_calibration(main_config.depth_mode, main_config.color_resolution); k4a::calibration secondary_calibration = capturer.get_subordinate_device_by_index(0).get_calibration(secondary_config.depth_mode, secondary_config.color_resolution); vector> main_chessboard_corners_list; vector> secondary_chessboard_corners_list; std::chrono::time_point start_time = std::chrono::system_clock::now(); while (std::chrono::duration(std::chrono::system_clock::now() - start_time).count() < calibration_timeout) { vector captures = capturer.get_synchronized_captures(secondary_config); k4a::capture &main_capture = captures[0]; k4a::capture &secondary_capture = captures[1]; // get_color_image is guaranteed to be non-null because we use get_synchronized_captures for color // (get_synchronized_captures also offers a flag to use depth for the secondary camera instead of color). k4a::image main_color_image = main_capture.get_color_image(); k4a::image secondary_color_image = secondary_capture.get_color_image(); cv::Mat cv_main_color_image = color_to_opencv(main_color_image); cv::Mat cv_secondary_color_image = color_to_opencv(secondary_color_image); vector main_chessboard_corners; vector secondary_chessboard_corners; bool got_corners = find_chessboard_corners_helper(cv_main_color_image, cv_secondary_color_image, chessboard_pattern, main_chessboard_corners, secondary_chessboard_corners); if (got_corners) { main_chessboard_corners_list.emplace_back(main_chessboard_corners); secondary_chessboard_corners_list.emplace_back(secondary_chessboard_corners); cv::drawChessboardCorners(cv_main_color_image, chessboard_pattern, main_chessboard_corners, true); cv::drawChessboardCorners(cv_secondary_color_image, chessboard_pattern, secondary_chessboard_corners, true); } cv::imshow("Chessboard view from main camera", cv_main_color_image); cv::waitKey(1); cv::imshow("Chessboard view from secondary camera", cv_secondary_color_image); cv::waitKey(1); // Get 20 frames before doing calibration. if (main_chessboard_corners_list.size() >= 20) { cout << "Calculating calibration..." << endl; return stereo_calibration(main_calibration, secondary_calibration, main_chessboard_corners_list, secondary_chessboard_corners_list, cv_main_color_image.size(), chessboard_pattern, chessboard_square_length); } } std::cerr << "Calibration timed out !\n "; exit(1); } static k4a::image create_depth_image_like(const k4a::image &im) { return k4a::image::create(K4A_IMAGE_FORMAT_DEPTH16, im.get_width_pixels(), im.get_height_pixels(), im.get_width_pixels() * static_cast(sizeof(uint16_t))); } ```

rabbitdaxi commented 4 years ago

@guanming001 Thank you for the feedback and sharing the sample code.

I had a quick look at the green screen sample code and at your sample code showing how you leverage it to calibrate your two cameras and transform the point clouds into the same camera space (in the code you shared, everything is transformed into the "main color" camera). I can compile your code without changing anything, and it seems to align the two point clouds from the two cameras in the same coordinate space pretty well. See the image below; apologies for its simplicity, I did not want to include much private data. (You are rendering blue and red to represent the point clouds from the two devices.) image

And yes, you are doing the right thing. Overlaying point clouds from two or more cameras is quite a common use of multiple Azure Kinects in the community, and this is a good example of it. The sample code you shared looks reasonable and follows the right steps.

The key to getting them to overlay accurately is the calibration stage.
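In outline, the flow your code already follows is (a condensed sketch of the green screen sample plus your additions, reusing the variable names from your snippet):

```
// 1. Stereo-calibrate the two color cameras with the chessboard to get the
//    secondary-color -> main-color transform.
Transformation tr_secondary_color_to_main_color =
    calibrate_devices(capturer, main_config, secondary_config,
                      chessboard_pattern, chessboard_square_length, calibration_timeout);

// 2. Compose it with the secondary device's factory depth -> color extrinsics
//    to obtain secondary-depth -> main-color.
Transformation tr_secondary_depth_to_secondary_color =
    get_depth_to_color_transformation_from_calibration(secondary_calibration);
Transformation tr_secondary_depth_to_main_color =
    tr_secondary_depth_to_secondary_color.compose_with(tr_secondary_color_to_main_color);

// 3. Build a k4a transformation from that composed extrinsic and reproject the
//    secondary depth image into the main color camera geometry.
k4a::calibration secondary_depth_to_main_color_cal =
    construct_device_to_device_calibration(main_calibration, secondary_calibration,
                                           tr_secondary_depth_to_main_color);
k4a::transformation secondary_depth_to_main_color(secondary_depth_to_main_color_cal);

// 4. Both depth images now live in the main color camera's geometry, so the point
//    clouds generated from them share a single coordinate frame.
k4a::image main_xyz_image = main_depth_to_main_color.depth_image_to_point_cloud(
    main_depth_in_main_color, K4A_CALIBRATION_TYPE_COLOR);
k4a::image secondary_xyz_image = secondary_depth_to_main_color.depth_image_to_point_cloud(
    secondary_depth_in_main_color, K4A_CALIBRATION_TYPE_COLOR);
```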

Now, you mentioned that you cannot get a good overlay of the two point clouds. It would help if you could share some more information:

guanming001 commented 4 years ago

Hi @rabbitdaxi Thank you for your time and detailed feedback!

Below is an overview of my setup, with two Azure Kinects and a 10 x 7 calibration board (square size of 25 mm) placed around 1.0 m from both cameras: overview of setup

I ran the above modified green_screen code using the following settings:

- num-cameras = 2
- board-height = 7
- board-width = 10
- board-square-length = 25
- depth-threshold-mm = 1500
- color-exposure-time-usec = 10000
- powerline-frequency-mode = 1
- calibration-timeout-sec = 60

>main.exe 2 7 10 25 1500 10000 1 60

Below is some output from the calibration result:

Chessboard height: 7. Chessboard width: 10. Chessboard square length: 25 Depth threshold: : 1500. Color exposure time: 10000. Powerline frequency mode: 1
...
Calculating calibration... 
Finished calibrating! 
Got error of 0.112819

Chessboard view from main camera: main

Chessboard view from secondary camera: secondary

Below is the result I obtained. The object is a stationary 3D-printed Stanford Bunny, and the red point cloud (secondary camera) is offset to the left of the blue point cloud (main camera). Front view: front

Top view: top

I have also recorded two .mkv files (output-main.mkv and output-secondary.mkv) with k4arecorder.exe for your reference. Below are the commands I used to log the 2 s recordings on a single computer. Start the recorder on the subordinate:

k4arecorder.exe --device 0 -l 2 -c 720p -d WFOV_UNBINNED -r 5 --imu OFF --external-sync subordinate --sync-delay 160 -e -8 output-secondary.mkv

Start master device:

k4arecorder.exe --device 1 -l 2 -c 720p -d WFOV_UNBINNED -r 5 --imu OFF --external-sync master -e -8 output-main.mkv
mbleyer commented 4 years ago

How many images of the calibration board did you take per camera? Did the images show the board from different perspectives?

guanming001 commented 4 years ago

Hi @mbleyer, I have only taken one calibration image per camera.

mbleyer commented 4 years ago

This might be the problem. Please take multiple images (>30) of the calibration board at different distances and rotations.

guanming001 commented 4 years ago

Hi @mbleyer, thank you for your suggestion. I have tried taking 30 images of the calibration board at different distances and rotations by modifying the calibrate_devices() function to wait for user input before capturing the next frame:

```
static Transformation calibrate_devices(MultiDeviceCapturer &capturer,
                                        const k4a_device_configuration_t &main_config,
                                        const k4a_device_configuration_t &secondary_config,
                                        const cv::Size &chessboard_pattern,
                                        float chessboard_square_length,
                                        double calibration_timeout)
{
    k4a::calibration main_calibration = capturer.get_master_device().get_calibration(main_config.depth_mode,
                                                                                     main_config.color_resolution);

    k4a::calibration secondary_calibration =
        capturer.get_subordinate_device_by_index(0).get_calibration(secondary_config.depth_mode,
                                                                    secondary_config.color_resolution);
    vector<vector<cv::Point2f>> main_chessboard_corners_list;
    vector<vector<cv::Point2f>> secondary_chessboard_corners_list;
    std::chrono::time_point<std::chrono::system_clock> start_time = std::chrono::system_clock::now();
    int counter = 0;
    //while (std::chrono::duration<double>(std::chrono::system_clock::now() - start_time).count() < calibration_timeout)
    while (true)
    {
        vector<k4a::capture> captures = capturer.get_synchronized_captures(secondary_config);
        k4a::capture &main_capture = captures[0];
        k4a::capture &secondary_capture = captures[1];
        // get_color_image is guaranteed to be non-null because we use get_synchronized_captures for color
        // (get_synchronized_captures also offers a flag to use depth for the secondary camera instead of color).
        k4a::image main_color_image = main_capture.get_color_image();
        k4a::image secondary_color_image = secondary_capture.get_color_image();
        cv::Mat cv_main_color_image = color_to_opencv(main_color_image);
        cv::Mat cv_secondary_color_image = color_to_opencv(secondary_color_image);

        vector<cv::Point2f> main_chessboard_corners;
        vector<cv::Point2f> secondary_chessboard_corners;
        bool got_corners = find_chessboard_corners_helper(cv_main_color_image,
                                                          cv_secondary_color_image,
                                                          chessboard_pattern,
                                                          main_chessboard_corners,
                                                          secondary_chessboard_corners);
        if (got_corners)
        {
            main_chessboard_corners_list.emplace_back(main_chessboard_corners);
            secondary_chessboard_corners_list.emplace_back(secondary_chessboard_corners);
            cv::drawChessboardCorners(cv_main_color_image, chessboard_pattern, main_chessboard_corners, true);
            cv::drawChessboardCorners(cv_secondary_color_image, chessboard_pattern, secondary_chessboard_corners, true);

            std::stringstream ss;
            ss << std::setw(6) << std::setfill('0') << counter++;
            cv::imwrite(ss.str() + "_main.png", cv_main_color_image);
            cv::imwrite(ss.str() + "_seco.png", cv_secondary_color_image);
            cout << "Saved " << ss.str() << endl;
        }

        cv::imshow("Chessboard view from main camera", cv_main_color_image);
        //cv::waitKey(1);
        cv::imshow("Chessboard view from secondary camera", cv_secondary_color_image);
        cv::waitKey(0); // Wait for user input before capturing the next frame

        // Get 30 frames before doing calibration.
        if (main_chessboard_corners_list.size() >= 30)
        {
            cout << "Calculating calibration..." << endl;
            return stereo_calibration(main_calibration,
                                      secondary_calibration,
                                      main_chessboard_corners_list,
                                      secondary_chessboard_corners_list,
                                      cv_main_color_image.size(),
                                      chessboard_pattern,
                                      chessboard_square_length);
        }
    }
    std::cerr << "Calibration timed out !\n ";
    exit(1);
}
```
Below are the images of the captured calibration board:

![combine](https://user-images.githubusercontent.com/38412160/66095603-ae2f4a00-e5ca-11e9-8fbc-2e5c7b22dc5d.png)

But the result I got is still not aligned: Front view: Capture

Side view: Capture1

However, when I placed both Kinects side by side and calibrated with one image per camera, I could get a good result: Setup: 20191003_105228

Front view: front

Side view: side

valvador92 commented 4 years ago

Hi @guanming001,

I have the same issue as you. Did you find a way to properly determine the transformation between the two cameras? Could the problem come from the intrinsic and distortion coefficients of the Kinect cameras?

Thanks in advance!

guanming001 commented 4 years ago

Hi @valvador92 glad to know someone is experiencing the same issue as me!

I still could not solve the problem with the Azure Kinect, but I have tried a similar method (stereo calibration) with two Intel RealSense D415 cameras (without distortion), and their point clouds overlay correctly.

forestsen commented 4 years ago

Hi @guanming001, would you mind trying my code? https://github.com/forestsen/KinectAzureDKProgramming has a two-Kinect calibration example using the aruco library, and https://github.com/forestsen/K4aGrabber has real-time point cloud stitching for two Kinects.

guanming001 commented 4 years ago

Hi @forestsen

Thanks for your code. I was able to build your Aruco_TwoKinects_Calibration_Extrinsics example. With the printed arucopattern.png, I ran your code and obtained the frame_marker_sub.csv and frame_sub_master.csv files.

I then ran the PCL_Grabber_TwoKinects code to stitch the point clouds from the two Kinects, but below is the result I get: aruco

May I check whether you were able to successfully align the point clouds, and whether I missed any steps?

Note: I tested on Azure Kinect Sensor SDK version 1.2.0 instead of 1.1.1 (by making some slight modifications to CMakeLists.txt and azure-kinect-sensor-sdk-config.cmake).

forestsen commented 4 years ago

Hi @guanming001, thank you for your advice. I have changed the k4a version used by my code; you can pull the new code from https://github.com/forestsen/KinectAzureDKProgramming and https://github.com/forestsen/K4aGrabber. This is my result: (screenshot) I use parameters such as "frame_sub_master.csv frame_sub_marker.csv 0.25 1.0 0.25 1.0" in the PCL_Grabber_TwoKinects project.

valvador92 commented 4 years ago

@guanming001, which modifications did you make to CMakeLists.txt and azure-kinect-sensor-sdk-config.cmake?

@mbleyer Hi, could the problem come from the intrinsic parameters and the distortion coefficients of the camera? How good is the calibration done by Microsoft?

forestsen commented 4 years ago

Hi @valvador92, you can check out the new code at https://github.com/forestsen/KinectAzureDKProgramming and https://github.com/forestsen/K4aGrabber. I have changed the k4a version used by my code.

valvador92 commented 4 years ago

@forestsen Thank you for your message.

I am trying to build https://github.com/forestsen/K4aGrabber.

But I got this error. Do you know why?

image Is your code running on Ubuntu 18.04?

valvador92 commented 4 years ago

I commented out the `template` line, and it runs now.

What are these parameters: master_znear master_zfar sub_znear sub_zfar?

valvador92 commented 4 years ago

@forestsen I tried it and I have the same problem as @guanming001.

image

This is the kind of image I get when I run Aruco_TwoKinects_Calibration_Extrinsics:

image

forestsen commented 4 years ago

Hi @valvador92 @guanming001 I refined the calibration result using Open3D 0.8.0: with Open3D's colored ICP, I refined the extrinsic matrix between the two Azure Kinects. This is the result of Aruco_TwoKinects_Calibration_Extrinsics: (screenshot)

This is the result based only on the calibration result (the extrinsic matrix between the two Kinects) from Aruco_TwoKinects_Calibration_Extrinsics:

(screenshots)

This is the new result from PCL_Grabber_TwoKinects:

(screenshots)

And this is another comparison result:

(screenshots)

Lastly, you now just use the "frame_sub_master.csv" parameter in the new version of PCL_Grabber_TwoKinects.

mbleyer commented 4 years ago

@valvador92: The accuracy of the intrinsic calibration is expected to be high. We are using the intrinsic parameters to convert from radial depth to Z values. If there were inaccuracies in the intrinsic calibration, flat surfaces would not appear flat in the 3d reconstruction.

My impression is that the problem is the extrinsic calibration using OpenCV. I am sure that there is plenty of information about camera calibration with OpenCV on the web. Oftentimes, the calibration does not give precise results, because the user took too few or too similar images of the calibration board, the calibration board was not flat enough or the size of the squares was not measured accurately. I would also make sure that you move the calibration board as close as possible to the cameras in some recordings and rotate the calibration board around all three axes. I would also check for situations where the order of corners is flipped across left and right images, which sometimes happened, at least in older versions of OpenCV.

To verify the calibration, you might want to look at the translation vector and check whether its length matches the measured distance between your cameras.
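For example, with the `Transformation tr` returned by `stereo_calibration` in the green screen sample (its `t` is in the same units as the chessboard square length you pass in, i.e. millimetres here), a quick sanity check could look like this:

```
// The length of the calibrated translation vector should be close to the
// tape-measured distance between the two color cameras (in mm).
double baseline_mm = cv::norm(tr.t);
std::cout << "Estimated baseline between the color cameras: " << baseline_mm << " mm\n";
```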

valvador92 commented 4 years ago

@mbleyer

Thank you for your comment. I checked the translation vector. I took 50 images. The values seem correct when the two cameras are close to each other, but when I increased the distance, the values were too small. I tried stereoCalibrate from OpenCV with a chessboard. Do you know if it works better with an ArUco board? Otherwise, do you know of a pipeline to find this transformation matrix?

Thank you in advance

guanming001 commented 4 years ago

Hi @mbleyer, could the systematic nonlinear depth distortion that is inherent in time-of-flight depth cameras also contribute to the misalignment of the point clouds?

i.e. even if the extrinsic calibration between the cameras is performed properly, the point clouds may still not align correctly, because the checkerboard-based extrinsic calibration is only between the color cameras and does not account for distortion in the point clouds.

henryo12 commented 4 years ago

Hi @valvador92 and @guanming001, I am running your modified green_screen code, but I seem to get much more data from the primary camera than from the secondary camera (i.e. my generated point cloud is 80% points from the primary camera). Did either of you experience this? Is this likely a limitation of my computer specs (i7-7820X and GTX 1080) or some other issue?

guanming001 commented 4 years ago

Hi @henryo12, the positions of the cameras and the scene they are looking at will affect how much point cloud data is generated ... do you have any images of the camera positions and the captured scene?

henryo12 commented 4 years ago

Thanks @guanming001 , I think I realized part of my concern: does the viewer only display points with positive depth in the primary camera's coordinates? Regardless, you can see in my images that the blue point cloud data is denser than that from the other two cameras. The performance is also similar when using your unmodified code with only two cameras.

I saved a screenshot from each camera's perspective. I circled myself where I am standing in the middle of the cameras. There are obvious alignment issues which I have yet to address. image image image

image image

jasjuang commented 4 years ago

> Hi @valvador92 glad to know someone is experiencing the same issue as me!
>
> I still could not solve the problem with the Azure Kinect, but I have tried a similar method (stereo calibration) with two Intel RealSense D415 cameras (without distortion), and their point clouds overlay correctly.

@guanming001 It's crazy that we had the same problem. I can align the point clouds from the RealSense cameras successfully but, for some reason, not from the Azure Kinects. The only difference I see between them is that one has distortion and one doesn't, which leads me to think something related to the distortion is causing the misalignment.

While finding the transformation between the two Kinects, I've tried:

  1. undistorting (cv::undistort) the checkerboard images first and then stereo calibrating (cv::stereoCalibrate) with zero distortion (see the sketch after this list)

  2. stereo calibrating directly on the captured checkerboard images with the given distortion parameters

Both yield similar transformations, so this leads me to believe the distortion parameters provided by the Kinect should be right.
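A rough sketch of approach 1, not the exact code (here `main_color`/`sec_color` are single captured BGR frames, `K_main`/`K_sec` and `dist_main`/`dist_sec` are the color-camera matrices and distortion coefficients taken from the k4a calibration, `pattern_size` is the chessboard pattern, and `chessboard_corners_world` is the 3D corner grid built as in the green screen sample; in practice you would accumulate many views, not one):

```
// Approach 1: undistort the color images first, then detect corners and
// stereo-calibrate with zero distortion, keeping the intrinsics fixed.
cv::Mat main_undist, sec_undist;
cv::undistort(main_color, main_undist, K_main, dist_main);
cv::undistort(sec_color, sec_undist, K_sec, dist_sec);

std::vector<cv::Point2f> main_corners, sec_corners;
bool found = cv::findChessboardCorners(main_undist, pattern_size, main_corners) &&
             cv::findChessboardCorners(sec_undist, pattern_size, sec_corners);

if (found)
{
    std::vector<std::vector<cv::Point3f>> object_points{ chessboard_corners_world };
    std::vector<std::vector<cv::Point2f>> main_list{ main_corners };
    std::vector<std::vector<cv::Point2f>> sec_list{ sec_corners };

    // Distortion has already been removed from the images, so pass zeros here.
    cv::Mat zero_dist_main = cv::Mat::zeros(1, 8, CV_64F);
    cv::Mat zero_dist_sec = cv::Mat::zeros(1, 8, CV_64F);
    cv::Mat R, t;
    // Secondary first, then main, to get the secondary -> main transform
    // (same argument order as the green screen sample).
    cv::stereoCalibrate(object_points, sec_list, main_list,
                        K_sec, zero_dist_sec, K_main, zero_dist_main,
                        main_undist.size(), R, t, cv::noArray(), cv::noArray(),
                        cv::CALIB_FIX_INTRINSIC);
}
```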

Next, I undistort the color image and depth image for both Kinects, convert the depth to a point cloud with the given intrinsics, and then transform it based on k4a_calibration_extrinsics_t; visually, the point cloud looks like it aligns correctly with the color image. After doing the same thing for the second Kinect, when I apply the transformation from the stereo calibration to the point cloud from the secondary Kinect, I get results similar to @guanming001's. Very puzzling.

One potential problem I can see: when I undistort the depth image, does the depth value have to be adjusted somehow to reflect the change? In other words, can I simply treat the depth image as a 32-bit float image, call cv::undistort on it, and use the undistorted depth values directly? I think the answer should be yes; can @rabbitdaxi confirm this?
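On the depth side, rather than cv::undistort (which interpolates bilinearly and can blend depth values across object boundaries), one option is a nearest-neighbour remap. A sketch, where `depth` is the CV_16U depth Mat and `K_depth`/`dist_depth` are the depth camera's intrinsics and distortion coefficients:

```
// Build the undistortion maps once; reusing K_depth as the "new" camera matrix
// keeps the pixel grid comparable to the original image.
cv::Mat map1, map2;
cv::initUndistortRectifyMap(K_depth, dist_depth, cv::Mat(), K_depth,
                            depth.size(), CV_32FC1, map1, map2);

// Remap with nearest-neighbour sampling so depth values are moved, not averaged,
// across depth discontinuities.
cv::Mat depth_undistorted;
cv::remap(depth, depth_undistorted, map1, map2, cv::INTER_NEAREST);
```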

amonnphillip commented 4 years ago

Hi @jasjuang, I had a similar problem with the 3 Kinect DK devices I purchased. I was unable to accurately calculate the poses of the other 2 cameras and thus always had problems generating an accurate point cloud of my scene. I had to re-calibrate the color camera intrinsics, and only after this procedure was I able to generate accurate camera poses.

All 3 devices had, let's say, 'less than optimal' intrinsics out of the box.

I used a custom-made, highly accurate ChArUco board with OpenCV 3+, and around 60 images to generate my own intrinsic values.
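For anyone who wants to try the same thing, a rough sketch of a ChArUco intrinsic calibration with the OpenCV contrib aruco module (pre-4.7 aruco API; the board geometry, dictionary, and `calibration_images` are placeholders you would replace with your own):

```
#include <opencv2/aruco/charuco.hpp>

// Example board layout only -- it must match your printed board.
cv::Ptr<cv::aruco::Dictionary> dict =
    cv::aruco::getPredefinedDictionary(cv::aruco::DICT_6X6_250);
cv::Ptr<cv::aruco::CharucoBoard> board =
    cv::aruco::CharucoBoard::create(5, 7, 0.04f, 0.03f, dict); // squares X/Y, square/marker size (m)

std::vector<std::vector<cv::Point2f>> all_corners; // interpolated ChArUco corners per image
std::vector<std::vector<int>> all_ids;             // matching corner ids per image
cv::Size image_size;

for (const cv::Mat &img : calibration_images)
{
    std::vector<int> marker_ids;
    std::vector<std::vector<cv::Point2f>> marker_corners;
    cv::aruco::detectMarkers(img, dict, marker_corners, marker_ids);
    if (marker_ids.empty())
        continue;

    std::vector<cv::Point2f> charuco_corners;
    std::vector<int> charuco_ids;
    cv::aruco::interpolateCornersCharuco(marker_corners, marker_ids, img, board,
                                         charuco_corners, charuco_ids);
    if (charuco_ids.size() > 8) // keep only views with a reasonable number of corners
    {
        all_corners.push_back(charuco_corners);
        all_ids.push_back(charuco_ids);
        image_size = img.size();
    }
}

cv::Mat K, dist;
std::vector<cv::Mat> rvecs, tvecs;
double rms = cv::aruco::calibrateCameraCharuco(all_corners, all_ids, board,
                                               image_size, K, dist, rvecs, tvecs);
```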

Please try calibrating your cameras and share your results with the community. If there is indeed an issue with factory calibration, we should know. I wasted quite a bit of development time trying to figure out what the problem was and did not consider factory calibration issues until I had exhausted all the other possibilities.

jasjuang commented 4 years ago

@amonnphillip thanks for sharing your experience; I will give that a try. Did you only re-calibrate the intrinsics of the color camera? Did you redo the intrinsics of the depth camera and the extrinsics between the color and depth cameras?

amonnphillip commented 4 years ago

I only calibrated the color camera, and that gave me enough pose estimation accuracy for my use case (for now at least; we are still in early development). But yes, I do wonder about the accuracy of the depth intrinsics as well. I guess you could calibrate the depth camera using the IR stream? Maybe someone from Microsoft can advise.

jasjuang commented 4 years ago

@amonnphillip I re-calibrated the intrinsics of the color camera but not the depth camera, and to my great surprise it actually produces better alignment results than the calibration the Azure Kinect provides. There is still a bit of misalignment, though, which is not good enough for my purpose. I think this is most likely because the intrinsics of the depth camera and the extrinsics between the color and depth cameras are still the factory ones. For the new intrinsic parameters I calculated for the color camera, the focal length and distortion parameters are wildly different, and only the principal points are somewhat similar.

I am trying to figure out how to capture the checkerboard images with the IR stream so that my checkerboard corner detection code works as usual. The resolution of the IR stream is extremely low, and some kind of IR light has to be used to illuminate the image.
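In case it is useful, the passive IR image can be pulled from the same capture and stretched to 8 bit before running the usual corner detector. A sketch, assuming `capture` is a k4a::capture and `chessboard_pattern` a cv::Size:

```
// Grab the IR frame (K4A_IMAGE_FORMAT_IR16) and wrap it in a cv::Mat.
k4a::image ir = capture.get_ir_image();
cv::Mat ir16(ir.get_height_pixels(), ir.get_width_pixels(), CV_16U,
             (void *)ir.get_buffer(), static_cast<size_t>(ir.get_stride_bytes()));

// Stretch to 8 bit for the corner detector; clamping very bright pixels first can
// help if the IR illuminator saturates the white squares.
cv::Mat ir8;
cv::normalize(ir16, ir8, 0, 255, cv::NORM_MINMAX, CV_8U);

std::vector<cv::Point2f> corners;
bool found = cv::findChessboardCorners(ir8, chessboard_pattern, corners);
```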

jasjuang commented 4 years ago

After a ton of experiments, I believe this is not a calibration problem. It's a depth accuracy problem, and I believe the accuracy of the depth is highly affected by the material. If you google "time of flight depth material", a lot of papers pop up discussing this.

I was able to write the code to add depth information of the checkerboard into the calibration, as I mentioned in #937, and the calibration results were perfect: when I reconstruct the calibration tower used for calibration from different views, the alignment is spot on. However, when I try to reconstruct a human with the exact same calibration parameters, I see the same level of misalignment as shown in this thread. The only explanation left is that the depth accuracy varies with the material.

This has been extremely frustrating. Can anyone from Microsoft grab two Azure Kinects, and try to align them yourself, and tell us whether you are able to successfully align the point clouds?

JulesThuillier commented 4 years ago

Hello everyone! Just out of curiosity, are you all using the k4a_calibration_3d_to_3d() function to convert your depth data to the color camera's 3D space? You are calibrating with the color cameras but then trying to match point cloud data as if it were taken by the same sensor. If you are not doing that, then I think this is your problem. Kinect Calibration Doc

EDIT: After a bit of reading of the green screen example and your code, I might have something you could try out... It's a guess, but it should be easy to try!

Your code:

```
// Transform Depth Image to Point Cloud
main_xyz_image = main_depth_to_main_color.depth_image_to_point_cloud(main_depth_in_main_color, K4A_CALIBRATION_TYPE_COLOR);
secondary_xyz_image = secondary_depth_to_main_color.depth_image_to_point_cloud(secondary_depth_in_main_color, K4A_CALIBRATION_TYPE_COLOR);

// Get cv::Mat from k4a::image
main_xyz = k4a::get_mat(main_xyz_image);
secondary_xyz = k4a::get_mat(secondary_xyz_image);
```

You call depth_image_to_point_cloud on a depth image that has just been converted to the color camera with depth_image_to_color_camera.

Directly call the function on the original depth image instead: `main_depth_image.depth_image_to_point_cloud(main_depth_image, K4A_CALIBRATION_TYPE_DEPTH)`

I am not certain that K4A_CALIBRATION_TYPE_COLOR is the right flag for that conversion. Let us know if that works!
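In code, the two variants being compared are roughly (an untested sketch, reusing the names from the snippet above):

```
// What the shared code does: the depth has already been reprojected into the main
// color camera, so the point cloud comes out in the color camera's frame.
k4a::image xyz_in_color_frame = main_depth_to_main_color.depth_image_to_point_cloud(
    main_depth_in_main_color, K4A_CALIBRATION_TYPE_COLOR);

// The suggestion above: use the raw depth image directly. The resulting point cloud
// is then in the depth camera's frame, so any device-to-device extrinsics applied
// afterwards would have to be expressed relative to the depth camera instead.
k4a::image xyz_in_depth_frame = main_depth_to_main_color.depth_image_to_point_cloud(
    main_depth_image, K4A_CALIBRATION_TYPE_DEPTH);
```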

neilDGD commented 4 years ago

Hi,

I have been working on stereo calibration of more than one k4a, and I am having trouble tracking down the cause of a slight misalignment when doing a stereo calibration similar to the green screen example.

The typical output from a stereo calibration using OpenCV + k4a sdk is like this:

calibration_out_forum

Ignore the bad depth reconstruction, my covid capture environment is less than ideal ;)

As you can see, there is a slight translation offset and/or rotation in the subordinate camera's pose matrix and translation vector, and I have tried everything I can to narrow down whether this is something in our codebase or something else (sensor intrinsic calibration, for example).

The method is similar to the green screen example in that I continuously collect valid checkerboard captures up to a maximum calibration pair count (I am testing with 50 now), with varied orientation and positioning of the board across the shared frustum.

I am currently projecting/mapping the depth buffer into colour space and doing the stereo calibration on colour pairs to get the relative subordinate camera pose. From what I understand, as long as the depth is mapped into colour space, all that is needed once a calibration pose is composed is to project the point cloud / do the meshing, since the calibration is performed in colour space and the depth is already in colour space.

I have also verified many times that the point cloud projection is correct by comparing our GPU projection method against the k4a SDK's k4a_calibration_2d_to_3d projection offset table method (sampling that table instead of calculating the projection on the GPU); both result in exactly the same point cloud.

I have confirmed that undistortion is correct in images and the undistorted depth and colour match spatially.

I have also tried not projecting the depth into colour space in the first place, and instead applying the extrinsic matrix + offset in depth space before projecting to the point cloud. The results are very similar to the depth-to-colour-space point cloud: there are slight offsets of 1-2 inches and/or a slight rotation (mostly heading).

I have also tried various other things, exhausting my ideas for now, e.g. undistorting the calibration images before cv::stereoCalibrate.

I wonder if anyone could suggest any other factors that could affect getting a perfect calibration using OpenCV's stereoCalibrate. I also wonder whether there is any truth to the idea that the k4a sensors I have might have less than optimal intrinsic calibration data. Are there any cases where a mono sensor lens calibration has yielded much better results? Is there any truth to this rumour? :)

Anyway, thanks for reading this lengthy post, any suggestions would be gratefully received.

Many thanks,

Neil

danilogr commented 4 years ago

hey @neilDGD, How are you rendering your point-cloud?

Your workflow makes sense to me:

  1. stereo calibration between kinect cameras using the color camera
  2. projecting depth images to the color camera space
  3. creating point cloud with something like k4a_calibration_2d_to_3d

Building on what @JulesThuillier said, I would say that you might have a problem in the third step. I have also implemented a way of projecting on the GPU, but in my case I have to use a look-up table to properly project the depth component (I basically used the look-up table generated here: https://github.com/microsoft/Azure-Kinect-Sensor-SDK/blob/develop/examples/fastpointcloud/main.cpp). The look-up table gives you the 3D location (x, y, z) of each point-cloud point given its x and y in the color image and its z/depth in the depth image.

If your shader treats the depth texture as coming from a pinhole camera, the point cloud will not be correct. In other words, using the camera focal length, principal point, and resolution to calculate each point's 3D location will result in a point cloud that is slightly incorrect. k4a images have tangential and radial distortion; you can account for those manually or by using a look-up table.

neilDGD commented 4 years ago

hey @danilogr, thanks for the quick and interesting reply!

So, as I said, I have tried a normal GPU projection technique (e.g. using trig) and compared it against the same XY look-up table method you pointed out, e.g. generated by:

```
void Kinect4Manager::Kinect4Instance::createXYProjectionTable()
{
    k4a_image_create(   K4A_IMAGE_FORMAT_CUSTOM,
                        m_depthWidth,
                        m_depthHeight,
                        m_depthWidth * (int)sizeof(k4a_float2_t),
                        &m_xyProjectionTable);

    k4a_float2_t *table_data = (k4a_float2_t *)(void *)k4a_image_get_buffer(m_xyProjectionTable);

    for (int y = 0, idx = 0; y < m_depthHeight; y++)
    {
        k4a_float2_t p;

        p.xy.y = (float)y;
        for (int x = 0; x < m_depthWidth; x++, idx++)
        {
            p.xy.x = (float)x;

            int valid;
            k4a_float3_t ray;
            k4a_calibration_2d_to_3d( &m_sensorCalibration, &p, 1.f, K4A_CALIBRATION_TYPE_COLOR, K4A_CALIBRATION_TYPE_COLOR, &ray, &valid);

            if (valid)
            {
                table_data[idx].xy.x = ray.xyz.x;
                table_data[idx].xy.y = ray.xyz.y;
            }
            else
            {
                table_data[idx].xy.x = nanf("");
                table_data[idx].xy.y = nanf("");
            }
        }
    }
}
```

I just bind this as a texture and sample it instead, on a switch, or just use trig as a pinhole camera would. Here is the snippet from my shader:

```
if (K4ProjectionLUTEnabled)
    {
        // k4a pre-calculated projection LUT
        float2 xyProj = XyProjectionTable.SampleLevel(PointClampSampler, uv, 0);
        pos.z = depthValue * 0.001f;
        pos.xy = xyProj * pos.z;
        pos.z -= PivotDistance; // set to 0
        pos.x = -pos.x;
        pos.y = -pos.y;
    }
    else
    {
        pos = float3(uv.x * 2.0 - 1.0f, (1 - uv.y) * 2.0f - 1.0f, depthValue * 0.001f);
        pos.xy *= DepthTanHalfFov * depthValue * 0.001f; // length mm -> m
        pos.z -= PivotDistance; // set to 0
        pos.x = -pos.x;
    }
```

both these methods give the same resulting point cloud.

Interesting that you say the pinhole model would not work; distortion should not affect the projected point cloud because when I call k4a_transformation_depth_image_to_color_camera() it undistorts the image as well as mapping it into colour space (I can see this visibly, and depth + colour align perfectly). The only other way I can think of that a pinhole model wouldn't work is if the depth were non-linear?

I don't think the projection is at fault, as using the generated look-up table does not change the point cloud; it is identical.

These are images of the colour remapped depth and colour frame:

depth color

As you can see, there is no visible distortion like there was when I did not remap it.

One thing I just noticed is that I am generating this look-up table projection with K4A_CALIBRATION_TYPE_COLOR; I wonder what result I would get by using K4A_CALIBRATION_TYPE_DEPTH instead. I'll give that a try in the morning.

Many thanks!

Neil

danilogr commented 4 years ago

I got slightly different results between the trig approach and the look-up table approach, so I am assuming that depth is nonlinear.

I think that the lookup table projection on K4A_CALIBRATION_TYPE_COLOR is correct as you are projecting your depth image to the color camera space. Let me know if that is not the case.

My GPU implementation is similar to yours with one exception:

```
        // k4a pre-calculated projection LUT
        float2 xyProj = XyProjectionTable.SampleLevel(PointClampSampler, uv, 0);
        pos.z = depthValue * 0.001f;
        pos.xy = xyProj * pos.z;
        pos.z -= PivotDistance; // <----------- This doesn't seem right (unless PivotDistance  == depthValue * 0.001f)
        pos.x = -pos.x;
        pos.y = -pos.y;
```

I can see a very subtle difference between my look-up-table version and the non-look-up-table one, almost as if the look-up table shrinks my point cloud a bit...

neilDGD commented 4 years ago

Hi again @danilogr,

Just to confirm: yes, I project the depth image into colour space, so what you are saying about the look-up source and destination being K4A_CALIBRATION_TYPE_COLOR holds, I guess. I do wonder whether there could be some difference when projecting the point cloud from colour space instead of depth space, though; are you doing the remapping to colour space as well? I wonder if there is some loss of data when doing this mapping. Is producing the look-up table from colour space as good as from depth space, for example, and is the depth non-linear after the remapping, and is this corrected in the LUT?

We work in metres. The pivot is also set to 0 for testing.

I cannot see any difference between the point clouds with the look-up table, not visibly anyway. I am still getting offsets in X and Y, although Z seems spot on.

Thanks,

Neil

ndaley7 commented 3 years ago

Hello... We're also interested in trying out the point cloud alignment (2-camera setup) based on the green screen example and are experiencing the same issues with rotational/translational misalignment. Was a solution ever reached for how to overlay the clouds properly? (screenshot)

ndaley7 commented 3 years ago

We were able to fix the rotational alignment issue using the correction to the green screen example code described here: https://github.com/microsoft/Azure-Kinect-Sensor-SDK/issues/1420

The translational offset at least seems to be consistent for a given setup, so most likely we will compensate with a positional offset on all points from the secondary camera until a more comprehensive solution can be found.
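For reference, a crude version of that compensation, assuming the secondary point cloud is held as a 3-channel float cv::Mat in millimetres (adjust the element type if your util.h returns 16-bit points) and `offset` is the residual translation you measured:

```
// Apply a fixed translation to every point coming from the secondary camera.
cv::Vec3f offset(25.0f, 0.0f, 10.0f); // example values in mm -- measure your own
for (int r = 0; r < secondary_xyz.rows; ++r)
    for (int c = 0; c < secondary_xyz.cols; ++c)
        secondary_xyz.at<cv::Vec3f>(r, c) += offset;
```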

Koring12 commented 2 years ago

Dear community, I am having the same issue here. I couldn't solve the problem with the solution @ndaley7 suggested. Any more suggestions for solving this issue?

I am also trying to convert the color image data of the secondary Azure Kinect into the secondary_depth_in_main_color data. Can anyone show me an example of this data conversion as well?

Thanks in advance!

abdu307 commented 1 year ago

I think this calibration issue is caused by the physical offset between the RGB and depth sensors on the Azure Kinect; you can correct it by translating the RGB camera calibration to match the depth sensor location on the Azure Kinect.

lun73 commented 1 year ago

> Hi @valvador92 glad to know someone is experiencing the same issue as me!
>
> I still could not solve the problem with the Azure Kinect, but I have tried a similar method (stereo calibration) with two Intel RealSense D415 cameras (without distortion), and their point clouds overlay correctly.

Hello! I'm using two D415s. I'm working on a similar problem; may I refer to your code?