rpng / open_vins

An open source platform for visual-inertial navigation research.
https://docs.openvins.com
GNU General Public License v3.0

Multiple camera open sourcing #335

Closed WuZihao12 closed 1 year ago

WuZihao12 commented 1 year ago

I want to work on an asynchronous multi-camera, single-IMU system, do you have any suggestions? Or will you open source the code for the mc-vio paper?

goldbattle commented 1 year ago

This repository supports monocular, binocular, and synchronized stereo cameras. There are currently no plans to open source MIMC-VINS. You could implement it yourself on top of OpenVINS, but that is something you would need to take on yourself.

Galto2000 commented 1 year ago

Greetings,

I am trying to interpret the information in this thread correctly, as I am interested in using three cameras; in my case the cameras are all time synchronized, but don't have any overlapping views.

I have been dabbling with open-vins in the ROS2 environment, simulating a drone with three cameras in Gazebo, and I noticed that in https://github.com/rpng/open_vins/blob/58afb2daf47498911100edc7452ccd7e42a2db50/ov_msckf/src/ros/ROS2Visualizer.cpp#L160 when the number of cameras is larger than 2, they appear to be "enumerated" as individual monocular cameras:

https://github.com/rpng/open_vins/blob/58afb2daf47498911100edc7452ccd7e42a2db50/ov_msckf/src/ros/ROS2Visualizer.cpp#L201

This works well for a single camera, but when I use three cameras, I get very poor VIO performance.

I noticed that when I use two cameras, even though they don't have overlapping views (and I set use_stereo to false), I also get good VIO performance, with calls going through https://github.com/rpng/open_vins/blob/58afb2daf47498911100edc7452ccd7e42a2db50/ov_msckf/src/ros/ROS2Visualizer.cpp#L499

even though I am not doing stereo per se.

So I followed suit and created a new branch for three cameras (pretty much as suggested here https://github.com/rpng/open_vins/issues/285), and I get similarly good VIO performance as with two cameras. However, in RVIZ I only see the features (active, MSCKF and SLAM) of the first two cameras, which makes me wonder whether the third camera is being used at all, or whether I need to set the number of features larger?

I am still wiggling my way through the code, but I would like to know whether MIMC-VINS is an absolute requirement for using more than two cameras at the moment, or whether it is an entirely different approach (I have not yet looked at MIMC-VINS) and open-vins in its current state can already support three cameras, just not in the manner of MIMC-VINS?

goldbattle commented 1 year ago

MIMC-VINS's main focus was on improving robustness and supporting calibration of all sensor parameters. I believe you shouldn't run into too many problems if you are leveraging synchronized cameras (have you verified this by performing a Kalibr calibration and comparing the recovered time offsets of each camera?).

One thing that will likely need to be improved is that SLAM features are initialized based purely on having a long track length. For multi-camera, it is better to ensure there are some SLAM features in each camera. This probably isn't something you need to worry about in the beginning. https://github.com/rpng/open_vins/blob/58afb2daf47498911100edc7452ccd7e42a2db50/ov_msckf/src/core/VioManager.cpp#L428-L441
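As a rough illustration only (this is not the VioManager code; the Candidate struct and the even per-camera budget split below are assumptions), a per-camera selection could look something like this, keeping the longest tracks within each camera instead of globally:

    #include <algorithm>
    #include <cstddef>
    #include <map>
    #include <vector>

    // Hypothetical candidate: which camera observed the feature and how long its track is.
    struct Candidate {
      std::size_t feat_id;
      std::size_t cam_id;
      std::size_t track_length;
    };

    // Pick up to `budget` SLAM candidates, giving each camera an equal share and
    // preferring the longest tracks within each camera.
    std::vector<Candidate> select_slam_per_camera(std::vector<Candidate> candidates,
                                                  std::size_t num_cameras, std::size_t budget) {
      // Group candidates by the camera that observed them
      std::map<std::size_t, std::vector<Candidate>> per_cam;
      for (const auto &c : candidates)
        per_cam[c.cam_id].push_back(c);

      // Within each camera, keep the longest tracks up to the per-camera share
      std::vector<Candidate> selected;
      std::size_t share = std::max<std::size_t>(1, budget / std::max<std::size_t>(1, num_cameras));
      for (auto &kv : per_cam) {
        auto &v = kv.second;
        std::sort(v.begin(), v.end(),
                  [](const Candidate &a, const Candidate &b) { return a.track_length > b.track_length; });
        for (std::size_t i = 0; i < v.size() && i < share && selected.size() < budget; i++)
          selected.push_back(v[i]);
      }
      return selected;
    }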

> I get very poor VIO performance. ... I have similar good VIO performance as with two cameras, but in RVIZ, I only see the features (active, MSCKF and SLAM) of the first two cameras, making me wonder if the third camera is being used at all, or maybe I need to set the number of features larger?

Is this in terms of accuracy, computation, or something else? Are you able to expand on this? Are you saying there are no updates with the third camera? This shouldn't be the case unless there are so many features that the third camera's tracks are just not used.

If you see that all cameras have a good number of feature tracks, then likely there are too many features and max_slam or max_msckf_in_update is limiting which features are used. You could try tracking maybe 50 features per image (so 150 features tracked in total) and see if you still have the problem, or play with max_msckf_in_update, or increase it to 9999 to use all features whose tracks have been lost.
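For reference, a sketch of that tuning expressed against ov_msckf::VioManagerOptions (the field names and include path here are from memory, so double-check them against your version; the same values can equally go in your estimator config yaml):

    // Sketch only: starting-point values for the tuning suggested above,
    // not recommendations. Field names assumed from ov_msckf::VioManagerOptions.
    #include "core/VioManagerOptions.h"

    ov_msckf::VioManagerOptions make_tuned_options() {
      ov_msckf::VioManagerOptions params;
      params.num_pts = 50;                              // ~50 tracked features per image
      params.state_options.num_cameras = 3;             // three non-overlapping cameras
      params.state_options.max_slam_features = 50;      // room for SLAM features from each camera
      params.state_options.max_msckf_in_update = 9999;  // effectively use all lost feature tracks
      return params;
    }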

I also recommend testing the v2.7 PR, as that will soon be merged with quite a few changes: https://github.com/rpng/open_vins/pull/337

Galto2000 commented 1 year ago

Thank you for your swift reply.

I am using Gazebo, and I have an environment that has nice textures for feature tracking. Since the drone is of my own design, I have perfect knowledge of the location of the cameras w.r.t. the IMU, so I didn't use Kalibr for the camera calibration - I wrote a python script that generates the same outputs as Kalibr, computed from this ground-truth knowledge.

I am using the "standard" ROS Gazebo packages, and run its camera plugin in multicamera mode (so it's a calling a single plugin sharable object, but with parameters that define multiple cameras, typically used for simulating stereo cameras) but I actually haven't verified if this plugin truly generates synchronized images - it's on my todo list.

The simulated drone has one camera facing due left and 45 degrees down, one forward-looking camera 45 degrees down, and one camera facing due right and 45 degrees down.

I created a Gazebo plugin that publishes the ground-truth poses. The model is controlled by replaying a recorded bag of control inputs, so each experiment is executed exactly the same, and I record the open-vins outputs in a rosbag along with the truth. I have a python script that then analyzes the results against the truth, creates plots, and outputs metrics such as ATE and relative error. Also, at the end of replaying the recorded input, the drone remains perfectly suspended in space (simulated alien technology ;) ).

When using three cameras without modifying any code in ROS2Visualizer, I get a significantly larger ATE and relative error, and at the end, when the input controls stop and the drone is perfectly suspended, the open-vins solution starts to drift slightly. I do not see this drift at the end with a single camera, or with two cameras (any combination, e.g. left and right camera).

Also, qualitatively comparing the truth vs. estimated plots, with the unaltered version of ROS2Visualizer the plots match much more closely for the monocular and binocular cases than for the three-camera case.

When I alter ROS2Visualizer by adding a case for three cameras, like so:

  // Logic for sync stereo subscriber
  // https://answers.ros.org/question/96346/subscribe-to-two-image_raws-with-one-function/?answer=96491#post-id-96491
  if (_app->get_params().state_options.num_cameras == 2) 
  {
    PRINT_INFO("\n\n<======================= 2 cameras =======================>\n\n\n");
    // Read in the topics
    std::string cam_topic0, cam_topic1;
    _node->declare_parameter<std::string>("topic_camera" + std::to_string(0), "/cam" + std::to_string(0) + "/image_raw");
    _node->get_parameter("topic_camera" + std::to_string(0), cam_topic0);
    _node->declare_parameter<std::string>("topic_camera" + std::to_string(1), "/cam" + std::to_string(1) + "/image_raw");
    _node->get_parameter("topic_camera" + std::to_string(1), cam_topic1);

    parser->parse_external("relative_config_imucam", "cam" + std::to_string(0), "rostopic", cam_topic0);
    parser->parse_external("relative_config_imucam", "cam" + std::to_string(1), "rostopic", cam_topic1);

    // Create sync filter (they have unique pointers internally, so we have to use move logic here...)
    auto image_sub0 = std::make_shared<message_filters::Subscriber<sensor_msgs::msg::Image>>(_node, cam_topic0);
    auto image_sub1 = std::make_shared<message_filters::Subscriber<sensor_msgs::msg::Image>>(_node, cam_topic1);
    auto sync = std::make_shared<message_filters::Synchronizer<sync_pol>>(sync_pol(10), *image_sub0, *image_sub1);
    sync->registerCallback(std::bind(&ROS2Visualizer::callback_stereo, this, std::placeholders::_1, std::placeholders::_2, 0, 1));

    // sync->registerCallback([](const sensor_msgs::msg::Image::SharedPtr msg0, const sensor_msgs::msg::Image::SharedPtr msg1)
    // {callback_stereo(msg0, msg1, 0, 1);});
    // sync->registerCallback(&callback_stereo2); // since the above two alternatives fail to compile for some reason
    // Append to our vector of subscribers
    sync_cam.push_back(sync);
    sync_subs_cam.push_back(image_sub0);
    sync_subs_cam.push_back(image_sub1);

    PRINT_INFO("subscribing to cam (stereo): %s\n", cam_topic0.c_str());
    PRINT_INFO("subscribing to cam (stereo): %s\n", cam_topic1.c_str());
  } 

  // Following suit from the two-camera example above, but for three cameras,
  // calling callback_cam_triad() instead of callback_stereo().
  else if (_app->get_params().state_options.num_cameras == 3)
  {
    PRINT_INFO("\n\n<======================= 3 cameras =======================>\n\n\n");
    // Read in the topics
    std::string cam_topic0, cam_topic1, cam_topic2;
    _node->declare_parameter<std::string>("topic_camera" + std::to_string(0), "/cam" + std::to_string(0) + "/image_raw");
    _node->get_parameter("topic_camera" + std::to_string(0), cam_topic0);
    _node->declare_parameter<std::string>("topic_camera" + std::to_string(1), "/cam" + std::to_string(1) + "/image_raw");
    _node->get_parameter("topic_camera" + std::to_string(1), cam_topic1);
    _node->declare_parameter<std::string>("topic_camera" + std::to_string(2), "/cam" + std::to_string(2) + "/image_raw");
    _node->get_parameter("topic_camera" + std::to_string(2), cam_topic2);

    parser->parse_external("relative_config_imucam", "cam" + std::to_string(0), "rostopic", cam_topic0);
    parser->parse_external("relative_config_imucam", "cam" + std::to_string(1), "rostopic", cam_topic1);
    parser->parse_external("relative_config_imucam", "cam" + std::to_string(2), "rostopic", cam_topic2);

    // Create sync filter (they have unique pointers internally, so we have to use move logic here...)
    auto image_sub0 = std::make_shared<message_filters::Subscriber<sensor_msgs::msg::Image>>(_node, cam_topic0);
    auto image_sub1 = std::make_shared<message_filters::Subscriber<sensor_msgs::msg::Image>>(_node, cam_topic1);
    auto image_sub2 = std::make_shared<message_filters::Subscriber<sensor_msgs::msg::Image>>(_node, cam_topic2);

    auto sync = std::make_shared<message_filters::Synchronizer<sync_pol3>>(sync_pol3(10), 
                                                                           *image_sub0, 
                                                                           *image_sub1, 
                                                                           *image_sub2);
    sync->registerCallback(std::bind(&ROS2Visualizer::callback_cam_triad, this,
                                     std::placeholders::_1, std::placeholders::_2, std::placeholders::_3,
                                     0, 1, 2));

    // Append to our vector of subscribers
    sync_cam3.push_back(sync);
    sync_subs_cam.push_back(image_sub0);
    sync_subs_cam.push_back(image_sub1);
    sync_subs_cam.push_back(image_sub2);
    PRINT_INFO("subscribing to cam (triad): %s\n", cam_topic0.c_str());
    PRINT_INFO("subscribing to cam (triad): %s\n", cam_topic1.c_str());
    PRINT_INFO("subscribing to cam (triad): %s\n", cam_topic2.c_str());
  } 
  else 
  {
    PRINT_INFO("\n\n<======================= mono cameras =======================>\n\n\n");
    PRINT_INFO("Got %d mono cameras to subscribe to \n", _app->get_params().state_options.num_cameras);
    // Now we should add any non-stereo callbacks here
    for (int i = 0; i < _app->get_params().state_options.num_cameras; i++) {
      // read in the topic
      std::string cam_topic;
      _node->declare_parameter<std::string>("topic_camera" + std::to_string(i), "/cam" + std::to_string(i) + "/image_raw");
      _node->get_parameter("topic_camera" + std::to_string(i), cam_topic);
      parser->parse_external("relative_config_imucam", "cam" + std::to_string(i), "rostopic", cam_topic);
      // create subscriber
      // auto sub = _node->create_subscription<sensor_msgs::msg::Image>(
      //    cam_topic, rclcpp::SensorDataQoS(), std::bind(&ROS2Visualizer::callback_monocular, this, std::placeholders::_1, i));
      auto sub = _node->create_subscription<sensor_msgs::msg::Image>(
          cam_topic, 10, [this, i](const sensor_msgs::msg::Image::SharedPtr msg0) { callback_monocular(msg0, i); });
      subs_cam.push_back(sub);
      PRINT_INFO("subscribing to cam (mono): %s\n", cam_topic.c_str());
    }
  }
}

where

void ROS2Visualizer::callback_cam_triad(const sensor_msgs::msg::Image::ConstSharedPtr msg0, 
                                        const sensor_msgs::msg::Image::ConstSharedPtr msg1, 
                                        const sensor_msgs::msg::Image::ConstSharedPtr msg2, 
                                        int cam_id0,
                                        int cam_id1,
                                        int cam_id2)
  {
    // Check if we should drop this image
    double timestamp = msg0->header.stamp.sec + msg0->header.stamp.nanosec * 1e-9;
    double time_delta = 1.0 / _app->get_params().track_frequency;
    if (camera_last_timestamp.find(cam_id0) != camera_last_timestamp.end() && timestamp < camera_last_timestamp.at(cam_id0) + time_delta) {
      return;
    }
    camera_last_timestamp[cam_id0] = timestamp;

    // Get the image
    cv_bridge::CvImageConstPtr cv_ptr0;
    try {
      cv_ptr0 = cv_bridge::toCvShare(msg0, sensor_msgs::image_encodings::MONO8);
    } catch (cv_bridge::Exception &e) {
      PRINT_ERROR("cv_bridge exception: %s", e.what());
      return;
    }

    // Get the image
    cv_bridge::CvImageConstPtr cv_ptr1;
    try {
      cv_ptr1 = cv_bridge::toCvShare(msg1, sensor_msgs::image_encodings::MONO8);
    } catch (cv_bridge::Exception &e) {
      PRINT_ERROR("cv_bridge exception: %s", e.what());
      return;
    }

    // Get the image
    cv_bridge::CvImageConstPtr cv_ptr2;
    try {
      cv_ptr2 = cv_bridge::toCvShare(msg2, sensor_msgs::image_encodings::MONO8);
    } catch (cv_bridge::Exception &e) {
      PRINT_ERROR("cv_bridge exception: %s", e.what());
      return;
    }

    // Create the measurement
    ov_core::CameraData message;
    message.timestamp = cv_ptr0->header.stamp.sec + cv_ptr0->header.stamp.nanosec * 1e-9;
    message.sensor_ids.push_back(cam_id0);
    message.sensor_ids.push_back(cam_id1);
    message.sensor_ids.push_back(cam_id2);

    message.images.push_back(cv_ptr0->image.clone());
    message.images.push_back(cv_ptr1->image.clone());
    message.images.push_back(cv_ptr2->image.clone());

    if (_app->get_params().use_mask) {
      message.masks.push_back(_app->get_params().masks.at(cam_id0));
      message.masks.push_back(_app->get_params().masks.at(cam_id1));
      message.masks.push_back(_app->get_params().masks.at(cam_id2));
    } else {
      // message.masks.push_back(cv::Mat(cv_ptr0->image.rows, cv_ptr0->image.cols, CV_8UC1, cv::Scalar(255)));
      message.masks.push_back(cv::Mat::zeros(cv_ptr0->image.rows, cv_ptr0->image.cols, CV_8UC1));
      message.masks.push_back(cv::Mat::zeros(cv_ptr1->image.rows, cv_ptr1->image.cols, CV_8UC1));
      message.masks.push_back(cv::Mat::zeros(cv_ptr2->image.rows, cv_ptr2->image.cols, CV_8UC1));
    }

    // append it to our queue of images
    std::lock_guard<std::mutex> lck(camera_queue_mtx);
    camera_queue.push_back(message);
    std::sort(camera_queue.begin(), camera_queue.end());
  }
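
For completeness, the snippet above assumes a few declarations in ROS2Visualizer.h that are not shown here; a sketch of what they could look like, mirroring the existing sync_pol / sync_cam members (sync_pol3, sync_cam3 and callback_cam_triad are additions of my branch, not upstream OpenVINS):

    // Sketch of the header declarations assumed by the three-camera subscriber above
    typedef message_filters::sync_policies::ApproximateTime<sensor_msgs::msg::Image, sensor_msgs::msg::Image,
                                                            sensor_msgs::msg::Image>
        sync_pol3;
    std::vector<std::shared_ptr<message_filters::Synchronizer<sync_pol3>>> sync_cam3;
    void callback_cam_triad(const sensor_msgs::msg::Image::ConstSharedPtr msg0,
                            const sensor_msgs::msg::Image::ConstSharedPtr msg1,
                            const sensor_msgs::msg::Image::ConstSharedPtr msg2,
                            int cam_id0, int cam_id1, int cam_id2);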

Note that I had to slightly modify https://github.com/rpng/open_vins/blob/58afb2daf47498911100edc7452ccd7e42a2db50/ov_msckf/src/ros/ROS2Visualizer.cpp#L400

like so in order to make it work:

    size_t num_unique_cameras = (params.state_options.num_cameras == 2 || params.state_options.num_cameras == 3) ? 1 : params.state_options.num_cameras;

In that case, for the three-camera case, both the quantitative and qualitative analysis show similar performance to the monocular or binocular cases, in that at the end, when the drone is suspended, there is no drift. But in RVIZ, it only visualizes the feature points from the first two cameras (i.e. left and front).

The figure below shows the left and front features, but none from the right camera.

![Screenshot from 2023-05-22 13-08-39](https://github.com/rpng/open_vins/assets/19415965/00a99231-f824-4343-a4f7-c07b6f78380b)

Thank you for the tips, I will further explore those.

goldbattle commented 1 year ago

Oh, I see what you are doing. Yes, it will perform worse if you do not have the sync'ed subscriber implemented, since we have a fixed clone rate. The system will clone at the camera rate (e.g. 60hz for three 20hz cameras). This will likely hurt performance since the sliding window has a fixed size of 11 clones, so a window which was originally spread over about 0.5 seconds at 20hz now spans only about 0.17 seconds at 60hz. This means there is less disparity, and thus poorer feature estimates.
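To make that arithmetic concrete, assuming the default window of 11 clones (so 10 inter-clone intervals):

$$
T_{\text{window}} \approx \frac{N_{\text{clones}} - 1}{f_{\text{clone}}}, \qquad
\frac{11 - 1}{20\,\text{Hz}} = 0.5\,\text{s}, \qquad
\frac{11 - 1}{60\,\text{Hz}} \approx 0.17\,\text{s}.
$$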

You could try increasing the number of clones by 3x (your number of cameras), but in reality you should feed in images at 20hz, with 3 images in each ov_core::CameraData structure. This is why we have custom logic for the stereo case and treat "binocular" as a sync'ed stereo pair, so the sliding window time remains around 0.5 seconds.

For the point cloud, I believe there is a check that the feature needs to be seen from cam0, since the downstream loop-closure only works on monocular images. I think fixing this will be a bit more involved (feel free to open an issue so I can take a look sometime in the future). https://github.com/rpng/open_vins/blob/58afb2daf47498911100edc7452ccd7e42a2db50/ov_msckf/src/core/VioManagerHelper.cpp#L343-L346
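Purely as an illustration of the kind of change (this is not the actual OpenVINS code, just a hypothetical predicate; the per-camera measurement map is assumed to mirror how ov_core::Feature stores uvs keyed by camera id), the check could be relaxed to accept a feature seen by any camera:

    #include <cstddef>
    #include <unordered_map>
    #include <vector>

    // Hypothetical helper, not upstream code: accept a feature if it has a measurement
    // from any camera, instead of requiring a measurement from camera id 0.
    template <typename UV>
    bool seen_by_any_camera(const std::unordered_map<std::size_t, std::vector<UV>> &uvs_per_cam) {
      for (const auto &cam : uvs_per_cam)
        if (!cam.second.empty())
          return true;
      return false;
    }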

Your expectation should be improved robustness and similar accuracy as you add more binocular cameras. This table and the discussion in Section VI.B of MIMC-VINS might be of interest to you:

[image: results table from the MIMC-VINS paper]