introlab / rtabmap_ros

RTAB-Map's ROS package.
http://wiki.ros.org/rtabmap_ros
BSD 3-Clause "New" or "Revised" License

Visual odometry and mapping mechanism with multiple RGB-D cameras #384

Closed szx0112 closed 4 years ago

szx0112 commented 4 years ago

Hi, RTAB-Map is a great piece of software, and we found it works robustly with a single RGB-D camera.

Recently, by reading the demo launch file demo_two_kinects.launch, I found that RTAB-Map can also do SLAM with multiple synchronized RGB-D cameras (e.g., D435 or Kinect).

However, I am still confused about the visual odometry and mapping mechanism behind it. Is the multi-camera SLAM done in a master-slave manner (for example, only the odometry of the master camera is used), or in a parallel tracking fashion, where either (a) at least one camera must observe salient features to track, or (b) each camera computes its own pose in each frame and those poses are then fused together? We have not found any discussion about this yet.

Understanding this mechanism would help us better apply RTAB-Map in our application, since we may face many camera obstructions and illumination changes during the mapping procedure.

Any comment is appreciated.

matlabbe commented 4 years ago

It uses both cameras at the same time in visual motion estimation. This is why it is preferable that the cameras are hardware-synchronized (I think the D400 series can use an external sync signal). With multiple cameras, 3D->3D motion estimation is done (PCL 3D point correspondence minimization) instead of the default 2D->3D motion estimation (PnP). To use PnP, we would need to reimplement the cv::solvePnPRansac() function to accept multiple cameras (I think it is possible with some time).
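
To illustrate the difference, here is a rough sketch (not the actual rtabmap code; the function and variable names are made up) of the two estimation approaches, assuming the feature correspondences have already been found:

// Minimal sketch contrasting the two approaches; illustrative only.
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/registration/transformation_estimation_svd.h>

// 2D->3D (PnP, Vis/EstimationType=1): needs the intrinsics K of ONE camera to
// reproject points, which is why it does not directly extend to a multi-camera rig.
bool estimatePnP(const std::vector<cv::Point3f> & mapPoints,   // 3D features from the key frame / local map
                 const std::vector<cv::Point2f> & imagePoints, // their 2D observations in the new image
                 const cv::Mat & K,
                 cv::Mat & rvec, cv::Mat & tvec)
{
    std::vector<int> inliers;
    return cv::solvePnPRansac(mapPoints, imagePoints, K, cv::Mat(),
                              rvec, tvec, false,
                              300,   // RANSAC iterations
                              2.0f,  // reprojection error (Vis/PnPReprojError)
                              0.99, inliers,
                              cv::SOLVEPNP_ITERATIVE); // Vis/PnPFlags=0
}

// 3D->3D (Vis/EstimationType=0): features from all cameras are first expressed
// in a common base frame (using the extrinsics), so a single rigid transform is
// estimated for the whole rig at once, independently of any camera intrinsics.
Eigen::Matrix4f estimate3Dto3D(const pcl::PointCloud<pcl::PointXYZ> & previousFeatures,
                               const pcl::PointCloud<pcl::PointXYZ> & currentFeatures)
{
    // index-aligned correspondences are assumed here; in practice this kind of
    // estimation is wrapped in a RANSAC loop to reject outliers
    pcl::registration::TransformationEstimationSVD<pcl::PointXYZ, pcl::PointXYZ> svd;
    Eigen::Matrix4f transform = Eigen::Matrix4f::Identity();
    svd.estimateRigidTransformation(currentFeatures, previousFeatures, transform);
    return transform;
}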

The cameras don't need to overlap, though overlap would make it easier to calibrate the extrinsics between them, which is the trickiest part of that kind of setup!
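
Once calibrated, the extrinsics are provided to rtabmap_ros through TF, typically with a static_transform_publisher in the launch file. A minimal C++ equivalent, with placeholder frame names and transform values, would look like this:

// Sketch: publish a calibrated extrinsic as a static TF so the odometry node
// can express both cameras in a common base frame. Frame names and values
// below are placeholders, not a real calibration.
#include <ros/ros.h>
#include <tf2_ros/static_transform_broadcaster.h>
#include <geometry_msgs/TransformStamped.h>

int main(int argc, char ** argv)
{
    ros::init(argc, argv, "camera_extrinsics_publisher");
    ros::NodeHandle nh;

    tf2_ros::StaticTransformBroadcaster broadcaster;

    geometry_msgs::TransformStamped t;
    t.header.stamp = ros::Time::now();
    t.header.frame_id = "base_link";     // common base frame of the rig
    t.child_frame_id = "camera_2_link";  // second camera's frame
    t.transform.translation.x = 0.0;     // values from your calibration
    t.transform.translation.y = -0.10;
    t.transform.translation.z = 0.0;
    t.transform.rotation.x = 0.0;
    t.transform.rotation.y = 0.0;
    t.transform.rotation.z = 0.0;
    t.transform.rotation.w = 1.0;
    broadcaster.sendTransform(t);

    ros::spin();
    return 0;
}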

szx0112 commented 4 years ago

@matlabbe Thanks for the reply. Registration of the integrated 3D point clouds between frames is what we expected. I am happy RTAB-Map already implements it.

We did an initial test of mapping an inner pipe surface last week. It seems the 3D->3D motion estimation gets lost more easily than when only a single camera is used (with PnP). The pipe has rich (manually created) inner surface texture but a very smooth curved shape (3D corner information may not help much in this environment).

We tried tuning parameters such as features vs. optical flow, F2M/F2F, etc., to improve the tracking performance. It seems RTAB-Map does not support all of those in the multi-camera setup.

So could you please explain which 3D features are currently supported for multi-camera mapping, and what the basic matching and motion estimation strategy is? If possible, how can we find the relevant parameters in 'rtabmap --params' and tune them to improve our performance (mainly the tracking ability)?

matlabbe commented 4 years ago

$ rtabmap --params | grep Vis/EstimationType
Param: Vis/EpipolarGeometryVar = "0.02"                    [[Vis/EstimationType = 2] Epipolar geometry maximum variance to accept the transformation.]
Param: Vis/EstimationType = "1"                            [Motion estimation approach: 0:3D->3D, 1:3D->2D (PnP), 2:2D->2D (Epipolar Geometry)]
Param: Vis/InlierDistance = "0.1"                          [[Vis/EstimationType = 0] Maximum distance for feature correspondences. Used by 3D->3D estimation approach.]
Param: Vis/PnPFlags = "0"                                  [[Vis/EstimationType = 1] PnP flags: 0=Iterative, 1=EPNP, 2=P3P]
Param: Vis/PnPRefineIterations = "0"                       [[Vis/EstimationType = 1] Refine iterations. Set to 0 if "Vis/BundleAdjustment" is also used.]
Param: Vis/PnPReprojError = "2"                            [[Vis/EstimationType = 1] PnP reprojection error.]
Param: Vis/RefineIterations = "5"                          [[Vis/EstimationType = 0] Number of iterations used to refine the transformation found by RANSAC. 0 means that the transformation is not refined.]

3D->3D estimation is Vis/EstimationType=0. The only tuning parameter is Vis/InlierDistance (default 0.1 meters). The approach can be found here. Under the hood, a RANSAC approach is used here to match two point clouds with known correspondences. You should think of the 3D->3D approach as matching two point clouds, so it is very dependent on the quality of the 3D positions of the features. 3D->2D is more robust to bad depth values, with SBA available to refine the 3D positions of the estimated features.
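
For intuition, here is a rough sketch of that kind of RANSAC registration with known correspondences (illustrative only, not the exact rtabmap implementation); Vis/InlierDistance plays the role of the inlier distance threshold:

// Sketch: RANSAC-based registration of two point clouds with known
// (index-aligned) correspondences, using PCL. Names are placeholders.
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/sample_consensus/ransac.h>
#include <pcl/sample_consensus/sac_model_registration.h>

Eigen::Matrix4f ransacRegistration(const pcl::PointCloud<pcl::PointXYZ>::Ptr & currentFeatures,
                                   const pcl::PointCloud<pcl::PointXYZ>::Ptr & previousFeatures,
                                   double inlierDistance /* Vis/InlierDistance, e.g. 0.1 m */)
{
    // Model: rigid transform bringing currentFeatures onto previousFeatures,
    // correspondences are given by the point indices (same index = same feature).
    pcl::SampleConsensusModelRegistration<pcl::PointXYZ>::Ptr model(
        new pcl::SampleConsensusModelRegistration<pcl::PointXYZ>(currentFeatures));
    model->setInputTarget(previousFeatures);

    pcl::RandomSampleConsensus<pcl::PointXYZ> ransac(model);
    ransac.setDistanceThreshold(inlierDistance); // correspondences farther than this are outliers
    ransac.setMaxIterations(1000);

    Eigen::Matrix4f transform = Eigen::Matrix4f::Identity();
    if(ransac.computeModel())
    {
        Eigen::VectorXf coefficients;
        ransac.getModelCoefficients(coefficients); // 4x4 transform flattened row by row
        transform = Eigen::Map<Eigen::Matrix<float, 4, 4, Eigen::RowMajor> >(coefficients.data());
    }
    return transform;
}

In rtabmap_ros, these parameters can be passed as string parameters on the odometry and rtabmap nodes, e.g. <param name="Vis/EstimationType" type="string" value="0"/> and <param name="Vis/InlierDistance" type="string" value="0.15"/> in your launch file, to experiment with the inlier distance.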