Regenerate processing time graph for Rtabmap process

introlab / rtabmap

RTAB-Map library and standalone application

https://introlab.github.io/rtabmap

Other

2.83k stars 787 forks source link

Regenerate processing time graph for Rtabmap process #1336

Open naitiknakrani-eic opened 2 months ago

naitiknakrani-eic commented 2 months ago

Hi @francoisferland @matlabbe,

Can you help me to regenerate processing time logs for creating graph as shown in figure below?

Any tools will be helpful ? If tools are not available, can you tell methodology used to derive these graph ?

Thanks

matlabbe commented 2 months ago

After you created a database, you can extract all timing statistics and export them to MATLAB or Octave to show a graph like that. To do so, you can see the available statistics in the database with:

rtabmap-report --stats rtabmap.db

To show the statistics of the figure above (respectively to legend, the names may slightly differ):

rtabmap-report \
   Timing/Memory_update/ms \
   Timing/Proximity_by_space/ms \
   Timing/Posterior_computation/ms \
   Timing/Map_optimization/ms \
   RtabmapROS/TimeUpdatingMaps/ms \
   Timing/Reactivation/ms \
   rtabmap.db

In the figure that will open, you can already see something similar than the figure above. To export the data, right-click on the legend area and click "Copy all curve data to clipboard".

Paste data in a text file, then import in any data analyzing software you want.

x   Timing/Memory_update/ms Timing/Proximity_by_space/ms    Timing/Posterior_computation/ms Timing/Map_optimization/ms  RtabmapROS/TimeUpdatingMaps/ms  Timing/Reactivation/ms  
0.000000    6.505010    0.001907    0.012875    0.025034    NaN 0.000000
1.004863    11.481000   0.000954    0.005960    0.011921    NaN 0.005960
2.071385    8.010860    0.000000    0.006199    0.012875    NaN 0.001907
3.072435    7.767920    0.000000    0.005960    4.143950    NaN 1.037120
4.072935    10.537100   0.000954    0.008106    0.799894    NaN 0.039101
5.073380    10.127100   0.001192    0.005007    0.937939    NaN 0.032902
6.073429    10.460100   0.000954    0.007868    1.488920    NaN 0.041962
7.139824    10.862100   0.000954    0.005960    1.379010    NaN 0.034094
8.139961    7.934090    0.000954    0.005960    1.073120    NaN 0.035048
9.140281    8.811000    0.000954    0.006914    1.636030    NaN 0.039101
10.140464   7.201910    0.000954    0.007868    1.781940    NaN 0.049830
11.140506   8.593800    0.000954    0.010014    2.104040    NaN 0.061989
12.206952   8.467200    0.000954    0.020981    1.698020    NaN 0.044823
13.273479   10.787000   0.000954    0.025034    2.268080    NaN 0.051975
...

Note that in this case that map didn't have "Map Assembling Time" (or RtabmapROS/TimeUpdatingMaps/ms), I took a random database, it may not have been taken on a robot with map assembling enabled.

naitiknakrani-eic commented 2 months ago

@matlabbe Thanks. This helps a lot. For our database, we are getting graph like this,

We have Map assembling time, but its very less hence invisible in the graph. Below one is separate graph of Map assembling.

From this graph, it seems Global map assembling is not time consuming process. Is it on expected line ? You mentioned in other thread as well as in paper, that Map assembling is computationally heavy process. Are we missing something ?

Also, we did detailed time profile of STM, and other modules but can't able to find which function does Global map assembling. Is it part of Rtabmap.cpp or Corewrapper.cpp ? Can you help us pointing that function ?

matlabbe commented 2 months ago

You mentioned in other thread as well as in paper, that Map assembling is computationally heavy process

... (and more precisely) when there is a loop closure and graph deformation is over GridGlobal/UpdateError (default=0.01m).

The global map assembling is called from the MapsManager called from here, but the actual implementation is in those files: https://github.com/introlab/rtabmap/tree/master/corelib/src/global_map (depending on the topic we are subscribing). The global maps are not generated if no one is subscribed on one of them.

The biggest update spikes will happen when the map has to be fully updated because of a loop closure or memory management retrieved past data back in the current WM (or more explicitly when this is true). Depending on the GlobalMap flavor, the full update may take more or less time (e..g, 2d maps are relatively fast to regenerate). In this paper Table 10, the last two columns show a comparison depending on the map used and if a loop closure happened or not. As you can see, when there is no loop closure, appending new local grids to global map is relatively fast (maybe what you are seeing).

samarth-eic commented 1 month ago

Hi @matlabbe, continuing with above issues we kept different decimation on pointcloud, coming from pointcloudXYZ node with decimation kept as 1, 2 and 4. This can be viewed in below graphs

Screenshot 2024-09-24 162036

For decimation 1 and decimation 2 the neighbors value are 105 and 292 respectively, shouldn't be it like for more data points neighbors should be more as compared to case for decimation 2 and decimation 4, can you help us with reason for which this behavior can be seen here.

matlabbe commented 1 month ago

The number of neighbor links are roughly the same number of nodes added to the graph. I don't now what Rtabmap/DetectionRate you are using and actual processing time per node, but if the processing time was over the update rate in decimation 1, that could make sense that with decimation 2 there are more nodes because the processing time would be lower (higher rate to add nodes). The difference between decimation 2 and decimation 4 I cannot really say without seeing the actual processing time. If you can share the three databases, that could be useful.

samarth-eic commented 1 month ago

@matlabbe Thanks, is Rtabmap/UpdateRate same as Rtabmap/DetectionRate which is 30 in all three cases, for processing time Below is the attached zip for three databases, hope this could be useful here https://file.io/iUN8rPnCqU3S

naitiknakrani-eic commented 1 month ago

@matlabbe If Memory management is disabled, and actual processing time exceeds the Rtabmap/detectionrate, then what happen ?

Does it skip the node creation ? and hence the nodes created are less in number ?
Does it start transferring nodes to LTM, i.e. does memory management job ?

matlabbe commented 1 month ago

@samarth-eic Yes, Rtabmap/DetectionRate is the update rate I referred. I corrected the typo in my previous post.

Like what I thought, the processing time is high, so it cannot process in real-time at rate of 30 Hz (Rtabmap/DetectionRate). Here is a breakdown of the computation on the first database (from your computer): Screenshot from 2024-09-29 15-05-35

Rtabmap/DetectionRate is not meant to be used at 30 Hz, as updating the map is likely to be higher than 33 ms. The default is 1 Hz. If you want more poses without increasing Rtabmap/DetectionRate, you can set Rtabmap/CreateIntermediateNodes at true (there is no benefice to save more poses, that parameter is used mostly just to compare with benchmark datasets providing poses for every frame).

A lot of time is used to create the local grids, that could be optimized like this:

Grid/ClusterRadius=0.1 instead of 1 (for node 39, 600 ms -> 20 ms)
Grid/NormalsSegmentation=false instead of true (20 ms -> 10 ms) (with Grid/MaxObstacleHeight=1 and Grid/MaxGroundHeight=0.1)
Grid/NoiseFilteringRadius=0 instead of0.07 (10 ms -> <1 ms)

Example by reprocessing on my computer with these values:

rtabmap-reprocess \
   --Grid/ClusterRadius 0.1 \
   --Grid/NormalsSegmentation false \
   --Grid/MaxObstacleHeight 1 \
   --Grid/MaxGroundHeight 0.1 \
   --Grid/NoiseFilteringRadius 0 \
   radius_decimation1.db output.db

Screenshot from 2024-09-29 15-21-04

To improve proximity detection time, if you don't care not keeping the raw laser scans, is to downsample the scans with --Mem/LaserScanVoxelSize 0.05 and precompute the normals (--Mem/LaserScanNormalK 20). Also in general for 3D point clouds I use --Icp/Iterations 10. If computing ressources are limited, set also --RGBD/ProximityMaxPaths 1 and disable one-to-many detection by setting --RGBD/ProximityPathMaxNeighbors 0:

rtabmap-reprocess \
   --Grid/ClusterRadius 0.1 \
   --Grid/NormalsSegmentation false \
   --Grid/MaxObstacleHeight 1 \
   --Grid/MaxGroundHeight 0.1 \
   --Grid/NoiseFilteringRadius 0 \
   --Mem/LaserScanVoxelSize 0.05 \
   --Mem/LaserScanNormalK 20 \
   --Icp/Iterations 10 \
   --RGBD/ProximityMaxPaths 1 \
   --RGBD/ProximityPathMaxNeighbors 0 \
   radius_decimation1.db output.db

Screenshot from 2024-09-29 15-42-57

@naitiknakrani-eic With memory management disabled or not, input data are always dropped between each rtabmap updates (i.e., the rate is limited either by Rtabmap/DetectionRate or actual maximum rate rtabmap can process based on the machine). When memory management is disabled, if processing time increases over time, there will be more and more space between nodes in the graph. With memory management enabled, nodes in the graph will be uniformly spaced out based on Rtabmap/DetectionRate, but some nodes may be transferred to LTM and not available in the current local map.

naitiknakrani-eic commented 1 month ago

@matlabbe That's fantastic to know how these parameters really help in optimizing computation time.

"the rate is limited either by Rtabmap/DetectionRate or actual maximum rate rtabmap can process based on the machine", Like this is the reason why we kept it at 30 in order to achieve as max throughput as possible. Keeping detection rate at 1, actually underutilized processing capabilities of NVIDIA boards.

matlabbe commented 1 month ago

"the rate is limited either by Rtabmap/DetectionRate or actual maximum rate rtabmap can process based on the machine", Like this is the reason why we kept it at 30 in order to achieve as max throughput as possible. Keeping detection rate at 1, actually underutilized processing capabilities of NVIDIA boards.

Here some reasons why we set Rtabmap/DetectionRate at 1 Hz by default:

In most situations, we don't need more nodes added to map. This saves RAM, hard-drive, CPU time... for other stuff
Processing time of rtabmap also increases with the map size, in particular loop closure detection time, graph optimization (after a loop closure), global occupancy grid updates... the more nodes there are, the longer it takes to update. So the lower the Rtabmap/DetectionRate is, the bigger the map (in space) can be processed online without any memory management.
Ideally, we want hundred of nodes in the map that can be updated online (by doing optimizations like above) before starting to think about enabling memory management of rtabmap (which has some quirks to be aware of before enabling it).

Where you would want to put more computation resources first is the visual odometry or lidar odometry, for which higher frequency usually means better accuracy and lower drift. Then equally important, robot controller and obstacle detection with low latency. Then lastly, re-localization/mapping.

cheers, Mathieu

naitiknakrani-eic commented 1 month ago

Hi @matlabbe Thanks for your explanation and sorry for delay in revert back. We were understanding code and logic for NormalSegmentation. It was surprise to see that the Rtabmap works and create occupancy with keeping NormalSegmentation = off. It is a process which takes lot of CPU resources and contributes to ~60% of total timing (Refer First Figure 1 of early response, Timing/total avg - 1266 ms and Timing/occupancy - 781 ms).

This parameter skips the following function from file "rtabmap/corelib/include/rtabmap/core/impl /util3d_mapping.hpp":

if(segmentFlatObstacles && flatSurfaces->size())
        {
            int biggestFlatSurfaceIndex;
            std::vector<pcl::IndicesPtr> clusteredFlatSurfaces = extractClusters(
                    cloud,
                    flatSurfaces,
                    clusterRadius,
                    minClusterSize,
                    std::numeric_limits<int>::max(),
                    &biggestFlatSurfaceIndex);

            ...

Can you please tell us in what scenario the segmentation is required and must ? what was original thought process behind implementing it ?

matlabbe commented 1 month ago

The normal segmentation is based on that experiment: http://wiki.ros.org/rtabmap_ros/Tutorials/StereoOutdoorNavigation#Global_costmap :

Since the ground is not even, the ground is segmented by normal filtering: all points with normal in the +z direction (+- fixed angle) are labelled as ground, all the others are labelled as obstacles. The images below show an example.

We enable it by default because it is more safe in general and easier to use to detect small obstacles. When we disable it, we do a simple passthrought filtering at some height (e..g, 10 cm) over the ground, which is very fast to do, but requires more tuning to detect small obstacles and cannot handle a ground that is not perfectly flat (e.g., ramps or slopes). Here another example from this paper:

For less-powerful computers and/or the robot is moving always on a flat floor, disabling it can give a good boost of performance.

naitiknakrani-eic commented 3 days ago

To improve proximity detection time, if you don't care not keeping the raw laser scans, is to downsample the scans with --Mem/LaserScanVoxelSize 0.05 and precompute the normals (--Mem/LaserScanNormalK 20). Also in general for 3D point clouds I use --Icp/Iterations 10. If computing ressources are limited, set also --RGBD/ProximityMaxPaths 1 and disable one-to-many detection by setting --RGBD/ProximityPathMaxNeighbors 0:
rtabmap-reprocess \
   --Grid/ClusterRadius 0.1 \
   --Grid/NormalsSegmentation false \
   --Grid/MaxObstacleHeight 1 \
   --Grid/MaxGroundHeight 0.1 \
   --Grid/NoiseFilteringRadius 0 \
   --Mem/LaserScanVoxelSize 0.05 \
   --Mem/LaserScanNormalK 20 \
   --Icp/Iterations 10 \
   --RGBD/ProximityMaxPaths 1 \
   --RGBD/ProximityPathMaxNeighbors 0 \
   radius_decimation1.db output.db

@matlabbe Interesting point to note, with your given parameter setting the average Timing/Total/ms is just ~25 ms but it does not reflect in higher publishing rate for /MapData or /Map topic. We hardly get 4 to 5 FPS by checking "ros2 topic hz /mapData". Is it because of inherent limitation of DDS or something else we are missing? Our efforts in reducing computational time inside RTABMap is not translating to higher MapData publish rate.

naitiknakrani-eic commented 3 days ago

We benchmarked mapping for two devices AGX orin and Orin nano on the same recorded bagfile. Here we converted depth into pointcloud with zero ROI and 2 decimation form pointcloud_xyz node. One thing we couldn't understand is the deviation in occupancy map generation. Please see the attached two snapshots:

1. On AGX Orin Find its db file: https://file.io/L6ku7EaRiUNG

2. On Orin Nano Find its db file: https://file.io/vsT9dT3pFFrI

The Orin Nano has more filled grid cells compared to AGX Orin. What could be the possible reason ? Note that both has given same data as in input.