vectr-ucla / direct_lidar_odometry

[IEEE RA-L & ICRA'22] A lightweight and computationally-efficient frontend LiDAR odometry solution with consistent and accurate localization.
MIT License

Computational speed and efficiency of DLO #25

Closed surabhi96 closed 1 year ago

surabhi96 commented 1 year ago

Hello! Thanks for the amazing work; this is really useful.

I am currently trying to run your program on a single core (no multithreading with OpenMP as done in the NanoGICP code), but the CPU usage and RAM allocation are really high. I wonder if this is because keyframes are continuously stored and none are ever discarded from `this->keyframes`.

I understand that storing all keyframes is helpful for retrieving far-back frames for the convex/concave hulls, and also for nearest-neighbor search in the case of a loop maneuver, for example. For computation purposes, though, are there scenarios you have tested where including the convex/concave hull keyframes and/or the nearest-neighbor keyframes gave a major improvement in the output (map and trajectory) compared to simply taking the latest k frames? That would allow me to discard old keyframes and keep only the past few in the log.

Do you have any other recommendations/suggestions for running the algorithm on one core?

P.S. I have tried different downsampling settings for the source cloud and the submap, but without much luck. Thanks in advance!

kennyjchen commented 1 year ago

Hi @surabhi96 --

Thanks for the interest in our work. Yeah, we definitely observed that using nearest-neighbor keyframes + hull keyframes helped significantly reduce long-term drift in the case of those "loop maneuvers" you mentioned. DLO was written with at least four cores in mind, so everything is highly parallelized to prevent any bottlenecks in the odometry on these lightweight platforms.

That being said, there are certainly a few things you could do to help reduce RAM and CPU usage when running on a single core. Regarding RAM, it might be worthwhile to prune some of the keyframes somehow (i.e., keep only the latest k keyframes, as you said). You'll probably see a reduction in accuracy, but depending on your use case that might be fine. You could also try playing around with the adaptive thresholds so it doesn't place as many keyframes in the environment (or turn it off altogether and use a static threshold). Increasing the voxelization will also help with RAM, since fewer points per cloud will be saved into memory (although you've mentioned this already).
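To make the "keep only the latest k keyframes" idea concrete, here is a minimal, self-contained sketch of a bounded keyframe window. The `Keyframe` struct and `KeyframeWindow` class are hypothetical placeholders, not part of DLO; in the actual code the per-keyframe data (pose + point cloud) lives in `this->keyframes`:

```cpp
#include <cstddef>
#include <deque>

// Hypothetical stand-in for whatever DLO stores per keyframe
// (in reality: a pose and its associated point cloud).
struct Keyframe { int id; };

// Bounded window: once more than k keyframes exist, the oldest is
// discarded. This caps RAM growth at the cost of losing far-back
// frames for hull construction and loop-maneuver retrieval.
class KeyframeWindow {
public:
  explicit KeyframeWindow(std::size_t k) : k_(k) {}

  void add(const Keyframe& kf) {
    frames_.push_back(kf);
    if (frames_.size() > k_) frames_.pop_front();  // drop the oldest
  }

  std::size_t size() const { return frames_.size(); }
  const Keyframe& oldest() const { return frames_.front(); }

private:
  std::size_t k_;
  std::deque<Keyframe> frames_;
};
```

A `std::deque` keeps both operations O(1); the trade-off is exactly the one discussed above: bounded memory versus reduced long-term consistency.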

For CPU usage, it's a bit tough because NanoGICP heavily relies on multiple cores to perform scan-matching. Try playing around with the number of GICP iterations (perhaps reduce it to 5 for S2S and S2M?) and also the number of correspondences. If you're not already, try using an IMU so that it primes the optimization with a (hopefully good) initialization point, which would reduce the number of iterations needed for convergence.
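As a rough sketch of where those knobs would go, something like the fragment below. The parameter names here are illustrative only, not the real keys; check the repo's config file (e.g. `cfg/dlo.yaml`) for the actual names and defaults:

```yaml
# Illustrative only -- consult the actual DLO config for real parameter names.
dlo:
  gicp:
    s2s:
      maxIterations: 5       # fewer scan-to-scan GICP iterations per frame
    s2m:
      maxIterations: 5       # fewer scan-to-map GICP iterations per frame
    kCorrespondences: 10     # fewer correspondences lowers per-iteration cost
```

Both changes reduce per-frame compute roughly linearly, at the risk of the registration terminating before full convergence, which is why a good IMU-primed initialization matters more in this regime.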

Those are some lower-hanging fruit that immediately come to mind. I'll keep thinking about anything else you could do. Let me know if that helps.