koide3 / fast_gicp

A collection of GICP-based fast point cloud registration algorithms
BSD 3-Clause "New" or "Revised" License
1.24k stars 316 forks source link

KDTree creation takes much time for big clouds #60

Open Alevs2R opened 3 years ago

Alevs2R commented 3 years ago

Problem

I discovered that KDtree creation takes significantly more time when aligning with a big target cloud (with a map).

KDtree creation takes only 3ms for clouds with 17k points, but it takes 90ms for 370k points (my small 100x100 meters filtered map).

PCL creates kdtree inside "align" method, so it means that ndt_cuda (P2D) runs for 98ms instead of 8ms during first run.

Moreover, PCL spends 20-30ms to calculate fitness score, regardless of the size of the target map. It is also not very good 😁.

Detailed benchmark results **Aligning big target cloud with kdtree creation** ``` target:364444[pts] source:18564[pts] --- fgicp_st --- single:2293.17[msec] --- fgicp_mt --- single:460.209[msec] --- vgicp_st --- single:2132.44[msec] --- vgicp_mt --- single:381.7[msec] --- ndt_cuda (P2D) --- single:98.3882[msec] --- ndt_cuda (D2D) --- single:93.0577[msec] --- vgicp_cuda (parallel_kdtree) --- single:340.319[msec] --- vgicp_cuda (gpu_bruteforce) --- single:9404.37[msec] --- vgicp_cuda (gpu_rbf_kernel) --- terminate called after throwing an instance of 'thrust::system::detail::bad_alloc' what(): std::bad_alloc: cudaErrorMemoryAllocation: out of memory Aborted (core dumped) ``` **Aligning big target cloud without kdtree creation** ``` target:364444[pts] source:18564[pts] --- fgicp_st --- single:2224.49[msec] --- fgicp_mt --- single:312.474[msec] --- vgicp_st --- single:2044.39[msec] --- vgicp_mt --- single:293.93[msec] --- ndt_cuda (P2D) --- single:8.7521[msec] --- ndt_cuda (D2D) --- single:7.81053[msec] --- vgicp_cuda (parallel_kdtree) --- single:246.274[msec] --- vgicp_cuda (gpu_bruteforce) --- single:9536.63[msec] --- vgicp_cuda (gpu_rbf_kernel) --- terminate called after throwing an instance of 'thrust::system::detail::bad_alloc' what(): std::bad_alloc: cudaErrorMemoryAllocation: out of memory Aborted (core dumped) ``` **Aligning default clouds with kdtree creation** ``` --- fgicp_st --- single:263.107[msec] 100times:26208.3[msec] 100times_reuse:17901.3[msec] fitness_score:0.204374 --- fgicp_mt --- single:53.8672[msec] 100times:3846.4[msec] 100times_reuse:2469.65[msec] fitness_score:0.204384 --- vgicp_st --- single:185.224[msec] 100times:18208.6[msec] 100times_reuse:9911.83[msec] fitness_score:0.205022 --- vgicp_mt --- single:54.0197[msec] 100times:2933.43[msec] 100times_reuse:1687.04[msec] fitness_score:0.205022 --- ndt_cuda (P2D) --- single:8.74302[msec] 100times:861.372[msec] 100times_reuse:499.426[msec] fitness_score:0.197207 --- ndt_cuda (D2D) --- single:6.07843[msec] 100times:587.673[msec] 100times_reuse:315.464[msec] fitness_score:0.198976 --- vgicp_cuda (parallel_kdtree) --- single:31.5053[msec] 100times:2363.01[msec] 100times_reuse:992.915[msec] fitness_score:0.204697 --- vgicp_cuda (gpu_bruteforce) --- single:45.3529[msec] 100times:3512.59[msec] 100times_reuse:1685.24[msec] fitness_score:0.204693 --- vgicp_cuda (gpu_rbf_kernel) --- single:7.57619[msec] 100times:750.764[msec] 100times_reuse:309.824[msec] fitness_score:0.204584 ``` **Aligning default clouds without kdtree creation** ``` --- fgicp_st --- single:260.423[msec] 100times:25678.1[msec] 100times_reuse:17731.4[msec] --- fgicp_mt --- single:82.7624[msec] 100times:3546.36[msec] 100times_reuse:2466.41[msec] --- vgicp_st --- single:182.533[msec] 100times:17876.4[msec] 100times_reuse:9981.22[msec] --- vgicp_mt --- single:36.4721[msec] 100times:2744.13[msec] 100times_reuse:1607.6[msec] --- ndt_cuda (P2D) --- single:6.20965[msec] 100times:558.324[msec] 100times_reuse:472.443[msec] --- ndt_cuda (D2D) --- single:3.04992[msec] 100times:306.246[msec] 100times_reuse:303.85[msec] --- vgicp_cuda (parallel_kdtree) --- single:37.2568[msec] 100times:1809.56[msec] 100times_reuse:1083.63[msec] --- vgicp_cuda (gpu_bruteforce) --- single:42.7751[msec] 100times:3270.32[msec] 100times_reuse:1685.56[msec] --- vgicp_cuda (gpu_rbf_kernel) --- single:4.78256[msec] 100times:472.186[msec] 100times_reuse:306.657[msec] ```

Simple solution

My solution is to set empty kdtree as search method and enable force_no_recompute flag before calling "align". \ Like this:

pcl::search::KdTree<pcl::PointXYZ>::Ptr empty_kdtree(new pcl::search::KdTree<pcl::PointXYZ>);
reg.setSearchMethodTarget(empty_kdtree, true);

You can easily repeat my experiment in fork, branch test/disable-kdtree. Command:

build/gicp_align data/saved_map_cloud.pcd data/saved_scan.pcd

Better solution

Better solution would be to replace PCL default kdtree with something more efficient 😊

koide3 commented 3 years ago

Thanks for letting me know this issue and providing nice benchmark results. When I wrote the code, I noticed that pcl::Registration creates kdtree inside, but I thought its construction cost would be negligible. I'll disable the kdtree creation with force_no_recompute.