Open cdb0y511 opened 3 years ago
I doubt it may relate to the voxel resolution cause after I increase resolution, it seems to disappear.
Thanks for reporting the issue. I'll check if voxelization has some memory problems soon.
Thanks for reporting the issue. I'll check if voxelization has some memory problems soon.
I strongly suggest you could add more NeighborSearchMethod for the Cuda version. like, setNeighborSearchMethod(fast_gicp::NeighborSearchMethod::DIRECT7); I find DIRECT7 is more robust than DIRECT1 in some scenarios. I think the resolution parameter can be easier to choose when the neighbor voxel is computed. Because the smaller resolution results in fewer points in each voxel, and to consider the neighbor voxels is fair compensation. I am not familiar with thrust. But I think the fast_vgicp_voxel.hpp is a good example. I hope you can make your Cuda version more like it. Thanks,
Yep, it must be good to support more NeighborSearchMethod options in the CUDA version. I'm quite busy with writing some papers, but I try to find spare time to update the VGICP_CUDA code in a few weeks.
I updated VGICP_CUDA so it supports DIRECT7 and DIRECT27. I also added a new neighbor voxel search method DIRECT_RADIUS which uses voxels within a radius so the GPU version can take more voxels into account for registration stability. Please take a look at "fix_vgicp" branch.
Regarding the illegal memory access error, I couldn't reproduce the error on my PC. Can you provide some data so I can check the problem?
Great work! I will try it. The illegal memory access error seems related to the resolution, and I will send some data along with the resolution parameter if the fix_vgicp occurs this error. Thanks,
Above all, thank you for sharing this code.
I also "occasionally" got
GaussianVoxelMap::create_voxelmap 4
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: cudaErrorIllegalAddress: an illegal memory access was encountered
Aborted (core dumped)
error at https://github.com/SMRT-AIST/fast_gicp/blob/master/src/fast_gicp/cuda/gaussian_voxelmap.cu#L242
I added some printf lines as below.
printf("GaussianVoxelMap::create_voxelmap 4\n");
thrust::for_each(
thrust::cuda::par.on(stream),
thrust::make_zip_iterator(thrust::make_tuple(points.begin(), covariances.begin())),
thrust::make_zip_iterator(thrust::make_tuple(points.end(), covariances.end())),
accumulate_points_kernel(voxelmap_info_ptr.data(), buckets, num_points, voxel_means, voxel_covs));
printf("GaussianVoxelMap::create_voxelmap 5\n");
I can find line those error occured by printf('something'). So I'm not sure what is the reason of this issue.
my registration settings
number of points
method : "VGICP_CUDA" downsample_resolution : 0.05 k_correspondences : 30 voxel_resolution : 0.8 max_correspondence_distance : 0.4 neighbor_search_method : "DIRECT_RADIUS" neighbor_search_radius : 0.3
Could you help me for this problem? Thank you.
If target point cloud size is very big,
(int init_num_buckets=8192
) in GaussianVoxelMap
constructor
should be much higher?
Hi @kisung-kim ,
Maybe the input point cloud was too large, and it ran out of the GPU memory. What GPU are you using and how much memory does it have?
By increasing init_num_buckets
, you can avoid GPU memory fragmentation and reduce a bit amount of memory. Another workaround is to use a larger voxel resolution that would substantially decrease the memory consumption.
Honestly, I didn't test the code for such large point clouds, and it may have some memory-related bugs. It would be very helpful to improve the code if you could provide some example data.
The problem seem to happen because of this judgement miss "bucket.second >= 0". https://github.com/SMRT-AIST/fast_gicp/blob/master/src/fast_gicp/cuda/gaussian_voxelmap.cu#L128 When coord = (0,0,0), it can be occasionally mistake matched to thrust::pair((0,0,0),-1),so under the if, bucket.second = -1. In gpu memory, voxel_covs is occasionally alligned just bebind buckets (in my code, when the bucket.size() == 32768, it happened). When add voxel_covs, it incorrectly change the last 9 int value of buckets, make the value too big. So when coord is match to bucket[bucket.size() - 3], bucket.second is a big value, make the address of voxel_covs out of the limit, call the error cudaErrorIllegalAddress.
Ah, you are right. It can access invalid memory regions when coord == (0, 0, 0). I didn't notice this bug because I usually filter out very close points. I'll push a fix soon. Thanks a lot @fls1995 !
Has this issue been fixed?
I just pushed a fix, and it will be merged into the main branch soon. https://github.com/SMRT-AIST/fast_gicp/pull/97
I think why coord(0,0,0) would be matched to thrust::pair((0,0,0),-1) because we didn't return to end the loop after added voxel_means. After I add return under line 144, the problem disappeared. We can add (bucket.second < 0) to avoid unexpected circumstances.
thanks!
target:204761[pts] source:59051[pts] --- ndt_cuda (P2D) --- terminate called after throwing an instance of 'thrust::system::system_error' what(): reduce failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Aborted (core dumped)
Is there any solution? Thanks
target:204761[pts] source:59051[pts] --- ndt_cuda (P2D) --- terminate called after throwing an instance of 'thrust::system::system_error' what(): reduce failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Aborted (core dumped)
Is there any solution? Thanks
I also meet this problem, do you solve it now?
Hi,@koide3 I am following the latest master branch. It occasionally appear when utilizing fast_gicp::FastVGICPCuda. Any advice? thanks.
Full message: terminate called after throwing an instance of 'thrust::system::system_error' what(): reduce failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
it tracks back to :
return transform_reduce(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, last, unary_op, init, binary_op);
full stack:: __GI_raise 0x00007f18959dafb7 __GI_abort 0x00007f18959dc921