koide3 / fast_gicp

A collection of GICP-based fast point cloud registration algorithms
BSD 3-Clause "New" or "Revised" License
1.23k stars 316 forks source link

reduce failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered #37

Open cdb0y511 opened 3 years ago

cdb0y511 commented 3 years ago

Hi,@koide3 I am following the latest master branch. It occasionally appear when utilizing fast_gicp::FastVGICPCuda. Any advice? thanks.

Full message: terminate called after throwing an instance of 'thrust::system::system_error' what(): reduce failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered

it tracks back to :

return transform_reduce(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, last, unary_op, init, binary_op);

full stack:: __GI_raise 0x00007f18959dafb7 __GI_abort 0x00007f18959dc921

0x00007f18963cf957 0x00007f18963d5ae6 std::terminate() 0x00007f18963d5b21 __cxa_throw 0x00007f18963d5d54 thrust::cuda_cub::throw_on_error util.h:213 thrust::cuda_cub::detail::reduce_n_impl, Eigen::Matrix, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, thrust::zip_iterator const> >, thrust::detail::normal_iterator const> >, thrust::detail::normal_iterator >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, fast_gicp::cuda::(anonymous namespace)::compute_derivatives_kernel, Eigen::Matrix, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> > >, long, thrust::tuple, Eigen::Matrix, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, fast_gicp::cuda::(anonymous namespace)::sum_errors_kernel> reduce.h:982 thrust::cuda_cub::reduce_n, Eigen::Matrix, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, thrust::zip_iterator const> >, thrust::detail::normal_iterator const> >, thrust::detail::normal_iterator >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, fast_gicp::cuda::(anonymous namespace)::compute_derivatives_kernel, Eigen::Matrix, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> > >, long, thrust::tuple, Eigen::Matrix, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, fast_gicp::cuda::(anonymous namespace)::sum_errors_kernel> reduce.h:1017 thrust::cuda_cub::transform_reduce const> >, thrust::detail::normal_iterator const> >, thrust::detail::normal_iterator >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, fast_gicp::cuda::(anonymous namespace)::compute_derivatives_kernel, Eigen::Matrix, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, thrust::tuple, Eigen::Matrix, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, fast_gicp::cuda::(anonymous namespace)::sum_errors_kernel> transform_reduce.h:58 thrust::transform_reduce const> >, thrust::detail::normal_iterator const> >, thrust::detail::normal_iterator >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, fast_gicp::cuda::(anonymous namespace)::compute_derivatives_kernel, Eigen::Matrix, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, thrust::tuple, Eigen::Matrix, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, fast_gicp::cuda::(anonymous namespace)::sum_errors_kernel> transform_reduce.inl:47 thrust::transform_reduce const> >, thrust::detail::normal_iterator const> >, thrust::detail::normal_iterator >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, fast_gicp::cuda::(anonymous namespace)::compute_derivatives_kernel, Eigen::Matrix, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, thrust::tuple, Eigen::Matrix, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, fast_gicp::cuda::(anonymous namespace)::sum_errors_kernel> transform_reduce.inl:67 fast_gicp::cuda::compute_derivatives compute_derivatives.cu:156 fast_gicp::cuda::FastVGICPCudaCore::compute_error fast_vgicp_cuda.cu:219
cdb0y511 commented 3 years ago

I doubt it may relate to the voxel resolution cause after I increase resolution, it seems to disappear.

koide3 commented 3 years ago

Thanks for reporting the issue. I'll check if voxelization has some memory problems soon.

cdb0y511 commented 3 years ago

Thanks for reporting the issue. I'll check if voxelization has some memory problems soon.

I strongly suggest you could add more NeighborSearchMethod for the Cuda version. like, setNeighborSearchMethod(fast_gicp::NeighborSearchMethod::DIRECT7); I find DIRECT7 is more robust than DIRECT1 in some scenarios. I think the resolution parameter can be easier to choose when the neighbor voxel is computed. Because the smaller resolution results in fewer points in each voxel, and to consider the neighbor voxels is fair compensation. I am not familiar with thrust. But I think the fast_vgicp_voxel.hpp is a good example. I hope you can make your Cuda version more like it. Thanks,

koide3 commented 3 years ago

Yep, it must be good to support more NeighborSearchMethod options in the CUDA version. I'm quite busy with writing some papers, but I try to find spare time to update the VGICP_CUDA code in a few weeks.

koide3 commented 3 years ago

I updated VGICP_CUDA so it supports DIRECT7 and DIRECT27. I also added a new neighbor voxel search method DIRECT_RADIUS which uses voxels within a radius so the GPU version can take more voxels into account for registration stability. Please take a look at "fix_vgicp" branch.

Regarding the illegal memory access error, I couldn't reproduce the error on my PC. Can you provide some data so I can check the problem?

cdb0y511 commented 3 years ago

Great work! I will try it. The illegal memory access error seems related to the resolution, and I will send some data along with the resolution parameter if the fix_vgicp occurs this error. Thanks,

kisung-kim commented 3 years ago

Above all, thank you for sharing this code.

I also "occasionally" got

GaussianVoxelMap::create_voxelmap 4
terminate called after throwing an instance of 'thrust::system::system_error'
what():  parallel_for failed: cudaErrorIllegalAddress: an illegal memory access was encountered                                                                                                                    
Aborted (core dumped) 

error at https://github.com/SMRT-AIST/fast_gicp/blob/master/src/fast_gicp/cuda/gaussian_voxelmap.cu#L242

I added some printf lines as below.

printf("GaussianVoxelMap::create_voxelmap 4\n");  
  thrust::for_each(
    thrust::cuda::par.on(stream),
    thrust::make_zip_iterator(thrust::make_tuple(points.begin(), covariances.begin())),
    thrust::make_zip_iterator(thrust::make_tuple(points.end(), covariances.end())),
    accumulate_points_kernel(voxelmap_info_ptr.data(), buckets, num_points, voxel_means, voxel_covs));
  printf("GaussianVoxelMap::create_voxelmap 5\n");

I can find line those error occured by printf('something'). So I'm not sure what is the reason of this issue.

my registration settings

number of points

method : "VGICP_CUDA" downsample_resolution : 0.05 k_correspondences : 30 voxel_resolution : 0.8 max_correspondence_distance : 0.4 neighbor_search_method : "DIRECT_RADIUS" neighbor_search_radius : 0.3

Could you help me for this problem? Thank you.

kisung-kim commented 3 years ago

If target point cloud size is very big,

(int init_num_buckets=8192) in GaussianVoxelMap constructor should be much higher?

koide3 commented 3 years ago

Hi @kisung-kim ,

Maybe the input point cloud was too large, and it ran out of the GPU memory. What GPU are you using and how much memory does it have?

By increasing init_num_buckets, you can avoid GPU memory fragmentation and reduce a bit amount of memory. Another workaround is to use a larger voxel resolution that would substantially decrease the memory consumption.

Honestly, I didn't test the code for such large point clouds, and it may have some memory-related bugs. It would be very helpful to improve the code if you could provide some example data.

fls1995 commented 2 years ago

The problem seem to happen because of this judgement miss "bucket.second >= 0". https://github.com/SMRT-AIST/fast_gicp/blob/master/src/fast_gicp/cuda/gaussian_voxelmap.cu#L128 When coord = (0,0,0), it can be occasionally mistake matched to thrust::pair((0,0,0),-1),so under the if, bucket.second = -1. In gpu memory, voxel_covs is occasionally alligned just bebind buckets (in my code, when the bucket.size() == 32768, it happened). When add voxel_covs, it incorrectly change the last 9 int value of buckets, make the value too big. So when coord is match to bucket[bucket.size() - 3], bucket.second is a big value, make the address of voxel_covs out of the limit, call the error cudaErrorIllegalAddress.

koide3 commented 2 years ago

Ah, you are right. It can access invalid memory regions when coord == (0, 0, 0). I didn't notice this bug because I usually filter out very close points. I'll push a fix soon. Thanks a lot @fls1995 !

whuzs commented 2 years ago

Has this issue been fixed?

koide3 commented 2 years ago

I just pushed a fix, and it will be merged into the main branch soon. https://github.com/SMRT-AIST/fast_gicp/pull/97

fls1995 commented 2 years ago

I think why coord(0,0,0) would be matched to thrust::pair((0,0,0),-1) because we didn't return to end the loop after added voxel_means. After I add return under line 144, the problem disappeared. We can add (bucket.second < 0) to avoid unexpected circumstances.

whuzs commented 2 years ago

thanks!

ZFcvYes commented 8 months ago

target:204761[pts] source:59051[pts] --- ndt_cuda (P2D) --- terminate called after throwing an instance of 'thrust::system::system_error' what(): reduce failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Aborted (core dumped)

Is there any solution? Thanks

NirvanaDragon commented 6 months ago

target:204761[pts] source:59051[pts] --- ndt_cuda (P2D) --- terminate called after throwing an instance of 'thrust::system::system_error' what(): reduce failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Aborted (core dumped)

Is there any solution? Thanks

I also meet this problem, do you solve it now?