puzzlepaint / surfelmeshing

Real-time surfel-based mesh reconstruction from RGB-D video.
BSD 3-Clause "New" or "Revised" License

CUDA error: too many resources requested for launch OR Cuda Error: invalid argument #3

RuibinMa closed this issue 5 years ago

RuibinMa commented 5 years ago

Thanks for sharing the code! I compiled your software using CUDA 9.0 and Eigen 3.3.7 and encountered the following error. It seems to be complaining about GPU memory; however, as the log shows, only 2608.94 MiB of the 8 GB total are in use.

```
./build/applications/surfel_meshing/SurfelMeshing ~/Desktop/rgbd_dataset_freiburg1_desk groundtruth.txt --follow_input_camera false --max_surfel_count 20000000
```

```
I0115 21:56:34.386611 24843 main.cc:614] Read dataset with 573 frames
I0115 21:56:34.662744 24843 main.cc:850] GPU memory usage after initialization: used = 2608.94 MiB, free = 5507.62 MiB, total = 8116.56 MiB
F0115 21:56:34.758725 24843 cuda_surfel_reconstruction.cu:1277] Cuda Error: too many resources requested for launch
Check failure stack trace:
    @     0x7f912f9240cd  google::LogMessage::Fail()
    @     0x7f912f925f33  google::LogMessage::SendToLog()
    @     0x7f912f923c28  google::LogMessage::Flush()
    @     0x7f912f926999  google::LogMessageFatal::~LogMessageFatal()
    @     0x5563e76da682  _ZN3vis25IntegrateMeasurementsCUDAEP11CUstream_stjifffRKN6Sophus3SE3IfLi0EEEfRKNS_10CameraImplILi1EfJNS_17PinholeProjectionIfEENS_13PixelMapping4IfEEEEERKNS_10CUDABufferItEERKNSF_I6float2EERKNSF_IfEERKNSF_IN5Eigen6MatrixIhLi3ELi1ELi0ELi3ELi1EEEEERKNSF_IjEESY_SY_SPjPSN
    @     0x5563e769bbb3  vis::CUDASurfelReconstruction::Integrate()
    @     0x5563e76929ac  main
    @     0x7f912d574b97  __libc_start_main
    @     0x5563e7697d2a  _start
Aborted (core dumped)
```

I tried to decrease --max_surfel_count. After decreasing it to 20000 (which I believe is too small), I encountered a different error:

```
./build/applications/surfel_meshing/SurfelMeshing ~/Desktop/rgbd_dataset_freiburg1_desk groundtruth.txt --follow_input_camera false --max_surfel_count 20000
```

```
I0115 22:00:51.342407 24922 main.cc:614] Read dataset with 573 frames
I0115 22:00:51.634147 24922 main.cc:850] GPU memory usage after initialization: used = 400.125 MiB, free = 7716.44 MiB, total = 8116.56 MiB
F0115 22:00:51.700417 24922 cuda_buffer_inl.h:159] Cuda Error: invalid argument
Check failure stack trace:
    @     0x7f7a085ab0cd  google::LogMessage::Fail()
    @     0x7f7a085acf33  google::LogMessage::SendToLog()
    @     0x7f7a085aac28  google::LogMessage::Flush()
    @     0x7f7a085ad999  google::LogMessageFatal::~LogMessageFatal()
    @     0x559a114b6567  vis::CUDABuffer<>::DownloadPartAsync()
    @     0x559a114b4396  vis::CUDASurfelReconstruction::TransferAllToCPU()
    @     0x559a114acac9  main
    @     0x7f7a061fbb97  __libc_start_main
    @     0x559a114b1d2a  _start
Aborted (core dumped)
```

Any idea why this problem occurs? Thanks in advance!

puzzlepaint commented 5 years ago

Which graphics card do you use? I tested the program on a GTX 1080 and a GTX 1070.

I believe that the error is not related to GPU memory use, since, as you point out, the program only uses part of the available 8 GB.
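If you want to double-check those memory numbers independently, a quick query with the CUDA runtime API is enough. The sketch below is a standalone example, not code from this repository; the "GPU memory usage after initialization" line from main.cc presumably does something similar:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  // Query free and total device memory for the current CUDA device.
  size_t free_bytes = 0;
  size_t total_bytes = 0;
  cudaMemGetInfo(&free_bytes, &total_bytes);

  // Print in MiB, similar to the log line quoted above.
  const double mib = 1024.0 * 1024.0;
  std::printf("used = %.2f MiB, free = %.2f MiB, total = %.2f MiB\n",
              (total_bytes - free_bytes) / mib,
              free_bytes / mib,
              total_bytes / mib);
  return 0;
}
```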

I would do the following to try to debug this: uncomment the debug cudaDeviceSynchronize() and CHECK_CUDA_NO_ERROR() calls after the individual kernel launches in cuda_surfel_reconstruction.cu and recompile. That should narrow down which kernel reports the error.

RuibinMa commented 5 years ago

Thanks for the quick response. To make a long story short, it turned out to be a block_dim problem.

After I uncommented those lines and recompiled the software, I still couldn't see which kernel was crashing; the error stayed the same. It seems that the cudaDeviceSynchronize() call passed but the CHECK_CUDA_NO_ERROR() check afterwards failed, which was mysterious to me. Then I simply changed every block size of 1024 in cuda_surfel_reconstruction.cu to 512, and now the program runs.
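For reference, the change amounts to lowering the block dimension in the kernel launch configuration. The sketch below is only a generic illustration with a made-up kernel, not the actual code from cuda_surfel_reconstruction.cu; "too many resources requested for launch" typically means that the block size, multiplied by the kernel's per-thread register or shared-memory usage, exceeds what a streaming multiprocessor on the GPU can provide.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for one of the reconstruction kernels.
__global__ void ExampleKernel(float* data, unsigned int count) {
  unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < count) {
    data[i] *= 2.0f;
  }
}

void LaunchExampleKernel(cudaStream_t stream, float* data, unsigned int count) {
  // Halving the block size from 1024 to 512 halves the per-block resource
  // demand, which is what made the launches succeed here.
  constexpr int kBlockDim = 512;  // previously 1024
  const int grid_dim = (count + kBlockDim - 1) / kBlockDim;
  ExampleKernel<<<grid_dim, kBlockDim, 0, stream>>>(data, count);
}
```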

That being said, would you consider lowering these values for safety?

puzzlepaint commented 5 years ago

Nice to hear that it works now. The behavior you describe is expected: CUDA kernels run asynchronously in the background, and cudaDeviceSynchronize() waits for all of them to finish. The next call to the CHECK_CUDA_NO_ERROR() helper macro then determines whether the most recently run kernel reported any issue. So in your case it was indeed (at least) IntegrateMeasurementsCUDAKernel() that caused the problem.
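In plain CUDA runtime terms (without this project's helper macros), the pattern looks roughly like the sketch below; the kernel name and block size are only illustrative. A launch that is rejected at configuration time never runs on the device, so cudaDeviceSynchronize() can still return success while the error only surfaces through cudaGetLastError():

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel; the real ones live in cuda_surfel_reconstruction.cu.
__global__ void ExampleKernel() {}

int main() {
  // The launch call returns immediately; if the configuration is accepted,
  // the kernel runs asynchronously in the background.
  ExampleKernel<<<1, 1024>>>();

  // Wait for all previously queued device work to finish. If the launch was
  // rejected (e.g. "too many resources requested for launch"), nothing was
  // queued, so this can still return cudaSuccess.
  cudaError_t sync_err = cudaDeviceSynchronize();

  // Query (and clear) the last recorded error; this is where a rejected
  // launch shows up.
  cudaError_t last_err = cudaGetLastError();

  if (sync_err != cudaSuccess || last_err != cudaSuccess) {
    std::fprintf(stderr, "Cuda Error: %s\n",
                 cudaGetErrorString(last_err != cudaSuccess ? last_err : sync_err));
    return 1;
  }
  std::printf("Kernel ran without errors.\n");
  return 0;
}
```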

Yes, it is probably a good idea to lower the default block sizes then. However, it might still be necessary to lower them further for other GPUs. In addition, this is also a performance question, since the best-performing block size may differ from one GPU to another. Thus, the best approach might be a small benchmark that tries out all reasonable values and checks which settings work and which is fastest. In any case, I will at least mention this problem in the README, since it is likely common.
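A minimal version of such a benchmark could look roughly like the sketch below; BenchmarkKernel and the candidate block sizes are placeholders, not code from this repository. It skips configurations that fail to launch on the current GPU and times the rest with CUDA events:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel; a real benchmark would time the actual reconstruction
// kernels on representative data.
__global__ void BenchmarkKernel(float* data, int count) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < count) {
    data[i] = data[i] * 1.01f + 0.5f;
  }
}

int main() {
  const int count = 1 << 22;
  float* data = nullptr;
  cudaMalloc(&data, count * sizeof(float));

  const int candidates[] = {128, 256, 512, 1024};
  for (int block_dim : candidates) {
    const int grid_dim = (count + block_dim - 1) / block_dim;

    // Warm-up launch; if this configuration is rejected on the current GPU,
    // skip it instead of aborting.
    BenchmarkKernel<<<grid_dim, block_dim>>>(data, count);
    cudaDeviceSynchronize();
    if (cudaGetLastError() != cudaSuccess) {
      std::printf("block size %4d: launch failed, skipping\n", block_dim);
      continue;
    }

    // Time a single launch with CUDA events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    BenchmarkKernel<<<grid_dim, block_dim>>>(data, count);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float milliseconds = 0.0f;
    cudaEventElapsedTime(&milliseconds, start, stop);
    std::printf("block size %4d: %.3f ms\n", block_dim, milliseconds);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
  }

  cudaFree(data);
  return 0;
}
```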