uzh-rpg / vilib

CUDA Visual Library by RPG

Cuda Problem happens within test_vilib #4

Closed DongDongXA closed 4 years ago

DongDongXA commented 4 years ago

I compiled and ran this project on a Jetson AGX Xavier developer kit. When I ran the test_vilib test, Image Pyramid, SubframePool, and PyramidPool all reported success, but the FAST detector produced no result and the test program hung there, so I tried to find where it stalls. The FAST_CPU detector is completely fine; the hang is inside a member function called copyGridToHost, which belongs to DetectorBaseGPU, the parent class of FAST_GPU. The call CUDA_API_CALL(cudaMemcpyAsync(h_featuregrid, d_featuregrid, feature_gridbytes, cudaMemcpyDeviceToHost, stream_)) returns successfully, and the test then hangs while executing CUDA_API_CALL(cudaStreamSynchronize(stream_)).

Since I am not very familiar with the CUDA API, I have not tried removing the synchronization call; I just need to know how to fix this bug. I have read your paper, the results are exciting, and I am really looking forward to seeing them on my own machine! Thanks in advance :)
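For anyone trying to narrow down a hang like the one described above, one option is to run the suspect blocking call on a worker thread and report when it does not return within a time limit, instead of letting the whole test stall silently. The sketch below is plain C++ and is not part of vilib: the helper name completes_within is hypothetical, and the callable stands in for whatever blocking call is under suspicion (here that would be the cudaStreamSynchronize call).

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>

// Hypothetical helper (not part of vilib): runs `blocking_call` on a worker
// thread and returns true if it finished within `timeout`, false otherwise.
// Assumption: `blocking_call` does not throw.
bool completes_within(std::function<void()> blocking_call,
                      std::chrono::milliseconds timeout) {
  // The promise is shared with the worker so it outlives this function
  // even when the worker never finishes.
  auto done = std::make_shared<std::promise<void>>();
  std::future<void> finished = done->get_future();

  // Detached on purpose: a call that never returns must not block the caller.
  std::thread([done, call = std::move(blocking_call)]() {
    call();             // e.g. a wrapper around cudaStreamSynchronize(stream)
    done->set_value();  // signal completion
  }).detach();

  return finished.wait_for(timeout) == std::future_status::ready;
}
```

A caller could wrap the synchronization in a lambda and log a message when completes_within returns false. Note that this only detects the stall; the stuck worker thread is left running, so it is a diagnostic aid rather than a fix.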

baliika commented 4 years ago

Hi, thanks for the feedback.

  1. Just as a first verification, please run ls -1 test/images/euroc/images/752_480/ | wc -l from the visual_lib folder. You should get a number N - the number of images you extracted from the Euroc machine hall dataset. (3682 >)
  2. Could you also provide the output of gcc --version and nvcc --version?
  3. Could you recompile the library with a modification to the Makefile and tell us what you observe (apart from it being slower): https://github.com/uzh-rpg/rpg_cuda_thesis/blob/master/visual_lib/Makefile#L6 - change RELEASE_MODE=0, then make clean and make test -j4

Thanks.

DongDongXA commented 4 years ago

Hi, thank you for replying so quickly.

  1. Because the Xavier's flash storage is so limited, I used V1_02_medium.bag instead, which is the smallest of the EuRoC datasets. I checked your file I/O interface and I do not think this affects the test program's behavior. The ls command prints 1710.

  2. gcc: gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0

    nvcc: nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2018 NVIDIA Corporation Built on Sun_Sep_30_21:09:22_CDT_2018 Cuda compilation tools, release 10.0, V10.0.166

    uname -a: Linux nvidia-desktop 4.9.140-tegra #1 SMP PREEMPT Wed Mar 13 00:30:11 PDT 2019 aarch64 aarch64 aarch64 GNU/Linux

  3. I forgot to mention that the results are exactly the same whether RELEASE_MODE is 0 or 1. I also checked the CPU detector's result and it is fine.

baliika commented 4 years ago

Unfortunately, I couldn't reproduce the issue on our boards, but I had an idea that might solve your issue on the Xavier. Would you mind trying the following patch file, please?

Just unzip the zip file somewhere in the repository, then perform git apply issue_04.patch, then make test.

issue_04.zip

Thank you in advance!

DongDongXA commented 4 years ago

It is really weird: your code runs well on a Jetson TX2 with the same gcc and CUDA versions, even without this patch. I will verify the patch on the Xavier later, during your morning.

DongDongXA commented 4 years ago

After applying this patch, it still does not work on the Xavier: the test hangs at the same CUDA_API_CALL(cudaStreamSynchronize(stream_)) in copyGridToHost. I am confident the CUDA environment is fine, since I have deployed TensorRT on this Xavier machine.

baliika commented 4 years ago

Sorry for the delay on our side. We've prepared a fix branch that ought to fix the issues with 7.x devices. Today we were able to reproduce the issue on a GPU with compute capability (CC) 7.5, but since our paper used 5.x and 6.x devices, we had not caught it before. We'll merge this fix branch (fix/volta_turing) into master, but we would be glad if you could confirm that the issue has also been resolved on your side. You may now ignore the previous patch.
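For readers unfamiliar with the numbering, the 5.x, 6.x and 7.x figures above are CUDA compute capabilities. The Jetson TX2 is a 6.2 (Pascal) device and the Jetson AGX Xavier is a 7.2 (Volta) device, which would be consistent with the TX2 working and the Xavier hanging. The sketch below only illustrates that mapping; the function names are hypothetical and not part of vilib. In a CUDA build the major and minor values would typically come from the major and minor fields of cudaDeviceProp, filled in by cudaGetDeviceProperties.

```cpp
#include <string>

// Hypothetical helper: maps a compute capability (major.minor) to its
// architecture family. Only the generations mentioned in this thread
// are covered; anything else is reported as "unknown".
std::string architecture_family(int major, int minor) {
  if (major == 5) return "Maxwell";
  if (major == 6) return "Pascal";                         // e.g. Jetson TX2 (6.2)
  if (major == 7) return minor >= 5 ? "Turing" : "Volta";  // e.g. AGX Xavier (7.2)
  return "unknown";
}

// Hypothetical helper: true for the 7.x devices that the maintainer says
// the fix/volta_turing branch targets.
bool is_cc_7x(int major) { return major == 7; }
```

With these helpers, architecture_family(7, 2) gives "Volta" for the Xavier, and is_cc_7x(6) is false for the TX2.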

DongDongXA commented 4 years ago

I have tested the fix branch on the Xavier and it works well. Thank you for your efforts. I also have some questions about the performance figures reported by this test, and I am trying to find the answers in your paper and code. I will collect these questions over the next few days and open a separate issue. Thank you very much.