yunchih / ORB-SLAM2-GPU2016-final

http://yunchih.github.io/ORB-SLAM2-GPU2016-final/

CPU Bottleneck on TX2? #8

Open madebyollin opened 6 years ago

madebyollin commented 6 years ago

EDIT 0: I think this may actually be caused by not having run jetson_clocks.sh before testing. Will update with results.

EDIT 1: After more testing, it definitely seems that not running jetson_clocks.sh first was the issue. I now get 14-20 FPS in monocular mode, depending on the feature count. Stereo is still slow, but that's probably expected.
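Because the fix turned out to be clock settings, a quick pre-benchmark check can help catch this. The Python sketch below is not part of this project. It reads only the standard Linux cpufreq sysfs files (scaling_cur_freq and cpuinfo_max_freq) and lists online cores running below their maximum frequency. It assumes those files are exposed by the TX2's L4T kernel, and it does not look at the GPU or EMC clocks, which jetson_clocks.sh also pins.

```python
from pathlib import Path
from typing import List, Tuple


def unpinned_cores(sysfs_root: str = "/sys/devices/system/cpu") -> List[Tuple[str, int, int]]:
    """Return (cpu_name, cur_khz, max_khz) for every core whose current
    frequency is below its hardware maximum, per the cpufreq sysfs files."""
    slow = []
    for cpu_dir in sorted(Path(sysfs_root).glob("cpu[0-9]*")):
        freq_dir = cpu_dir / "cpufreq"
        try:
            cur_khz = int((freq_dir / "scaling_cur_freq").read_text().strip())
            max_khz = int((freq_dir / "cpuinfo_max_freq").read_text().strip())
        except OSError:
            # Offline cores (e.g. Denver cores disabled by the power model)
            # may have no readable cpufreq files; skip them.
            continue
        if cur_khz < max_khz:
            slow.append((cpu_dir.name, cur_khz, max_khz))
    return slow


if __name__ == "__main__":
    for name, cur, mx in unpinned_cores():
        print(f"{name}: running at {cur} kHz, max is {mx} kHz")
```

An empty result suggests the CPU clocks are already at their maximum; any listed core means the benchmark may be measuring a throttled CPU.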


Using the project presentation website and this thread on the NVIDIA forums, I've gotten this project running live on a Jetson TX2, with a lightly modified version of the original ROS Stereo node.

However, the performance of the GPU version in my install is generally slightly worse than the CPU version.

I'm curious if there are any suggestions for things to check or investigate while diagnosing this issue. Thanks!


More Explanation:

On a sample BAG file of 672x376 (ZED WVGA) stereo images published at 30 FPS, I get around 5-6 FPS with the original CPU-bound ORB_SLAM2, vs. around 4-5 FPS with the GPU version. As a sanity check, the tx1 executable in /gpu gives ~8-9 FPS in monocular mode.
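To put those numbers in perspective: a 30 FPS stream leaves about 33 ms per frame, while 4-5 FPS corresponds to roughly 200-250 ms per frame, so the tracker keeps up with only about 13-17% of the published frames. The small Python helper below just does this arithmetic; the FPS values fed to it are the mid-points of the ranges quoted above.

```python
from typing import Tuple


def frame_budget(input_fps: float, achieved_fps: float) -> Tuple[float, float]:
    """Return (milliseconds spent per processed frame, fraction of the
    published frames the tracker actually processes)."""
    ms_per_frame = 1000.0 / achieved_fps
    fraction_processed = min(achieved_fps / input_fps, 1.0)
    return ms_per_frame, fraction_processed


if __name__ == "__main__":
    # Mid-points of the stereo results quoted above for the 30 FPS bag file.
    for label, fps in [("CPU stereo", 5.5), ("GPU stereo", 4.5)]:
        ms, frac = frame_budget(30.0, fps)
        print(f"{label}: {ms:.0f} ms/frame, {frac:.0%} of published frames")
```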

Here's a sample graph of the per-processor % utilization reported by sudo ~/tegrastats while the GPU version runs on the BAG file images:

[Figure: gpu_full]

And the CPU and mono versions, for reference:

[Figure: cpu_full]

[Figure: mono_full]
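Graphs like these can also be reproduced from raw tegrastats text. The Python sketch below pulls per-core CPU load and GPU load out of one output line. The field layout it expects (a "CPU [...]" list with one load%@MHz entry per core, "off" for disabled cores, and GPU load in a "GR3D_FREQ" or older "GR3D" field) is an assumption based on L4T 28.x-era output and may differ on other JetPack releases.

```python
import re
import sys
from typing import List, Optional, Tuple

CPU_RE = re.compile(r"CPU \[([^\]]*)\]")          # e.g. CPU [98%@2035,off,...]
GPU_RE = re.compile(r"GR3D(?:_FREQ)? (\d+)%")      # e.g. GR3D_FREQ 23%@1300


def parse_tegrastats(line: str) -> Tuple[List[Optional[int]], Optional[int]]:
    """Parse one tegrastats line into (per-core CPU %, GPU %).
    Offline cores become None; missing fields give [] / None."""
    cores: List[Optional[int]] = []
    cpu_match = CPU_RE.search(line)
    if cpu_match:
        for entry in cpu_match.group(1).split(","):
            pct = re.match(r"\s*(\d+)%", entry)
            cores.append(int(pct.group(1)) if pct else None)  # "off" -> None
    gpu_match = GPU_RE.search(line)
    gpu = int(gpu_match.group(1)) if gpu_match else None
    return cores, gpu


if __name__ == "__main__":
    # Hypothetical usage: sudo ~/tegrastats | python3 parse_tegrastats.py
    for raw in sys.stdin:
        cores, gpu = parse_tegrastats(raw)
        busiest = max((c for c in cores if c is not None), default=0)
        print(f"busiest core {busiest:3d}%  GPU {gpu}%")
```

Tracking the busiest single core, rather than the average, matters here: one saturated core can starve the GPU even while the overall CPU average looks moderate.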

The GPU is being utilized in the GPU version, but it seems to spend most of its time waiting for data, which suggests that the CPU-side work is the bottleneck.
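The reasoning here can be made concrete with a simple model. If each frame passes through the stages one after another, the frame time is the sum of the stage times, the GPU is busy only for its own share, and speeding up the GPU stage cannot push the frame rate past 1000 / (CPU time in ms). The Python sketch below assumes a strictly sequential pipeline, which is a simplification of ORB-SLAM2's threading. The stage names and millisecond values in the test are hypothetical, not measurements from this project.

```python
from typing import Dict, Set, Tuple


def pipeline_profile(stage_ms: Dict[str, float], gpu_stages: Set[str]) -> Tuple[float, float, str, float]:
    """For a strictly sequential per-frame pipeline, return
    (frames per second, fraction of wall time the GPU is busy,
     slowest stage, FPS upper bound if the GPU stages took zero time)."""
    total_ms = sum(stage_ms.values())
    gpu_ms = sum(ms for name, ms in stage_ms.items() if name in gpu_stages)
    cpu_ms = total_ms - gpu_ms
    fps = 1000.0 / total_ms
    slowest = max(stage_ms, key=stage_ms.get)
    # Amdahl-style bound: the CPU stages alone still cost cpu_ms per frame.
    fps_bound = 1000.0 / cpu_ms if cpu_ms > 0 else float("inf")
    return fps, gpu_ms / total_ms, slowest, fps_bound
```

With, say, 30 ms of GPU work and 170 ms of CPU work per frame, the pipeline runs at 5 FPS with the GPU busy 15% of the time, and even an instantaneous GPU stage would only reach about 5.9 FPS. A low GPU duty cycle like the one in the graphs is what this model predicts when CPU stages dominate.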


Things that shouldn't be the issue:

yunchih commented 6 years ago

Hi @madebyollin, thanks for the feedback! I don't have further comments regarding performance, but I do appreciate your writeup.