EDIT 0: I think this may actually be caused by not having run jetson_clocks.sh before testing. Will update with results.
EDIT 1: After more testing, not running jetson_clocks.sh first was definitely the issue. I now get 14-20FPS on monocular, depending on feature counts. Stereo is still slow, but that's probably expected.
Using the project presentation website and this thread on the NVIDIA forums, I've gotten this project running live on a Jetson TX2, with a lightly modified version of the original ROS Stereo node.
However, the performance of the GPU version in my install is generally slightly worse than the CPU version.
I'm curious if there are any suggestions for things to check or investigate while diagnosing this issue. Thanks!
More Explanation:
On a sample BAG file of 672x376 (ZED WVGA) stereo images broadcast at 30FPS, I can get around 5-6FPS with the original CPU-bound ORB_SLAM2, vs. around 4-5FPS for the GPU version. As a sanity check, the tx1 executable in /gpu gives ~8-9FPS monocular.
Here's a sample graph of what sudo ~/tegrastats gives for processor % utilization during the GPU version on the BAG file images:
And the CPU and mono versions, for reference:
The GPU is being utilized in the GPU version, but it seems to spend most of its time waiting for data, which suggests that the CPU-side operations are the bottleneck.
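For anyone who wants to put numbers on the "GPU waiting for data" impression instead of reading it off a graph, here is a minimal Python sketch that summarizes a captured tegrastats log. It assumes the line format I believe the TX2 prints (a `cpu [...]` list of per-core loads and a `GR3D` field for GPU load); other L4T releases may print different fields, so treat the regular expressions as assumptions. The function names, the `SAMPLE` line, and the 10% "idle" threshold are all hypothetical and chosen only for illustration.

```python
import re

# Assumed tegrastats line format (TX2-era L4T); not guaranteed for other releases.
SAMPLE = ("RAM 2422/7851MB (lfb 1x4MB) cpu [53%@2035,off,off,47%@2035,"
          "51%@2035,44%@2035] EMC 8%@1600 APE 150 GR3D 12%@1300")

CPU_RE = re.compile(r"cpu \[([^\]]*)\]", re.IGNORECASE)
GPU_RE = re.compile(r"GR3D(?:_FREQ)? (\d+)%")


def parse_tegrastats_line(line):
    """Return (per-core CPU loads, GPU load) in percent, or None if unparseable."""
    cpu_match = CPU_RE.search(line)
    gpu_match = GPU_RE.search(line)
    if not cpu_match or not gpu_match:
        return None
    cores = []
    for field in cpu_match.group(1).split(","):
        field = field.strip()
        if field == "off" or "%" not in field:
            continue  # skip cores that are powered down
        cores.append(int(field.split("%")[0]))
    if not cores:
        return None
    return cores, int(gpu_match.group(1))


def summarize(lines, gpu_idle_threshold=10):
    """Average the busiest-core load and GPU load over all parseable lines."""
    samples = [s for s in map(parse_tegrastats_line, lines) if s is not None]
    if not samples:
        return None
    n = len(samples)
    return {
        "samples": n,
        "mean_busiest_core": sum(max(cores) for cores, _ in samples) / n,
        "mean_gpu": sum(gpu for _, gpu in samples) / n,
        # Fraction of samples where the GPU load is below the (arbitrary) threshold.
        "gpu_idle_fraction": sum(1 for _, gpu in samples if gpu < gpu_idle_threshold) / n,
    }


if __name__ == "__main__":
    print(summarize([SAMPLE]))
```

To use it, redirect the tegrastats output to a file while the BAG file plays, then pass the file's lines to `summarize`. A high busiest-core load together with a large GPU idle fraction would be consistent with a single CPU thread starving the GPU, though it would not prove it.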
Things that shouldn't be the issue:
The TX2 is running on Mode 0 (sudo nvpmodel -m 0).
I've compiled both versions of OpenCV (ROS version and standalone) with CUDA, cuBLAS and fast math flags enabled:
OpenCV 3.2 (ROS, used when building the ROS node):