ncsurobotics / SW8S-Rust

Rust code for Seawolf 8
GNU General Public License v3.0

Add CUDA processing #150

Closed Bennett-Petzold closed 2 months ago

Bennett-Petzold commented 2 months ago

Previously, model processing ran entirely on the CPU, as per OpenCV defaults. This PR adds the feature flag cuda, which enables a CUDA kernel port of our post-processing code. The model produces a (relatively) large output that is reduced to a small set of successes, and each potential success is processed in complete isolation, so the work parallelizes well on a GPU. As a result, the kernel shows a significant speedup (over 2x faster post-processing), especially on systems with slow CPUs. CPU post-processing of model output is also slightly improved.
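To illustrate why this workload suits a GPU, here is a minimal CPU-side sketch of post-processing in which every candidate is handled in isolation. The row layout (x, y, w, h, confidence), the names Candidate, ROW_LEN, process_row and post_process_cpu, and the confidence threshold are all assumptions made for illustration; they are not taken from the SW8S-Rust code.

```rust
/// Number of floats per candidate row (hypothetical layout:
/// x, y, w, h, confidence).
pub const ROW_LEN: usize = 5;

/// One accepted detection. This struct is illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Candidate {
    pub x: f32,
    pub y: f32,
    pub w: f32,
    pub h: f32,
    pub confidence: f32,
}

/// Decodes a single row and keeps it only if it clears the threshold.
/// The function reads nothing but its own row, which is the property
/// that would let one GPU thread handle one row.
pub fn process_row(row: &[f32], threshold: f32) -> Option<Candidate> {
    match row {
        [x, y, w, h, confidence] if *confidence >= threshold => Some(Candidate {
            x: *x,
            y: *y,
            w: *w,
            h: *h,
            confidence: *confidence,
        }),
        // Wrong row length or confidence below the threshold.
        _ => None,
    }
}

/// Reduces the large flat model output to the small set of successes.
/// Trailing floats that do not fill a whole row are ignored.
pub fn post_process_cpu(output: &[f32], threshold: f32) -> Vec<Candidate> {
    output
        .chunks_exact(ROW_LEN)
        .filter_map(|row| process_row(row, threshold))
        .collect()
}
```

Because process_row shares no state between rows, the loop in post_process_cpu could in principle be replaced by a kernel launch with one thread per row, which is the kind of structure the description above refers to.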

OpenCV uses CUDA calls that are not supported on the Tegra architecture, so while OpenCV's CUDA backend did speed up the code on x86 devices, it crashes on the Jetson. OpenCL should be explored as an alternative, since running on the GPU produced significant speedups.

Benchmarks using criterion were added to demonstrate the speedups from using CUDA.
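The PR's benchmarks use criterion, which handles warm-up and statistical analysis. As a dependency-free illustration of the underlying comparison, the sketch below swaps in std::time::Instant instead. The helper names mean_time and speedup are hypothetical, and this is not a substitute for criterion's measurements.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the mean wall-clock time per call.
/// `black_box` keeps the optimizer from discarding the result.
pub fn mean_time<R>(iters: u32, mut f: impl FnMut() -> R) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed() / iters
}

/// Speedup of `candidate` over `baseline`; values above 1.0 mean the
/// candidate is faster.
pub fn speedup(baseline: Duration, candidate: Duration) -> f64 {
    baseline.as_secs_f64() / candidate.as_secs_f64()
}
```

A comparison would time the CPU path and the CUDA path on the same model output with mean_time, then pass both results to speedup.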

Full processing of an image through a model now takes about 600 ms on the Jetson Nano.

Bennett-Petzold commented 2 months ago

Ignore the failing build. It fails because the GitHub Action has not yet been updated to skip the CUDA flag, and the free runners can't compile and run CUDA code.