Add CUDA processing

Previously, model processing ran entirely on the CPU, as per OpenCV defaults. This PR adds the feature flag cuda, which enables a CUDA kernel port of our post-processing code. The model produces a (relatively) large output that is reduced to a small set of successes, and each potential success is processed in complete isolation, so the work parallelizes well on a GPU. The kernel therefore shows a significant speedup (over 2x post-processing speed), especially on systems with slow CPUs. CPU post-processing of model output is also slightly improved.
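To illustrate why this workload suits a GPU, here is a minimal CPU-side sketch of per-candidate post-processing. It is not the code from this PR: the row layout (`[cx, cy, w, h, objectness, class scores...]`), the `Detection` type, and the names `process_row` and `post_process` are all assumptions made for the example. The point it shows is that `process_row` reads only its own row, so a kernel could plausibly assign one thread per row with no synchronization.

```rust
/// A surviving candidate ("success"). Hypothetical type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Detection {
    pub class_id: usize,
    pub confidence: f32,
    pub bbox: [f32; 4],
}

/// Processes one row of raw model output in isolation.
/// Assumed layout: [cx, cy, w, h, objectness, class scores...].
fn process_row(row: &[f32], threshold: f32) -> Option<Detection> {
    // A row needs the box, the objectness score, and at least one class score.
    if row.len() < 6 {
        return None;
    }
    let objectness = row[4];

    // Pick the highest-scoring class for this row.
    let (class_id, best_score) = row[5..]
        .iter()
        .copied()
        .enumerate()
        .fold((0usize, f32::MIN), |best, (i, score)| {
            if score > best.1 { (i, score) } else { best }
        });

    // Discard candidates below the confidence threshold.
    let confidence = objectness * best_score;
    if confidence < threshold {
        return None;
    }

    Some(Detection {
        class_id,
        confidence,
        bbox: [row[0], row[1], row[2], row[3]],
    })
}

/// Reduces a flat model output buffer to the small set of surviving rows.
/// Each row is handled independently, which is what makes the step
/// straightforward to parallelize.
pub fn post_process(output: &[f32], row_len: usize, threshold: f32) -> Vec<Detection> {
    if row_len == 0 {
        return Vec::new();
    }
    output
        .chunks_exact(row_len)
        .filter_map(|row| process_row(row, threshold))
        .collect()
}
```

In a layout like this, the only shared step is collecting the survivors at the end; everything before it is per-row work.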

OpenCV uses CUDA calls that are not supported on the Tegra architecture, so while OpenCV's CUDA backend did speed up the code on x86 devices, it causes crashes on the Jetson. OpenCL should be explored as an alternative, since using the GPU produced significant speedups.

Benchmarking with criterion was added to measure and demonstrate the speedups from using CUDA.

Full processing of an image through a model now takes about 600 ms on the Jetson Nano.

ncsurobotics / SW8S-Rust

Add CUDA processing #150