go-rknnlite provides Go language bindings for the RKNN Toolkit2 C API. It aims to provide lite bindings in the spirit of the closed-source Python lite bindings used for running AI inference models on the Rockchip NPU via the RKNN software stack.
These bindings have only been tested on the RK3588 (specifically the Radxa Rock Pi 5B) but should work on other RK3588-based SBCs. They should also work with other models in the RK35xx series supported by the RKNN Toolkit2.
To use in your Go project, get the library.
go get github.com/swdee/go-rknnlite
Or, to try the examples, clone the code and data repositories.
git clone https://github.com/swdee/go-rknnlite.git
cd go-rknnlite/example
git clone https://github.com/swdee/go-rknnlite-data.git data
Then refer to the README file for each example to run it from the command line.
The rknn-toolkit2 must be installed on your system with the C header files available in the system path, e.g. /usr/include/rknn_api.h.
Refer to the official documentation on how to install this on your system as it will vary based on OS and SBC vendor.
My usage was on the Radxa Rock Pi 5B running the official Debian 11 OS image which has the rknpu2 driver already installed.
To my knowledge Armbian and Joshua's Ubuntu OS images also have the driver installed for the supported SBCs.
You can test whether your OS has the driver installed with the following command.
dmesg | grep -i rknpu
The output should list the driver and indicate the NPU is initialized.
[ 5.130935] [drm] Initialized rknpu 0.8.2 20220829 for fdab0000.npu on minor 1
The examples make use of GoCV for image processing. Make sure you have a working installation of GoCV first; see the GoCV installation instructions for your system.
See the example directory.
Running multiple Runtimes in a Pool allows you to take advantage of all three NPU cores. For our usage of an EfficientNet-Lite0 model, a single runtime has an inference time of 7.9ms per image, whereas running a Pool of 9 runtimes brings the average inference time down to 1.65ms per image.
See the Pool example.
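The pooling idea can be sketched independently of the library. The following is a minimal illustration, not the go-rknnlite Pool implementation: fakeRuntime, runtimePool, newRuntimePool and processAll are hypothetical names, and the "inference" is simulated. A buffered channel holds the idle runtimes, so each goroutine blocks until one becomes free.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeRuntime is a hypothetical stand-in for a loaded model runtime.
type fakeRuntime struct{ id int }

// runtimePool (hypothetical) hands out idle runtimes through a buffered channel.
type runtimePool struct {
	ch chan *fakeRuntime
}

// newRuntimePool creates a pool holding size idle runtimes.
func newRuntimePool(size int) *runtimePool {
	p := &runtimePool{ch: make(chan *fakeRuntime, size)}
	for i := 0; i < size; i++ {
		p.ch <- &fakeRuntime{id: i}
	}
	return p
}

// Get blocks until a runtime is free.
func (p *runtimePool) Get() *fakeRuntime { return <-p.ch }

// Return hands a runtime back to the pool.
func (p *runtimePool) Return(rt *fakeRuntime) { p.ch <- rt }

// processAll runs jobs simulated inferences concurrently and
// returns how many of them completed.
func processAll(p *runtimePool, jobs int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	done := 0
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			rt := p.Get()
			defer p.Return(rt)
			// Real code would run inference on rt here.
			mu.Lock()
			done++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return done
}

func main() {
	pool := newRuntimePool(9) // pool size taken from the figures above
	fmt.Println("completed:", processAll(pool, 100))
}
```

The pool size caps how many inferences are in flight at once, regardless of how many goroutines submit work.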
For other Rockchip models such as the RK3566, which features a single NPU core, initialise the Runtime with the rknnlite.NPUSkipSetCore flag as follows.
rt, err := rknnlite.NewRuntime(*modelFile, rknnlite.NPUSkipSetCore)
The performance of the NPU is affected by which CPU cores your program runs on, so to achieve maximum performance we need to set the CPU Affinity.
The RK3588 for example has 4 fast Cortex-A76 cores at 2.4GHz and 4 efficient Cortex-A55 cores at 1.8GHz. By default your Go program will run across all cores, which affects performance. Instead, set the CPU Affinity to run on the fast Cortex-A76 cores only.
// set CPU affinity
err = rknnlite.SetCPUAffinity(rknnlite.RK3588FastCores)
if err != nil {
	log.Printf("Failed to set CPU Affinity: %v\n", err)
}
Constants have been set for RK3588 and RK3582 processors; for other CPUs you can define the core mask.
To create the core mask value we will use the RK3588 as an example which has CPU cores 0-3 as the slow A55 cores and cores 4-7 being the fast A76 cores.
You can use the provided convenience function to calculate the mask for cores 4-7.
mask := rknnlite.CPUCoreMask([]int{4,5,6,7})
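To show what such a mask represents, here is a minimal stand-alone sketch that sets one bit per core index. It is an illustration only and not the library's implementation: coreMask is a hypothetical helper, and the uint64 return type is an assumption chosen for the example.

```go
package main

import "fmt"

// coreMask is a hypothetical helper that builds a CPU bitmask by
// setting bit c for every core index c in cores. Indexes outside
// 0-63 are ignored.
func coreMask(cores []int) uint64 {
	var mask uint64
	for _, c := range cores {
		if c < 0 || c > 63 {
			continue
		}
		mask |= 1 << uint(c)
	}
	return mask
}

func main() {
	// Cores 4-7 (the fast A76 cores on the RK3588) set bits 4..7.
	mask := coreMask([]int{4, 5, 6, 7})
	fmt.Printf("%#x %08b\n", mask, mask) // prints "0xf0 11110000"
}
```

By the same arithmetic, the slow cores 0-3 correspond to the mask 0x0f.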
Convenience functions exist for handling preprocessing of images to run inference on.
The preprocess.Resizer provides functions for handling resizing and scaling of input images to the target size needed for inference input tensors. It will maintain aspect ratio by scaling and applying any needed letterbox padding to the source image.
// load source image file
img := gocv.IMRead(*imgFile, gocv.IMReadColor)
if img.Empty() {
	log.Fatal("Error reading image from: ", *imgFile)
}
// convert colorspace from GoCV's BGR to RGB as most models have been trained
// using RGB data
rgbImg := gocv.NewMat()
gocv.CvtColor(img, &rgbImg, gocv.ColorBGRToRGB)
// create new resizer setting the source image size and input tensor sizes
resizer := preprocess.NewResizer(img.Cols(), img.Rows(),
	int(inputAttrs[0].Dims[1]), int(inputAttrs[0].Dims[2]))
// resize image
resizedImg := gocv.NewMat()
resizer.LetterBoxResize(rgbImg, &resizedImg, render.Black)
For Object Detection and Instance Segmentation the Resizer is required so image mask sizes can be correctly calculated and scaled back for applying as an overlay on the source image.
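The scale-and-pad arithmetic behind a letterbox resize, and the reverse mapping needed to place results back on the source image, can be sketched as follows. This is an illustrative stand-alone sketch, not the preprocess.Resizer implementation: the letterbox type, computeLetterbox and toSource are hypothetical names, and centred padding is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// letterbox (hypothetical) records how a source image was fitted
// into the tensor: the scale factor, scaled size and padding.
type letterbox struct {
	Scale      float64
	NewW, NewH int
	PadX, PadY int
}

// computeLetterbox picks the largest scale that fits the source inside
// the destination while keeping aspect ratio, then splits the leftover
// space evenly as padding (assumed centred).
func computeLetterbox(srcW, srcH, dstW, dstH int) letterbox {
	scale := math.Min(float64(dstW)/float64(srcW), float64(dstH)/float64(srcH))
	newW := int(math.Round(float64(srcW) * scale))
	newH := int(math.Round(float64(srcH) * scale))
	return letterbox{
		Scale: scale,
		NewW:  newW,
		NewH:  newH,
		PadX:  (dstW - newW) / 2,
		PadY:  (dstH - newH) / 2,
	}
}

// toSource maps a point from tensor coordinates back to source image
// coordinates by removing the padding and undoing the scale.
func (l letterbox) toSource(x, y float64) (float64, float64) {
	return (x - float64(l.PadX)) / l.Scale, (y - float64(l.PadY)) / l.Scale
}

func main() {
	// A 1280x720 frame fitted into a 640x640 input tensor.
	lb := computeLetterbox(1280, 720, 640, 640)
	fmt.Printf("%+v\n", lb)
	x, y := lb.toSource(320, 320)
	fmt.Println(x, y) // the tensor centre maps to the source centre: 640 360
}
```

Without the recorded scale and padding, box and mask coordinates produced in tensor space could not be mapped back onto the original image.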
The render package provides convenience functions for drawing the bounding box around objects or segmentation mask/outline.
If a model (e.g. a specific YOLO version) is not yet supported, a post processor could be written to handle the outputs from the RKNN engine in the same manner as the YOLOv5 code.
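As an indication of the kind of step a detection post processor typically contains, here is a minimal sketch of greedy non-maximum suppression over decoded boxes. It is not taken from the library's YOLOv5 code: the detection type and the iou and nms functions are hypothetical, and decoding the raw tensors into boxes is assumed to have happened already.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// detection (hypothetical) is a decoded box with corner coordinates
// and a confidence score.
type detection struct {
	X1, Y1, X2, Y2 float64
	Score          float64
}

// iou returns the intersection-over-union of two boxes.
func iou(a, b detection) float64 {
	w := math.Min(a.X2, b.X2) - math.Max(a.X1, b.X1)
	h := math.Min(a.Y2, b.Y2) - math.Max(a.Y1, b.Y1)
	if w <= 0 || h <= 0 {
		return 0
	}
	inter := w * h
	union := (a.X2-a.X1)*(a.Y2-a.Y1) + (b.X2-b.X1)*(b.Y2-b.Y1) - inter
	return inter / union
}

// nms keeps the highest scoring boxes and drops any box that overlaps
// an already kept box by more than iouThresh.
func nms(dets []detection, iouThresh float64) []detection {
	sorted := append([]detection(nil), dets...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Score > sorted[j].Score })
	var keep []detection
	for _, d := range sorted {
		suppressed := false
		for _, k := range keep {
			if iou(d, k) > iouThresh {
				suppressed = true
				break
			}
		}
		if !suppressed {
			keep = append(keep, d)
		}
	}
	return keep
}

func main() {
	dets := []detection{
		{0, 0, 10, 10, 0.9},
		{1, 1, 11, 11, 0.8}, // overlaps the first box heavily
		{50, 50, 60, 60, 0.7},
	}
	fmt.Println(len(nms(dets, 0.5)), "boxes kept")
}
```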
This code is being used in production for Image Classification. Over time it will be expanded to support more features such as Object Detection using YOLO. Because the library is at an early stage, the addition of new features may cause changes or breakages in the API between commits.
Ensure you use Go Modules so your code is not affected, but be aware that any updates may require minor changes to your code to support the latest version.
Versioning of the library will be added at a later date once the feature set stabilises.