Closed: Nik057 closed this issue 2 months ago
It is actually very common to get an "RKNN NPU model" that is slower than the CPU ¯\_(ツ)_/¯ because of the limitations of the NPU and its driver. The opset version is not the problem.
Hello, thanks for your issue report.
Recently, RKNN-Toolkit2 (2.0.0beta) was released, and I tested depth_anything_vits with a 1x3x420x644 input. Inference takes almost 1 s per frame at int8. I expect it can be improved to around 0.5 s at int8 in the next version.
May I ask what performance you were expecting, and what the CPU performance is?
Hello @zen-xingle, thanks for the answer, I was hoping RKNN would be faster than ONNX, and it's cool to see that the latest version has doubled the inference speed compared to before. That's a big win!
Right now, CPU performance seems pretty much the same, maybe a tad faster, around 600ms for a 320x420 resolution. But RKNN was taking about 1700ms with version 1.6.0, and now with the 2.0 release, it's down to around 750ms. It's still not quite as fast as the CPU, but it's a massive improvement.
Excited to see what comes next! Thanks a bunch for the update.
Hi!
I tried a conversion of this model starting from torch to ONNX to RKNN to use that on OrangePi 5 NPUs (RK3588s).
I noticed that conversion from ONNX to RKNN is only possible with opset <= 16; newer opsets use LayerNormalization nodes, which are not supported:
```
E RKNN: [00:22:08.456] Op type:LayerNormalization, name: LayerNormalization:/blocks.0/norm1/LayerNormalization, fallback cpu failed. please try updating to the latest version of the toolkit2 and runtime from: https://console.zbox.filez.com/l/I00fc3 (PWD: rknn)
E RKNN: [00:22:08.545] Unsupport LayerNormalization! Please lower the OPSET version of the onnx model to below 16.
```
Using instead a model with opset <=16 will have lower performance on NPU than running ONNX on CPU. Is there a way to convert this model to RKNN and have optimal performance?
Conversion code used:
PTH TO ONNX:
ONNX TO RKNN:
Thank you in advance for any advice