Open bfs18 opened 1 year ago
UPDATE: With -Ofast added and the CMSIS 5.9.0 backend, the time for 1000 predictions drops to 369 ms.
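To put the timings reported in this thread side by side, here is a small sketch that turns them into speed-up ratios against the NNOM 0.4.3 C backend (1392 ms). The function name `speedup` is made up for illustration; the millisecond figures are the ones quoted in the comments.

```c
#include <stdio.h>

/* Ratio of baseline time to new time; values above 1.0 mean "faster".
   Hypothetical helper, not part of NNOM or CMSIS. */
double speedup(double baseline_ms, double new_ms) {
    return baseline_ms / new_ms;
}

/* Prints the ratios for the measurements quoted in this thread
   (1000 predictions of mnist-simple each). */
void print_reported_speedups(void) {
    const double c_backend_ms   = 1392.0; /* NNOM 0.4.3 C backend      */
    const double cmsis_plain_ms = 1423.0; /* CMSIS 5.9.0, no macros    */
    const double cmsis_macro_ms = 1055.0; /* CMSIS 5.9.0 + macros      */
    const double cmsis_ofast_ms = 369.0;  /* CMSIS 5.9.0 + -Ofast      */

    printf("CMSIS, no macros: %.2fx\n", speedup(c_backend_ms, cmsis_plain_ms));
    printf("CMSIS + macros:   %.2fx\n", speedup(c_backend_ms, cmsis_macro_ms));
    printf("CMSIS + -Ofast:   %.2fx\n", speedup(c_backend_ms, cmsis_ofast_ms));
}
```

On these numbers the -Ofast build is a bit under 4x faster than the C backend, which is still short of the 5x mentioned in the documentation.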
Hi Jia, thanks for your advice. I added the macros
ARM_MATH_DSP,ARM_MATH_CM7,__FPU_PRESENT=1
With the CMSIS 5.9.0 backend, the time for 1000 predictions drops to 1055 ms. I am using the RT-Thread qemu-vexpress-a9 BSP. Is this normal?
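One way to confirm that the macros above actually reach the compiler is a compile-time check in the source file that is being built. This is only a sketch: the function names are made up for illustration, and it reports nothing beyond whether each macro was visible when this file was compiled.

```c
#include <stdio.h>

/* Each helper returns 1 if the macro was visible when this file was
   compiled, 0 otherwise. The helper names are hypothetical. */
int has_arm_math_dsp(void) {
#ifdef ARM_MATH_DSP
    return 1;
#else
    return 0;
#endif
}

int has_arm_math_cm7(void) {
#ifdef ARM_MATH_CM7
    return 1;
#else
    return 0;
#endif
}

int has_fpu_present(void) {
#if defined(__FPU_PRESENT) && (__FPU_PRESENT == 1)
    return 1;
#else
    return 0;
#endif
}

/* Prints the status of the three macros so a missing define is obvious. */
void report_build_macros(void) {
    printf("ARM_MATH_DSP:     %d\n", has_arm_math_dsp());
    printf("ARM_MATH_CM7:     %d\n", has_arm_math_cm7());
    printf("__FPU_PRESENT==1: %d\n", has_fpu_present());
}
```

Calling `report_build_macros()` once at startup shows whether the defines were picked up by the build or silently dropped.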
You probably haven't enabled CMSIS-NN acceleration, i.e. you are running CMSIS on its C backend instead of the SIMD assembly. You may want to check the CMSIS instructions or the note under HWC format.
I tested the mnist-simple model in the RT-Thread environment and ran model prediction 1000 times.
Using the CMSIS 5.9.0 backend, it takes 1423 ms.
Using the NNOM 0.4.3 C backend, it takes 1392 ms.
The documentation says that replacing the C backend with CMSIS should bring a 5x speed-up; however, the performance is about the same in my test. What is the reason? Has NNOM been updated, or did I not set the CMSIS macros correctly?
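For reference, the measurement described above (run a prediction 1000 times and time the whole loop) can be sketched roughly as follows. This is an assumption about the setup, not the code actually used: `time_runs_ms` and `dummy_predict` are made-up names, `dummy_predict` stands in for whatever call runs one inference, and the standard C `clock()` is used where an RTOS build would more likely read its own tick counter.

```c
#include <time.h>

/* Signature of the workload to time; hypothetical, for illustration. */
typedef void (*bench_fn)(void);

/* Placeholder workload standing in for one model prediction. */
volatile unsigned long dummy_sink = 0;
void dummy_predict(void) {
    for (unsigned long i = 0; i < 1000; i++) {
        dummy_sink += i;
    }
}

/* Calls fn() `runs` times and returns the elapsed CPU time in
   milliseconds, measured with the standard C clock(). */
double time_runs_ms(bench_fn fn, int runs) {
    clock_t start = clock();
    for (int i = 0; i < runs; i++) {
        fn();
    }
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / (double)CLOCKS_PER_SEC;
}
```

Timing both backends with the same harness, e.g. `time_runs_ms(dummy_predict, 1000)` with the real prediction call swapped in, keeps the comparison between them like for like.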