Closed petewarden closed 10 months ago
I also just added dual-core optimizations to depthwise convolutions. Together with the existing conv2d changes, this reduces the time for the person detection benchmark from 824ms to 588ms, a 1.4x speed increase.
Confirmed via email that this is the last of the changes for this PR. Merging!
The existing TensorFlow Lite Micro repository in the Raspberry Pi GitHub organization is based on the initial porting work I did three years ago for the release of the RP2040. Until now, we didn't have an easy way to update this RPi fork of the project to reflect changes made on the main repository owned by Google at https://github.com/tensorflow/tflite-micro.
Now that I've left Google, I'm taking on this project as a "best effort" maintainer. To make it easier, I've created a series of scripts in the `sync` folder that should make it possible to automatically pull the latest changes from the Google repository and convert them into the form needed for this Pico repository.
I've also made some updates that speed up the default CMSIS-NN implementation for Conv2D by splitting it across both cores on the RP2040. This optimization did expose underlying stack overflow bugs in some of the test code so you can disable it by commenting out `TF_LITE_PICO_MULTICORE` in src/third_party/cmsis_nn/Source/NNSupportFunctions/arm_nn_mat_mult_nt_t_s8.c#L43 if you experience problems in your code.
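For anyone curious how a compile-time switch like this can gate a two-way split, here is a minimal host-side sketch. It is not the code in this PR: `SKETCH_MULTICORE`, `matvec_rows`, and `matvec_split` are hypothetical names made up for illustration, the kernel is a plain int8 matrix-vector product rather than the CMSIS-NN routine, and both halves run one after the other on a single thread. On an RP2040, the second half is what would be handed to the other core.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical switch standing in for the idea behind TF_LITE_PICO_MULTICORE.
   Comment it out to fall back to the unsplit path. */
#define SKETCH_MULTICORE

/* Computes out[r] = dot(lhs[r, :], rhs) for rows in [row_begin, row_end).
   lhs is row-major with `cols` entries per row. */
static void matvec_rows(const int8_t *lhs, const int8_t *rhs, int32_t *out,
                        int cols, int row_begin, int row_end) {
  for (int r = row_begin; r < row_end; ++r) {
    int32_t acc = 0;
    for (int c = 0; c < cols; ++c) {
      acc += (int32_t)lhs[r * cols + c] * (int32_t)rhs[c];
    }
    out[r] = acc;
  }
}

/* Splits the output rows into two halves that touch disjoint parts of `out`,
   so they could be computed independently. */
void matvec_split(const int8_t *lhs, const int8_t *rhs, int32_t *out,
                  int rows, int cols) {
  assert(rows >= 0 && cols >= 0);
#ifdef SKETCH_MULTICORE
  const int mid = rows / 2;
  /* Assumption: on real hardware the second call would be dispatched to the
     other core. Here both calls run in sequence on the host. */
  matvec_rows(lhs, rhs, out, cols, 0, mid);
  matvec_rows(lhs, rhs, out, cols, mid, rows);
#else
  matvec_rows(lhs, rhs, out, cols, 0, rows);
#endif
}
```

Because each half writes to its own slice of the output, the result is the same whether the switch is defined or not, which is what makes commenting it out a safe fallback in a design like this.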