Dear TFLM Team/Community,

I would like to ask whether you have experimented with a way to handle sparse networks when generating the kernels for a specific target. Methods like unstructured pruning bring a decent speedup when inferring on consumer CPUs or with inference engines like DeepSparse, which are optimized to handle the zeros and thus skip those operations entirely. Unfortunately, with structured pruning, where I remove entire filters and automatically obtain a smaller model, the performance suffers badly, especially since the networks used for MCU deployment are not deep. Quantization as a method of compressing the network also harmed the performance drastically.
Thank you in advance
Rayen