mit-han-lab / tinyengine

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
https://mcunet.mit.edu
MIT License

No implementation of convolve_s8_kernel3_stride1_pad1_fpreq() #82

Open wslong36 opened 1 year ago

wslong36 commented 1 year ago

Hey, @meenchen

I found that there is no implementation of the kernel convolve_s8_kernel3_stride1_pad1_fpreq().

Would you mind uploading this kernel to TinyEngine?

Alternatively, how should I use int_forward_op/convolve_s8_kernel3_stride1_pad1.c to implement convolve_s8_kernel3_stride1_pad1_fpreq()?

meenchen commented 1 year ago

Hi. @wslong36,

The only difference between the fpreq ops and the other ops is how outputs are re-quantized. fpreq means re-quantizing outputs with an fp32 scale (e.g., https://github.com/mit-han-lab/tinyengine/blob/fdc001922898df80fd07901f15a07a9e304ed705/TinyEngine/src/kernels/fp_requantize_op/mat_mul_kernels_fpreq.c#L294), while the other ops follow the TFLite approach.
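For concreteness, here is a minimal C sketch of the two re-quantization paths applied to a single int32 convolution accumulator. The helper names are hypothetical and the rounding is simplified; this is not TinyEngine's actual kernel code, just an illustration of the difference:

```c
#include <stdint.h>

/* fpreq-style: re-quantize the int32 accumulator with an fp32
 * (typically per-channel) scale, then add the output offset. */
static inline int8_t requantize_fp32(int32_t acc, float scale,
                                     int32_t out_offset) {
    int32_t out = (int32_t)((float)acc * scale) + out_offset;
    if (out < -128) out = -128;   /* clamp to int8 range */
    if (out > 127)  out = 127;
    return (int8_t)out;
}

/* TFLite-style: integer-only re-quantization with a fixed-point
 * multiplier and shift (a saturating rounding doubling high multiply
 * followed by a rounding right shift, as in CMSIS-NN's
 * arm_nn_requantize). Simplified here: assumes shift <= 0, which is
 * the common case when the combined conv output scale is < 1. */
static inline int8_t requantize_tflite(int32_t acc, int32_t multiplier,
                                       int32_t shift, int32_t out_offset) {
    int64_t prod  = (int64_t)acc * (int64_t)multiplier;
    /* rounding nudge for the >> 31 of the doubled product */
    int64_t nudge = (prod >= 0) ? (1LL << 30) : (1 - (1LL << 30));
    int32_t high  = (int32_t)((prod + nudge) >> 31);
    int32_t right = -shift;
    int32_t out   = (right > 0)
                        ? ((high + (1 << (right - 1))) >> right)
                        : high;
    out += out_offset;
    if (out < -128) out = -128;   /* clamp to int8 range */
    if (out > 127)  out = 127;
    return (int8_t)out;
}
```

So to derive the fpreq variant from int_forward_op/convolve_s8_kernel3_stride1_pad1.c, the convolution loop itself stays the same; only the output re-quantization step changes from the integer multiplier/shift path to the fp32-scale path.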

You can also use the fp_requantize flag in the code generator to disable it (e.g., https://github.com/mit-han-lab/tinyengine/blob/fdc001922898df80fd07901f15a07a9e304ed705/examples/detection_fpn.py#L63).