[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
Hi, thank you for your great work!

I wanted to reach out because I've been facing some difficulties while attempting single-person pose estimation on the OpenMV4P. I have been following your person_detection_demo, which has been quite helpful. I trained a tiny MoveNet on the COCO 2017 dataset and kept only its heatmap head. It works well enough for me in the TFLite format, but when I deploy it through the TinyEngine framework, the results are poor and differ from the TFLite output.
I debugged both sides with the same RGB picture (int8 values in [-128, 127]), and it turns out that the outputs are very different.
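For context, this is roughly how I quantize the input image into the int8 range before feeding it to either runtime (a sketch; the zero point of -128 is my assumption based on the usual TFLite uint8-to-int8 convention, and the 96x96 shape is just an example):

```python
import numpy as np

# Hypothetical 96x96 RGB test image with uint8 pixel values (0..255).
rgb = np.random.default_rng(0).integers(0, 256, size=(96, 96, 3), dtype=np.uint8)

# Map 0..255 to the int8 input range -128..127 by shifting the zero point.
# Assumes the model input quantization uses zero_point = -128.
int8_input = (rgb.astype(np.int16) - 128).astype(np.int8)

assert int8_input.min() >= -128 and int8_input.max() <= 127
```

Both the TFLite interpreter and the TinyEngine build receive exactly this tensor, so the input side should be identical.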
The generated `genModel.h` shows:

```c
#define NNoutput &buffer0[98304];
/* sram:189584, flash:29544 */
static signed char buffer[189584];
static signed char *buffer0 = &buffer[0];
static signed char *buffer1 = &buffer[180224];
static int16_t *sbuf = (int16_t *)&buffer[180224];
static int32_t *kbuf = (int32_t *)&buffer[188936];
const int SBuffer_size = 8712;
const int KBuffer_size = 648;
```
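For what it's worth, if `SBuffer_size` and `KBuffer_size` are in bytes, the regions appear to tile the arena exactly rather than overrun it (a quick sanity check I did on the offsets above; the byte-unit interpretation is my assumption):

```python
ARENA = 189584       # total size of buffer[]
BUF1_OFF = 180224    # buffer1 and sbuf both start here
KBUF_OFF = 188936    # kbuf offset
SBUF_SIZE = 8712     # SBuffer_size (assumed to be bytes)
KBUF_SIZE = 648      # KBuffer_size (assumed to be bytes)

# sbuf region ends exactly where kbuf begins...
assert BUF1_OFF + SBUF_SIZE == KBUF_OFF
# ...and kbuf ends exactly at the end of the arena.
assert KBUF_OFF + KBUF_SIZE == ARENA
# buffer1 and sbuf alias the same address; I assume TinyEngine reuses
# that region deliberately, so the aliasing itself may be intentional.
```

So at least with that reading of the sizes, nothing runs past the end of `buffer[]`.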
It shouldn't be so large as to cause a memory overlap, should it? Since I have only one output to manage, I don't register `detectionUtils`-like utilities in the code generator, so `NNoutput` is captured by `_findtheinferenceOutput`. I think that part is okay.

I also tried setting `fp_requantize` to False, but it did not help. At this point I'm unsure what could be causing the discrepancies, and I would greatly appreciate any guidance or suggestions that could help me move forward.

pose_tflite_model.zip
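In case it is useful, this is roughly the check I run on the two dumped output tensors (a sketch; `compare_outputs` is just a helper I wrote, and the ±1 tolerance is a rule of thumb for rounding differences between quantized kernels, not anything from TinyEngine itself):

```python
import numpy as np

def compare_outputs(ref, test, atol=1):
    """Compare two int8 output tensors element-wise.

    Quantized kernels can legitimately differ by +/-1 due to rounding,
    so differences within `atol` are tolerated; larger or widespread
    gaps point at a real bug (wrong buffer, wrong requantization, ...).
    """
    ref = np.asarray(ref, dtype=np.int32)
    test = np.asarray(test, dtype=np.int32)
    diff = np.abs(ref - test)
    return {
        "max_abs_diff": int(diff.max()),
        "mismatch_ratio": float((diff > atol).mean()),
    }

# Example with synthetic data: identical tensors report no mismatch.
stats = compare_outputs([1, -5, 100], [1, -5, 100])
```

With my model the mismatch is large across most of the heatmap, not an off-by-one here and there, which is why I suspect something structural rather than rounding.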