mit-han-lab / tinyengine

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
https://mcunet.mit.edu
MIT License

Different inference result on my own model using TinyEngine compared to Python #59

Closed EricWu-ZL closed 1 year ago

EricWu-ZL commented 1 year ago

Hi, @meenchen. Thanks for your great work. As the title says, when I implemented my own task in STM32CubeIDE and checked the network's inference results, I found that they show some bias compared to the results of running the TFLite model in Python, and the bias grows the deeper the network layer is. I would like to ask whether these biases are caused by slight differences between the ops in TinyEngine and the ops in TFLite, or whether you have encountered this problem before. I would appreciate any help you could provide. The device I am using is the STM32F746G-DISCO, and my TensorFlow version is 2.11.0.

meenchen commented 1 year ago

Hi @EricWu-ZL,

By default, TinyEngine uses floating-point scalers to re-quantize accumulators, which differs slightly from the TFLite implementation. You can disable this by switching off fp_requantize during code generation, which should close the gap.

Another difference could be due to precision limitations on the MCU. In rare cases the results of arithmetic operations can exceed the range of a 32-bit integer, which also introduces slight errors.
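To make the difference concrete, here is a rough, self-contained Python sketch of the two re-quantization styles: a floating-point scaler versus a TFLite-style fixed-point multiplier plus right shift. This is not the actual TinyEngine or TFLite kernel code, and the accumulators and quantization parameters are made up; it just shows how the two paths can disagree by a least-significant bit on some values.

```python
import numpy as np

def float_requantize(acc, scale, zero_point):
    # fp_requantize-style: multiply the int32 accumulator by a floating-point
    # scaler, round, add the zero point, saturate to int8.
    out = np.round(acc * scale) + zero_point
    return np.clip(out, -128, 127).astype(np.int8)

def fixed_point_requantize(acc, scale, zero_point):
    # TFLite-style (simplified): express the scale as a Q0.31 fixed-point
    # multiplier plus a right shift, then requantize with integer arithmetic.
    # Rounding of negative values is simplified compared to the real kernel.
    shift = 0
    while scale < 0.5:
        scale *= 2.0
        shift += 1
    multiplier = int(round(scale * (1 << 31)))
    prod = (acc.astype(np.int64) * multiplier + (1 << 30)) >> 31
    if shift > 0:
        prod = (prod + (1 << (shift - 1))) >> shift
    out = prod + zero_point
    return np.clip(out, -128, 127).astype(np.int8)

# Made-up accumulators and quantization parameters, purely illustrative.
acc = np.array([123456, -98765, 4095, -1], dtype=np.int32)
scale, zero_point = 0.00072193, -3

print(float_requantize(acc, scale, zero_point))
print(fixed_point_requantize(acc, scale, zero_point))
# The two paths agree on most values but can differ by one LSB on some,
# which is the kind of small per-layer bias discussed in this issue.
```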

EricWu-ZL commented 1 year ago

Hi, @meenchen. Thanks for your reply! Sorry, I forgot to mention that my model has already been converted to an int8 TFLite model. Does the situation in your comment also apply in that case? In addition, when I run inference on the MCU repeatedly, the results fluctuate from run to run. I think the reason may be that the memory regions I use overlap, but I cannot find where the problem is. Besides this assumption, have you encountered any other causes of this issue?

meenchen commented 1 year ago

> Hi, @meenchen. Thanks for your reply! Sorry, I forgot to mention that my model has already been converted to an int8 TFLite model. Does the situation in your comment also apply in that case?

Yes, the re-quantize operation is for int8 inference.

> In addition, when I run inference on the MCU repeatedly, the results fluctuate from run to run. I think the reason may be that the memory regions I use overlap, but I cannot find where the problem is. Besides this assumption, have you encountered any other causes of this issue?

Do you mean the inference results are different each time for the same input?

leaf82318 commented 1 year ago

Hi @meenchen, I also found a difference in inference results between TFLite and the MCU. I'll try switching off fp_requantize during code generation. Besides, what is the function of the "tflite_op=False" option?

```python
code_generator = CodeGenerator(
    memsche=memory_scheduler,
    inplace=memory_scheduler.USE_INPLACE,
    unsigned_input=False,
    patch_params=None,
    FP_output=False,
    profile_mode=False,
    fp_requantize=True,
    tflite_op=False,
    dummy_address=False,
    outputTables=outTable,
```
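If I understand the suggestion above correctly, switching fp_requantize=True to fp_requantize=False in this call should be what turns off the floating-point re-quantization path.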

meenchen commented 1 year ago

> Hi @meenchen, I also found a difference in inference results between TFLite and the MCU. I'll try switching off fp_requantize during code generation. Besides, what is the function of the "tflite_op=False" option?
>
> `code_generator = CodeGenerator(memsche=memory_scheduler, inplace=memory_scheduler.USE_INPLACE, unsigned_input=False, patch_params=None, FP_output=False, profile_mode=False, fp_requantize=True, tflite_op=False, dummy_address=False, outputTables=outTable,`

Hi @leaf82318, that option is only for internal testing purposes. We will remove it in a future refactoring.

EricWu-ZL commented 1 year ago

> Hi, @meenchen. Thanks for your reply! Sorry, I forgot to mention that my model has already been converted to an int8 TFLite model. Does the situation in your comment also apply in that case?
>
> Yes, the re-quantize operation is for int8 inference.
>
> In addition, when I run inference on the MCU repeatedly, the results fluctuate from run to run. I think the reason may be that the memory regions I use overlap, but I cannot find where the problem is. Besides this assumption, have you encountered any other causes of this issue?
>
> Do you mean the inference results are different each time for the same input?

Hi, @meenchen. Thank you again for your reply! Regarding the inference result not being the same each time, I have found the problem, thank you for your help. But regarding the gap with TFLite: I have turned off fp_requantize during code generation, but there is still a small bias. I wonder whether many residual connections in the network lead to a larger bias the deeper the network is? Have you encountered a similar problem?

meenchen commented 1 year ago

> Hi, @meenchen. Thanks for your reply! Sorry, I forgot to mention that my model has already been converted to an int8 TFLite model. Does the situation in your comment also apply in that case?
>
> Yes, the re-quantize operation is for int8 inference.
>
> In addition, when I run inference on the MCU repeatedly, the results fluctuate from run to run. I think the reason may be that the memory regions I use overlap, but I cannot find where the problem is. Besides this assumption, have you encountered any other causes of this issue?
>
> Do you mean the inference results are different each time for the same input?
>
> Hi, @meenchen. Thank you again for your reply! Regarding the inference result not being the same each time, I have found the problem, thank you for your help. But regarding the gap with TFLite: I have turned off fp_requantize during code generation, but there is still a small bias. I wonder whether many residual connections in the network lead to a larger bias the deeper the network is? Have you encountered a similar problem?

Hi @EricWu-ZL, did the gap get smaller after you turned off fp_requantize? Another possible reason could be overflow, which should be a rare case for real images/inputs. Could you share how large the bias is?

EricWu-ZL commented 1 year ago

Hi, @meenchen. Thank you for your reply. I've already turned off fp_requantize. I compared the output of the fifth layer of my model with TFLite: the output size is 28x28x32, 8027 values differ from TFLite, the total bias is 8759, and the maximum bias for a single value is 9. There is no residual structure in the first five layers, only 1x1 conv, 3x3 conv, and 3x3 group conv.
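For reference, a minimal sketch of one way to compute such a per-layer comparison against the TFLite reference on the host (file names and the layer-5 tensor index are placeholders; it assumes the interpreter option experimental_preserve_all_tensors available in TF 2.11):

```python
import numpy as np
import tensorflow as tf

# Placeholder model path; experimental_preserve_all_tensors keeps
# intermediate tensors readable after invoke().
interpreter = tf.lite.Interpreter(
    model_path="model_int8.tflite",
    experimental_preserve_all_tensors=True,
)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
x = np.zeros(inp["shape"], dtype=inp["dtype"])  # replace with the real test input
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()

# Placeholder tensor index for the layer-5 output; look it up with
# interpreter.get_tensor_details().
layer5_index = 42
ref = interpreter.get_tensor(layer5_index).astype(np.int32)

# int8 buffer dumped from the board (e.g. over UART), saved as a .npy file.
mcu = np.load("layer5_mcu_dump.npy").astype(np.int32).reshape(ref.shape)

diff = np.abs(ref - mcu)
print("values differing:", int((diff != 0).sum()))
print("total abs bias  :", int(diff.sum()))
print("max abs bias    :", int(diff.max()))
```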

meenchen commented 1 year ago

> Hi, @meenchen. Thank you for your reply. I've already turned off fp_requantize. I compared the output of the fifth layer of my model with TFLite: the output size is 28x28x32, 8027 values differ from TFLite, the total bias is 8759, and the maximum bias for a single value is 9. There is no residual structure in the first five layers, only 1x1 conv, 3x3 conv, and 3x3 group conv.

Hi @EricWu-ZL, we do not support 3x3 group conv. That could be the issue.
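One way to double-check whether the converted model really contains grouped convolutions is to dump its operator list. This is a sketch assuming TF 2.11's model analyzer API and a placeholder model path: a grouped 3x3 convolution shows up as a CONV_2D whose filter tensor has fewer input channels than the op's input feature map.

```python
import tensorflow as tf

# Print the operator list and tensor shapes of the converted model
# (placeholder path); inspect the CONV_2D filter shapes in the output.
tf.lite.experimental.Analyzer.analyze(model_path="model_int8.tflite")
```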

meenchen commented 1 year ago

Closing due to inactivity. Feel free to reopen.