mit-han-lab / tinyengine

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
https://mcunet.mit.edu
MIT License

mcunet model with cmsis-nn #23

Closed brainchip-india closed 1 year ago

brainchip-india commented 1 year ago

Hi @meenchen @RaymondWang0, I couldn't find any tutorial or demo for running MCUNet models with CMSIS-NN. Can you please point me to the relevant page, or let me know how to generate an MCUNet model for deployment on the CMSIS-NN kernels? Or can I use the same model with equivalent CMSIS-NN functions?

RaymondWang0 commented 1 year ago

Hi @brainchip-india,

A tutorial or demo for running models with CMSIS-NN is beyond the scope of this repo, which is mainly for TinyEngine. Our suggestion: first, obtain the model you would like to use (for the MCUNet models, please refer to the model zoo in the MCUNet repo). Then, refer to the TF-Lite Micro and CMSIS-NN repos to learn how to deploy models using TF-Lite Micro's runtime and CMSIS-NN's optimized kernels.

brainchip-india commented 1 year ago

Hi @RaymondWang0 @meenchen, I have a few observations from testing with tflite @713b6ed6. When I create a static tflite library for the M7 board, inference fails at the allocate_tensors() call. However, when I build the tflite static library for the M4 board and run that library on the M7, the build and inference pass (please refer to the attachment), and the results almost match the measured results.

My question: is this the expected procedure, or am I missing something needed to run the tflite model on the M7? Please help; thanks in advance.

(screenshot attached)

Edited: one more question: do the timing numbers in the measured results cover inference only, or camera capture + conversion to 8-bit + inference?

meenchen commented 1 year ago

Hi @brainchip-india,

We used the tflite micro source code directly during the measurement rather than a static library, but the measurement results you got seem reasonable.

Regarding your other question: yes, the timing results cover inference only.

brainchip-india commented 1 year ago

Hi @meenchen, I have fixed the tflite M7 static library build, and I am now able to run inference.

One last question:

With the tflite build (default kernels and cmsis-nn enabled), the inference results are not consistent with those of the original mcunet model. tflite_eval.py reports 88% accuracy, but the end inference results on live images are inaccurate and inconsistent. Any idea why? The same happens with static images: I tried 10 static person images, and with tflite the inference results are very poor (I also tried subtracting 128 from the input, as in tflite_eval.py and the live-image conversion, and inference is still poor).

meenchen commented 1 year ago

Hi @brainchip-india,

Unfortunately, I don't have a thorough understanding of tflite micro. Based on what you have described, there may be an issue with tflite micro itself. I suggest filing an issue report on their GitHub repository to bring this to the developers' attention and potentially get help resolving the problem.

brainchip-india commented 1 year ago

Hi @meenchen, I have resolved the issue. I was applying input scaling to the inputs (model_input->params.scale + model_input->params.zero_point), which is not required for your tflite model. Simply subtracting 128 from the input solved the problem, and accuracy is now good with both live and static images.

One question: I don't see any time-capture code in the project repo. Are you using SysTick or some other external timer to measure the inference time?

brainchip-india commented 1 year ago

Hi @meenchen, can you please share the details of the new board (STM32H743)? I see there are a lot of variants, so a link would be helpful. Thanks.

meenchen commented 1 year ago

Hi @brainchip-india, we are using CoreH743: https://www.waveshare.com/coreh743i.htm.

brainchip-india commented 1 year ago

Hi @meenchen, thanks for the link. I see that your model zoo has not been updated with the latest models, so I assume a new build project will be released soon with the new board and new models. Is there a particular date?

brainchip-india commented 1 year ago

Hi @meenchen, even with the latest PR fixing the import path merged, I am still seeing the model generation fail. Can you please look into it?

Error:

Traceback (most recent call last):
  File "examples/vww.py", line 20, in <module>
    from mcunet.mcunet.model_zoo import download_tflite
ModuleNotFoundError: No module named 'mcunet.mcunet'

brainchip-india commented 1 year ago

@meenchen, copying __init__.py into the mcunet parent folder solved the problem; I am now able to download the model. Command (from the tinyengine folder): cp mcunet/mcunet/__init__.py mcunet/

meenchen commented 1 year ago

Hi @brainchip-india,

Thanks for letting us know about the issue with accessing the mcunet submodule. However, I was not able to reproduce this error on my side. Are you following the steps from our tutorial? Could you share some info about your dev environment (e.g., OS and Python version)?

brainchip-india commented 1 year ago

I am using Ubuntu 20.04 and Python 3.6+, as recommended in the tutorial.

brainchip-india commented 1 year ago

Hi @meenchen @RaymondWang0, I tried mcunet-vww-0 static-image inference on the new board (STM32H743IIT6), and the inference time is 202 ms, almost 8x your measured results. Am I missing anything here? Please let me know.

meenchen commented 1 year ago

Hi @brainchip-india, please double-check the following settings:

  1. Make sure to use -O3 or -Ofast for the optimization level of C/C++.
  2. Make sure you enable the D cache and I cache on the board.
  3. Make sure your board is running at 480MHz.

brainchip-india commented 1 year ago

Thanks @meenchen, I missed the 3rd point. Now my timing results match your measured results.

brainchip-india commented 1 year ago

Hi @meenchen, the inference timing results of mcunet-vww0 tflite are not matching your measured results. (screenshot attached)

I remember you said the reason was that I am using a static tflite library for inference. I just want to know whether there are any other parameters I need to take care of, because the tflite commit is the same as before, so the timing results should at least come close to your measured numbers.

meenchen commented 1 year ago

Glad to know the problem is resolved.

meenchen commented 1 year ago

Hi @brainchip-india, my suspicion about the mismatch is that the compiler version and optimization flags used when you compiled the tflite library might differ from ours. I assume you followed TensorFlow Lite's instructions to build the library. In our measurement, we build the tflite executable from source with STM32CubeIDE 1.5.0. There could be some performance gap due to that difference.

brainchip-india commented 1 year ago

Thanks @meenchen. The only mismatch remaining is the FPU; I have already taken care of everything else. Can you share which FPU you used for the STM32H743IIT6 board? Mine is FPv5-D16.

meenchen commented 1 year ago

Mine is FPv5-D16 as well.

brainchip-india commented 1 year ago

Hi @meenchen, I fixed it. The reason is that additional flags need to be enabled when creating the static tflite library, i.e. THIRD_PARTY_KERNEL_OPTIMIZATION_LEVEL=-Ofast and KERNEL_OPTIMIZATION_LEVEL=-Ofast.

Now my timing results are close to your measured results.

meenchen commented 1 year ago

Hi @brainchip-india, glad to hear the problem is solved. I will close this issue then. Feel free to reopen it or open another issue.