mit-han-lab / tinyengine

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
https://mcunet.mit.edu
MIT License
806 stars 131 forks

Build Guidance for non-STM32 MCUs #1

Closed pialin closed 1 year ago

pialin commented 2 years ago

TinyEngine is an exciting project for people like me who want to deploy AI models on MCUs. I think TinyEngine could enable deep learning on a huge number of edge devices, including but not limited to STM32 chips, so please consider writing a Makefile template / build guide for people who want to build TinyEngine for non-STM32 MCUs with arm-linux toolchains.

Thanks a lot!

meenchen commented 2 years ago

Hi, thanks for your interest in our work! We provide the template project for STM32 as an example, but you can still use Tinyengine as a normal C/C++ library for other chips. However, our current implementation has a dependency on ARM's DSP which is only available on Cortex-M4 and Cortex-M7.

pialin commented 2 years ago

@meenchen Thanks for your response! I am trying to deploy tflite models on the Arduino Nano 33 BLE Sense (Cortex-M4) and found "TinyEngine" a better choice than "tflite-micro".
Can I simply build the code with the arm-gnu toolchain? Could you possibly give me some advice on where and how to start?

meenchen commented 2 years ago

Yes, you can build TinyEngine with the arm-gnu toolchain. Here are the steps that may help:

  1. Convert your tflite model to C code with TinyEngine's code_generator. You can refer to the Python scripts in the examples of this repo.
  2. Set up ARM CMSIS-NN by adding it to your sources and adding CMSIS/NN/Include to the include path.
  3. Set up TinyEngine by adding it to your sources and adding TinyEngine/include and codegen/include (generated by the code generator) to the include path.
  4. Refer to the main.cpp in our tutorial for API usage (e.g., getInput(), invoke(), and getOutput()).
ssr920121 commented 2 years ago

Hi @meenchen, I'm working on TinyEngine deployment on a non-ARM chip too, and I'm considering using pure integer models to avoid any floating-point calculation. But I'm not sure whether it would still have the dependency on ARM's DSP.

You mentioned that "our current implementation has a dependency on ARM's DSP which is only available on Cortex-M4 and Cortex-M7" -- could you please specify which model operations in TinyEngine use ARM's DSP instructions?

Zepan commented 2 years ago

Hi, for non-ARM chips I suggest the TinyMaix inference library. It supports ARM SIMD/NEON/MVEI and RISC-V P/V extension acceleration, supports INT8/FP8/FP16/FP32 models, and has only ~400 lines of core code, so it is easy to port to any platform (even a 2 KiB-RAM Arduino). Check it out: https://github.com/sipeed/TinyMaix

meenchen commented 2 years ago

Hi @ssr920121, thanks for your interest in our work! Currently, we use ARM's DSP instructions only in the 1x1 conv and 3x3 conv kernels.

nixward commented 2 years ago

Hi @meenchen, I really appreciate your work on these projects. I was just wondering how hard it would be to port TinyEngine to an M33 core? I'm thinking of trying a project with Nordic's nRF5340 SoC.

meenchen commented 1 year ago

Hi @nixward, I am not familiar with the M33, but from the spec it has the DSP extension, so porting TinyEngine should be seamless.

rodonguyen commented 1 year ago

Hi @meenchen, I'm new to this area, so I want to know how to check whether a chip supports DSP. Which part of the chip's specifications indicates that? You could take the Cortex-M7 as an example.

P.S.: Amazing work, thank you!

meenchen commented 1 year ago

Hi @rodonguyen,

You can refer to this document to see which chips support the SIMD intrinsics used by TinyEngine: https://www.keil.com/pack/doc/CMSIS/Core/html/group__intrinsic__SIMD__gr.html

T122D002L commented 1 year ago

Hi @meenchen, it's really impressive work. I'm new to this area, and I want to know whether I can deploy TTE to mobile devices, since I saw in the slides "On-Device Training and Transfer Learning (Part I)" that you have extended TTE for smartphones. How can I implement that? Can you give some instructions?

meenchen commented 1 year ago

Hi @T122D002L,

Thanks for your interest in our work. We haven't released TTE for smartphones yet, but please stay tuned for updates!

meenchen commented 1 year ago

Close due to inactivity. Feel free to reopen.

ArpeggioP commented 1 year ago

Hi, I am working on a RISC-V processor and I would like to implement MCUNet V2 with TinyEngine. I am using the M1s Sipeed board and TinyMaix for DSP and other functions, as well as the toolchain.

However, I am facing some challenges with implementing MCUNet V2, using MobileNet V2 as the backbone. I am having trouble finding the appropriate layer functions for the model.

I have discovered many functions in the TinyEngine folder, such as "patchpadding_convolve_s8_kernel3_inputch3_stride2", "convolve_s8_kernel2x3_inputch3_stride2_pad1", and "convolve_1x1_s8_ch8". However, I am not sure which function is suitable for the model.

Furthermore, it would be great if more documentation could be added for the TinyEngine functions. TinyEngine is a great work, but more explanations in the documentation would make it easier to use.

Thanks