mit-han-lab / tinyengine

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
https://mcunet.mit.edu
MIT License
c codegenerator cpp deep-learning edge-computing microcontroller neural-architecture-search pytorch quantization tinyml

TinyEngine

This is the official implementation of TinyEngine, a memory-efficient and high-performance neural network library for microcontrollers. TinyEngine is part of MCUNet, which also includes TinyNAS. MCUNet is a system-algorithm co-design framework for tiny deep learning on microcontrollers, in which TinyEngine and TinyNAS are co-designed to fit tight memory budgets.

The MCUNet and TinyNAS code is available in a separate repository, mit-han-lab/mcunet.

TinyML Project Website | MCUNetV1 | MCUNetV2 | MCUNetV3

Demo (Inference)

[Figure: inference demo]

Demo (Training)

[Figure: on-device training demo]

News

If you are interested in getting updates, please sign up here to get notified!

Overview

Microcontrollers are low-cost, low-power devices used in a wide range of applications, but their tight memory budget (50,000x smaller than GPUs) makes deploying deep learning models difficult.

MCUNet is a system-algorithm co-design framework for tiny deep learning on microcontrollers. It consists of TinyNAS and TinyEngine, which are co-designed to fit tight memory budgets. Co-designing the system and the algorithm lets us significantly improve deep learning performance under the same tiny memory budget.

[Figure: MCUNet overview]

Specifically, TinyEngine is a memory-efficient inference library. Instead of optimizing memory layer by layer, TinyEngine adapts its memory scheduling to the overall network topology, which reduces memory usage and speeds up inference. It outperforms existing inference libraries such as TF-Lite Micro from Google, CMSIS-NN from Arm, and X-CUBE-AI from STMicroelectronics.
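To make network-level memory planning a little more concrete, here is a minimal Python sketch built on simplifying assumptions of our own: a straight chain of layers, int8 activations measured in bytes, and no scratch buffers (such as im2col workspaces). It is not TinyEngine's scheduler. It only contrasts giving every activation its own buffer with reusing one arena sized to the worst-case layer, where only that layer's input and output are live. The layer sizes are hypothetical.

```python
from typing import List


def naive_arena_bytes(activation_sizes: List[int]) -> int:
    """Every activation keeps its own buffer for the whole inference."""
    return sum(activation_sizes)


def chain_peak_bytes(activation_sizes: List[int]) -> int:
    """Peak memory of a straight chain of layers when buffers are reused.

    activation_sizes[0] is the network input and activation_sizes[i + 1] is
    the output of layer i. While layer i runs, only its input and output must
    be live, so one arena sized to the largest such pair is enough.
    """
    if len(activation_sizes) < 2:
        return sum(activation_sizes)
    return max(
        activation_sizes[i] + activation_sizes[i + 1]
        for i in range(len(activation_sizes) - 1)
    )


# Hypothetical int8 activation sizes (H * W * C bytes) for a tiny 4-layer CNN.
sizes = [96 * 96 * 3, 48 * 48 * 16, 24 * 24 * 32, 12 * 12 * 64, 2]
print(naive_arena_bytes(sizes))  # 92162
print(chain_peak_bytes(sizes))   # 64512
```

For this hypothetical network, buffer reuse shrinks the arena from 92,162 to 64,512 bytes, and the peak is set by the first layer, whose high-resolution activations are the largest.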

TinyEngine adopts the following optimization techniques to speed up inference and minimize the memory footprint.

[Figure: TinyEngine optimization techniques, including in-place depth-wise convolution]
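In-place depth-wise convolution is possible because each output channel of a depth-wise layer depends only on the matching input channel. The pure-Python sketch below assumes a 3x3 kernel, stride 1, zero padding of 1, and a flat CHW activation buffer; the function names are hypothetical and this is not TinyEngine's kernel. It computes one channel into a single-channel scratch buffer and then overwrites that input channel, so the extra memory is one H x W plane instead of a full C x H x W output tensor.

```python
def _conv_channel(src, c, kernel, height, width, dst, dst_offset):
    """3x3, stride-1, zero-padded convolution of channel c of a flat CHW buffer."""
    base = c * height * width
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for ki in range(3):
                for kj in range(3):
                    row, col = i + ki - 1, j + kj - 1
                    if 0 <= row < height and 0 <= col < width:
                        acc += kernel[ki * 3 + kj] * src[base + row * width + col]
            dst[dst_offset + i * width + j] = acc


def depthwise_conv3x3(x, kernels, channels, height, width):
    """Reference version: allocates a full C*H*W output tensor."""
    out = [0.0] * (channels * height * width)
    for c in range(channels):
        _conv_channel(x, c, kernels[c], height, width, out, c * height * width)
    return out


def depthwise_conv3x3_inplace(x, kernels, channels, height, width):
    """In-place version: the only extra memory is one H*W channel buffer."""
    plane = height * width
    scratch = [0.0] * plane
    for c in range(channels):
        # Output channel c depends only on input channel c ...
        _conv_channel(x, c, kernels[c], height, width, scratch, 0)
        # ... so input channel c can be overwritten once it has been consumed.
        x[c * plane:(c + 1) * plane] = scratch
    return x


C, H, W = 3, 4, 4
inputs = [float(v) for v in range(C * H * W)]
kernels = [[0.5 * (k % 3) for k in range(9)] for _ in range(C)]
reference = depthwise_conv3x3(inputs, kernels, C, H, W)
result = depthwise_conv3x3_inplace(list(inputs), kernels, C, H, W)
print(result == reference)  # True
```

Out of place, the layer holds both tensors (2 x C x H x W values); in place, it holds C x H x W + H x W. The trick relies on channels being independent, so it does not carry over to regular convolutions, where every output channel reads all input channels.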

With the optimization techniques above, TinyEngine both speeds up inference and reduces peak memory, as shown in the figures below.

[Figure: MAC/s improvement breakdown]

[Figure: peak memory reduction]

In summary, TinyEngine can serve as useful infrastructure for MCU-based AI applications. Compared with existing libraries such as TF-Lite Micro, CMSIS-NN, and X-CUBE-AI, it improves inference speed by 1.1-18.6x and reduces peak memory by 1.3-3.6x.

[Figure: measured results]

Save Memory with Patch-based Inference: We can dramatically reduce the peak inference memory by using patch-based inference for the memory-intensive stage of CNNs. Instead of computing that stage's full-resolution feature maps at once, patch-based inference computes them one small spatial patch at a time, so only one patch's activations need to be held in memory at a time.

For MobileNetV2, patch-based inference reduces the peak memory by 8x.

With patch-based inference, TinyEngine achieves higher accuracy under the same memory budget.
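The sketch below estimates how patch-based inference changes the peak memory of a memory-intensive stage. It is a simplified model with assumptions of our own: int8 activations, "same" padding for the full feature maps, receptive-field sizing for interior tiles, the stage input read one tile at a time (for example, streamed from the camera) rather than held in full, and the stage output kept in full because the later layers run layer by layer. The stage configuration is hypothetical and not taken from any MCUNet model.

```python
from typing import List, Tuple

# One conv layer of the memory-intensive stage: (kernel, stride, out_channels).
Layer = Tuple[int, int, int]


def full_map_sides(in_side: int, layers: List[Layer]) -> List[int]:
    """Side lengths of the full feature maps, assuming 'same' padding."""
    sides = [in_side]
    for _, stride, _ in layers:
        sides.append(sides[-1] // stride)
    return sides


def tile_sides(out_tile: int, layers: List[Layer]) -> List[int]:
    """Tile side each layer needs so the stage emits an out_tile x out_tile patch.

    Walks backwards with in = (out - 1) * stride + kernel (the receptive field),
    which is why neighboring input tiles overlap.
    """
    sides = [out_tile]
    for kernel, stride, _ in reversed(layers):
        sides.append((sides[-1] - 1) * stride + kernel)
    return sides[::-1]


def peak_pair(sides: List[int], channels: List[int]) -> int:
    """Largest input + output activation (int8 bytes) over the layers."""
    sizes = [s * s * c for s, c in zip(sides, channels)]
    return max(a + b for a, b in zip(sizes, sizes[1:]))


def compare(in_side: int, in_channels: int, layers: List[Layer],
            patches_per_side: int) -> Tuple[int, int]:
    channels = [in_channels] + [c for _, _, c in layers]
    full = full_map_sides(in_side, layers)
    layer_by_layer = peak_pair(full, channels)
    out_tile = full[-1] // patches_per_side
    # The whole stage output stays resident; only one patch is in flight.
    stage_output = full[-1] * full[-1] * channels[-1]
    patch_based = stage_output + peak_pair(tile_sides(out_tile, layers), channels)
    return layer_by_layer, patch_based


stage = [(3, 2, 16), (3, 1, 16), (3, 2, 32)]
print(compare(64, 3, stage, patches_per_side=4))  # (32768, 11715)
```

Here each 4x4 output patch needs a 23x23 input tile, so neighboring tiles overlap and some activations are recomputed: patch-based inference trades this extra computation for a lower peak memory.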

Code Structure

code_generator contains a Python library that compiles neural networks into low-level source code (C/C++).

TinyEngine contains a C/C++ library that implements operators and performs inference on microcontrollers.

examples contains examples of converting TFLite models into TinyEngine models.

tutorial contains demo tutorials (inference and training) for deploying a visual wake words (VWW) model onto microcontrollers.

assets contains misc assets.
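As a rough illustration of what compiling a network into C source can look like, here is a toy Python generator. Everything in it is hypothetical: the Op record, the kernel names (conv2d_s8, depthwise_s8, pointwise_s8), their (input, output) signature, and the simple ping-pong memory plan. It does not reflect the actual code_generator API or TinyEngine's kernels.

```python
from typing import List, NamedTuple


class Op(NamedTuple):
    kernel: str     # hypothetical C kernel name
    in_bytes: int   # int8 input activation size
    out_bytes: int  # int8 output activation size


def generate_c(ops: List[Op]) -> str:
    """Emit C source that runs `ops` in order inside one static arena.

    Tensor 0 is the network input and op i writes tensor i + 1. Even tensors
    live in region A (offset 0), odd tensors in region B (right after A), so
    each op reads one region and writes the other without overlap.
    """
    for prev, nxt in zip(ops, ops[1:]):
        if prev.out_bytes != nxt.in_bytes:
            raise ValueError(f"{prev.kernel} -> {nxt.kernel}: sizes do not chain")
    tensors = [ops[0].in_bytes] + [op.out_bytes for op in ops]
    region_a = max(tensors[0::2])
    region_b = max(tensors[1::2])
    offsets = [0 if i % 2 == 0 else region_a for i in range(len(tensors))]

    lines = ["#include <stdint.h>", ""]
    for name in dict.fromkeys(op.kernel for op in ops):  # unique, in order
        lines.append(f"void {name}(const int8_t *input, int8_t *output);")
    lines += ["", f"static int8_t arena[{region_a + region_b}];", "",
              "void run_inference(void) {"]
    for i, op in enumerate(ops):
        lines.append(f"    {op.kernel}(&arena[{offsets[i]}], &arena[{offsets[i + 1]}]);")
    lines.append("}")
    return "\n".join(lines) + "\n"


model = [
    Op("conv2d_s8", 27648, 36864),
    Op("depthwise_s8", 36864, 36864),
    Op("pointwise_s8", 36864, 18432),
]
print(generate_c(model))
```

Because every offset is fixed when the source is generated, the emitted C performs no dynamic allocation. This ping-pong plan needs the largest even tensor plus the largest odd tensor (73,728 bytes here), which in general can exceed the tighter largest-input-plus-output bound; a practical generator would plan memory more carefully and would also need to emit weights and quantization parameters.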

Requirement

Setup for Users

First, clone this repository:

git clone --recursive https://github.com/mit-han-lab/tinyengine.git

(Optional) Using a virtual environment with conda is recommended.

conda create -n tinyengine python=3.6 pip
conda activate tinyengine

Install dependencies:

pip install -r requirements.txt

Setup for Developers

Install pre-commit hooks to automatically format changes in your code.

pre-commit install

Deployment Example

See tutorial to learn how to deploy a visual wake words (VWW) model onto microcontrollers with TinyEngine. The tutorial includes both an inference demo and a training demo.

Measured Results

The latency results (OOM: out of memory):

| net_id | TF-Lite Micro @ 713b6ed | CMSIS-NN @ 011bf32 | X-CUBE-AI v7.3.0 | TinyEngine @ 0363956 |
| --- | --- | --- | --- | --- |
| *mcunet models (VWW)* | | | | |
| mcunet-vww0 | 587ms | 53ms | 32ms | 27ms |
| mcunet-vww1 | 1120ms | 97ms | 57ms | 51ms |
| mcunet-vww2 | 5310ms | 478ms | 269ms | 234ms |
| *mcunet models (ImageNet)* | | | | |
| mcunet-in0 | 586ms | 51ms | 35ms | 25ms |
| mcunet-in1 | 1227ms | 103ms | 63ms | 56ms |
| mcunet-in2 | 6463ms | 642ms | 351ms | 280ms |
| mcunet-in3 | 7821ms | 770ms | 414ms | 336ms |
| mcunet-in4 | OOM | OOM | 516ms | 463ms |
| *baseline models* | | | | |
| proxyless-w0.3-r64 | 512ms | 54ms | 35ms | 23ms |
| proxyless-w0.3-r176 | 3801ms | 380ms | 205ms | 176ms |
| mbv2-w0.3-r64 | 467ms | 43ms | 29ms | 23ms |
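Per-model speedups can be read off the latency table by dividing each baseline's latency by TinyEngine's. The short Python snippet below does this for the MCUNet rows, copying the values from the table and treating OOM entries as missing. The ratios describe only these measurements at the listed library versions.

```python
from typing import Dict, Optional, Tuple

# Latency in ms copied from the table above, in column order:
# TF-Lite Micro, CMSIS-NN, X-CUBE-AI, TinyEngine. None marks an OOM entry.
LATENCY_MS: Dict[str, Tuple[Optional[int], ...]] = {
    "mcunet-vww0": (587, 53, 32, 27),
    "mcunet-vww1": (1120, 97, 57, 51),
    "mcunet-vww2": (5310, 478, 269, 234),
    "mcunet-in0": (586, 51, 35, 25),
    "mcunet-in1": (1227, 103, 63, 56),
    "mcunet-in2": (6463, 642, 351, 280),
    "mcunet-in3": (7821, 770, 414, 336),
    "mcunet-in4": (None, None, 516, 463),
}
BASELINES = ("TF-Lite Micro", "CMSIS-NN", "X-CUBE-AI")


def speedups(row):
    """Baseline latency divided by TinyEngine latency, rounded to 0.1x."""
    *baselines, tinyengine = row
    return {
        name: None if ms is None else round(ms / tinyengine, 1)
        for name, ms in zip(BASELINES, baselines)
    }


for net, row in LATENCY_MS.items():
    print(net, speedups(row))
```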

The peak memory (SRAM) results:

| net_id | TF-Lite Micro @ 713b6ed | CMSIS-NN @ 011bf32 | X-CUBE-AI v7.3.0 | TinyEngine @ 0363956 |
| --- | --- | --- | --- | --- |
| *mcunet models (VWW)* | | | | |
| mcunet-vww0 | 163kB | 163kB | 88kB | 59kB |
| mcunet-vww1 | 220kB | 220kB | 113kB | 92kB |
| mcunet-vww2 | 385kB | 390kB | 201kB | 174kB |
| *mcunet models (ImageNet)* | | | | |
| mcunet-in0 | 161kB | 161kB | 69kB | 49kB |
| mcunet-in1 | 219kB | 219kB | 106kB | 96kB |
| mcunet-in2 | 460kB | 469kB | 238kB | 215kB |
| mcunet-in3 | 493kB | 493kB | 243kB | 260kB |
| mcunet-in4 | OOM | OOM | 342kB | 416kB |
| *baseline models* | | | | |
| proxyless-w0.3-r64 | 128kB | 136kB | 97kB | 35kB |
| proxyless-w0.3-r176 | 453kB | 453kB | 221kB | 259kB |
| mbv2-w0.3-r64 | 173kB | 173kB | 88kB | 61kB |

The Flash memory usage results:

| net_id | TF-Lite Micro @ 713b6ed | CMSIS-NN @ 011bf32 | X-CUBE-AI v7.3.0 | TinyEngine @ 0363956 |
| --- | --- | --- | --- | --- |
| *mcunet models (VWW)* | | | | |
| mcunet-vww0 | 627kB | 646kB | 463kB | 453kB |
| mcunet-vww1 | 718kB | 736kB | 534kB | 521kB |
| mcunet-vww2 | 1016kB | 1034kB | 774kB | 741kB |
| *mcunet models (ImageNet)* | | | | |
| mcunet-in0 | 1072kB | 1090kB | 856kB | 842kB |
| mcunet-in1 | 937kB | 956kB | 737kB | 727kB |
| mcunet-in2 | 1084kB | 1102kB | 849kB | 830kB |
| mcunet-in3 | 1091kB | 1106kB | 867kB | 835kB |
| mcunet-in4 | OOM | OOM | 1843kB | 1825kB |
| *baseline models* | | | | |
| proxyless-w0.3-r64 | 1065kB | 1084kB | 865kB | 777kB |
| proxyless-w0.3-r176 | 1065kB | 1084kB | 865kB | 779kB |
| mbv2-w0.3-r64 | 940kB | 959kB | 768kB | 690kB |

Citation

If you find the project helpful, please consider citing our papers:

@article{lin2020mcunet,
  title={{MCUNet}: Tiny Deep Learning on {IoT} Devices},
  author={Lin, Ji and Chen, Wei-Ming and Lin, Yujun and Gan, Chuang and Han, Song},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}

@inproceedings{lin2021mcunetv2,
  title={{MCUNetV2}: Memory-Efficient Patch-based Inference for Tiny Deep Learning},
  author={Lin, Ji and Chen, Wei-Ming and Cai, Han and Gan, Chuang and Han, Song},
  booktitle={Annual Conference on Neural Information Processing Systems (NeurIPS)},
  year={2021}
}

@inproceedings{lin2022ondevice,
  title={On-Device Training Under 256{KB} Memory},
  author={Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song},
  booktitle={Annual Conference on Neural Information Processing Systems (NeurIPS)},
  year={2022}
}

Related Projects

MCUNet: Tiny Deep Learning on IoT Devices (NeurIPS'20)

MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning (NeurIPS'21)

MCUNetV3: On-Device Training Under 256KB Memory (NeurIPS'22)