
TinyMaix is a tiny inference library for microcontrollers (TinyML).
Apache License 2.0

TinyMaix

中文 | English

TinyMaix is a tiny inference Neural Network library specifically for microcontrollers (TinyML).
We designed it following the rule: Easy-to-Use > Portable > Speed > Space

Introduction to tinyML: TinyML
See the 48 tested chips and their benchmark results: benchmark
Good News: Rewarded Porting TinyMaix

Highlights

Run mnist demo on Arduino ATmega328

mnist demo
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
000000000077AFF9500000000000
000000000AFFFFFFD10000000000
00000000AFFFD8BFF70000000000
00000003FFD2000CF80000000000
00000004FD10007FF40000000000
00000000110000DFF40000000000
00000000000007FFC00000000000
0000000000004FFE300000000000
0000000000008FF9000000000000
00000000000BFF90000000000000
00000000001EFE20000000000000
0000000000CFF800000000000000
0000000004FFB000000000000000
000000001CFF8000000000000000
000000008FFA0000000000000000
00000000FFF10000000000000000
00000000FFF21111000112999900
00000000FFFFFFFFA8AFFFFFFF70
00000000AFFFFFFFFFFFFFFA7730
0000000007777AFFF97720000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
===use 49912us
0: 0
1: 0
2: 89
3: 0
4: 1
5: 6
6: 1
7: 0
8: 0
9: 0
### Predict output is: Number 2, prob=89

TODO List

  1. ~~Optimize tm_layers.c to tm_layers_O1.c, aiming for a 1.4~2.0X speedup~~ Done!
  2. ~~Add "ADD" OPS to support resnet/mbnet v2~~ Done!
  3. Train good backbones for 64KB/128KB/256KB/512KB RAM limits
  4. Add examples: Detector, KWS, HAR, Gesture, OCR, ...
  5. ...

Do you want to take part in the development of TinyMaix, or discuss with other TinyML hobbyists?
Join our telegram group: https://t.me/tinymaix

TinyMaix Design

TinyMaix is designed for running AI neural network models on resource-limited MCUs, a field usually called TinyML.

There are many TinyML inference libraries already, such as TFLite Micro, microTVM, and NNoM, so why do we need TinyMaix?

TinyMaix is a weekend hackathon project, so it is simple enough to read through in 30 minutes, which helps TinyML newbies understand how it actually runs.

TinyMaix aims to be a simple TinyML inference library; it deliberately omits many advanced features and doesn't depend on libraries like CMSIS-NN.

Following this design goal, TinyMaix now needs as few as 5 files to compile~

We hope TinyMaix can help any MCU run AI neural network models, and that everyone can port it to their own hardware platform~

Note: Although TinyMaix supports multi-architecture acceleration, it still needs more effort to balance size and speed.

Features in design

Features maybe added

Features won't be added

Try Demos

mnist

MNIST is the handwritten digit recognition task; it is simple enough even for an 8-bit MCU like the ATmega328.
Try it on PC:

cd examples/mnist
mkdir build
cd build 
cmake ..
make
./mnist

mbnet

mbnet (MobileNet v1) is a simple classification model for mobile devices, but it is still a little heavy for MCUs.
The model in this demo is MobileNet v1 0.25; it takes a 128x128x3 RGB image as input and outputs predictions for 1000 classes.
It needs at least 128KB SRAM and 512KB Flash; the STM32F411 is a typical minimum configuration for this model.

Try it on PC:

cd examples/mbnet
mkdir build
cd build 
cmake ..
make
./mbnet

How to use (API)

Load Model

tm_err_t tm_load(tm_mdl_t* mdl, const uint8_t* bin, uint8_t* buf, tm_cb_t cb, tm_mat_t* in);

mdl: model handle;
bin: model binary buffer;
buf: main buffer for intermediate outputs; if NULL, the main buffer is malloc'ed automatically, otherwise your static buffer is used;
cb: layer callback;
in: returns the input mat, including the buffer address; //you can ignore it if you use a static buffer

Remove Model

void tm_unload(tm_mdl_t* mdl);

Preprocess Input Data

tm_err_t tm_preprocess(tm_mdl_t* mdl, tm_pp_t pp_type, tm_mat_t* in, tm_mat_t* out);
TMPP_FP2INT //user's own float buf -> int input buf
TMPP_UINT2INT //int8: convert in place; int16: can't convert in place
TMPP_UINT2FP01 // u8/255.0
TMPP_UINT2FPN11// (u8-128)/128
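As a minimal sketch of the math behind these modes (the helper names below are ours for illustration, not TinyMaix API), each mode maps a raw uint8 pixel into the range the model expects:

```c
#include <stdint.h>

// Illustrative helpers (our names, not TinyMaix API) showing the conversions
// performed by the preprocess modes above.
static inline float pp_uint2fp01(uint8_t u)  { return u / 255.0f; }            // TMPP_UINT2FP01
static inline float pp_uint2fpn11(uint8_t u) { return (u - 128) / 128.0f; }    // TMPP_UINT2FPN11
static inline int8_t pp_uint2int8(uint8_t u) { return (int8_t)((int)u - 128); } // TMPP_UINT2INT (int8 models)
```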

Run Model

tm_err_t tm_run(tm_mdl_t* mdl, tm_mat_t* in, tm_mat_t* out);
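Putting the calls together, here is a minimal sketch of the call sequence, adapted from the mnist demo. `mdl_data` comes from the header generated by tflite2tmdl.py, `mnist_pic` stands for your 28x28 grayscale input, and the exact struct initializers may differ between versions:

```c
#include "tinymaix.h"
#include "mnist_q.h"   // generated by tflite2tmdl.py; provides mdl_data[]

static uint8_t mnist_pic[28*28]; // your 28x28 u8 image (the demo embeds one)

static tm_err_t layer_cb(tm_mdl_t* mdl, tml_head_t* lh)
{   // called after each layer; return TM_OK to continue
    return TM_OK;
}

int main(void)
{
    tm_mdl_t mdl;
    tm_mat_t in_uint8 = {3, 28, 28, 1, {(mtype_t*)mnist_pic}}; // raw u8 input
    tm_mat_t in, outs[1];
    if (tm_load(&mdl, mdl_data, NULL, layer_cb, &in) != TM_OK) return -1;
    tm_preprocess(&mdl, TMPP_UINT2INT, &in_uint8, &in); // int8 model
    tm_run(&mdl, &in, outs);
    // outs[0] now holds the 10-class prediction
    tm_unload(&mdl);
    return 0;
}
```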

How to port

The core consists of these 5 files: tm_model.c, tm_layers.c, tinymaix.h, tm_port.h, arch_xxx.h

If you are using a normal MCU without any acceleration instructions, choose arch_cpu.h; otherwise choose the corresponding architecture header.

Then edit tm_port.h to fill in your desired configuration; every config macro has an annotation following it.

Note: TM_MAX_CSIZE, TM_MAX_KSIZE, and TM_MAX_KCSIZE occupy static buffers.

Now just put these files into your project and compile~
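For example, a tm_port.h configuration for a plain MCU build might look like the excerpt below. The values shown match the stock defaults at the time of writing, but macro names and defaults may vary between versions, so check the annotations in your own copy:

```c
/* tm_port.h excerpt (illustrative; verify against your version) */
#define TM_ARCH        TM_ARCH_CPU   // no acceleration instructions -> arch_cpu.h
#define TM_MDL_TYPE    TM_MDL_INT8   // int8 quantized models
#define TM_MAX_CSIZE   (1000)        // max channel count; occupies a static buffer
#define TM_MAX_KSIZE   (5*5)         // max kernel size; occupies a static buffer
#define TM_MAX_KCSIZE  (3*3*256)     // max kernel*channel size; occupies a static buffer
```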

How to train/convert models

There are training scripts in examples/mnist that show how to train simple mnist models.

Note: you need to install TensorFlow (>=2.7) first.

After training and saving the h5 model, you can use the scripts in tools to convert it to tmdl or C header files.

  1. h5_to_tflite.py
    convert h5 model to float or int8 quant tflite files
    python3 h5_to_tflite.py h5/mnist.h5 tflite/mnist_f.tflite 0
    python3 h5_to_tflite.py h5/mnist.h5 tflite/mnist_q.tflite 1 quant_img_mnist/ 0to1
  2. tflite2tmdl.py
    convert tflite file to tmdl or c header files.
    python3 tflite2tmdl.py tflite/mnist_q.tflite tmdl/mnist_q.tmdl int8 1 28,28,1 10
    ================ pack model head ================
    mdl_type   =0
    out_deq    =1
    input_cnt  =1
    output_cnt =1
    layer_cnt  =6
    buf_size   =1464
    sub_size   =0
    in_dims    = [3, 28, 28, 1]
    out_dims   = [1, 1, 1, 10]
    ================   pack layers   ================
    CONV_2D
    [3, 28, 28, 1] [3, 13, 13, 4]
    in_oft:0, size:784;  out_oft:784, size:680
    padding valid
    layer_size=152
    CONV_2D
    [3, 13, 13, 4] [3, 6, 6, 8]
    in_oft:784, size:680;  out_oft:0, size:288
    padding valid
    layer_size=432
    CONV_2D
    [3, 6, 6, 8] [3, 2, 2, 16]
    in_oft:0, size:288;  out_oft:1400, size:64
    padding valid
    layer_size=1360
    MEAN
    [3, 2, 2, 16] [1, 1, 1, 16]
    in_oft:1400, size:64;  out_oft:0, size:16
    layer_size=48
    FULLY_CONNECTED
    [1, 1, 1, 16] [1, 1, 1, 10]
    in_oft:0, size:16;  out_oft:1448, size:16
    layer_size=304
    SOFTMAX
    [1, 1, 1, 10] [1, 1, 1, 10]
    OUTPUT!
    in_oft:1448, size:16;  out_oft:0, size:56
    layer_size=48
    ================    pack done!   ================
    model  size 2.4KB (2408 B) FLASH
    buffer size 1.4KB (1464 B) RAM
    single layer mode subbuff size 1.4KB (64+1360=1424 B) RAM
    Saved to tmdl/mnist_q.tmdl, tmdl/mnist_q.h

Now you have the tmdl or C header file; put it into your project to use it~

How to train models online with MaixHub

You can download models from MaixHub, or easily train your own AI models online with MaixHub; no AI knowledge is needed, you train your model with just a few mouse clicks.

How to add new platform acceleration code

For new platforms, you just need to add an arch_xxx.h to the src dir and implement the functions inside.
Here are the main functions you need to implement (sorted by importance):

a. TM_INLINE void tm_dot_prod(mtype_t* sptr, mtype_t* kptr, uint32_t size, sumtype_t* result)
    implement the platform's dot product function, usually with MAC-related instructions.

b. TM_INLINE void tm_dot_prod_pack2(mtype_t* sptr, mtype_t* kptr, uint32_t size, sumtype_t* result)
    implement the platform's dual-channel dot product function
  (not 4 or more channels, because some platforms don't have enough registers to support more channels)

c. TM_INLINE void tm_postprocess_sum(int n, sumtype_t* sums, btype_t* bs, int act, mtype_t* outp, sctype_t* scales, sctype_t out_s, zptype_t out_zp)
    implement the platform's batch postprocess function; note that n is a power of 2.

d. TM_INLINE void tm_dot_prod_3x3x1(mtype_t* sptr, mtype_t* kptr, sumtype_t* result)
    implement the platform's 3x3 dot product, mostly handwritten CPU code.

e. TM_INLINE void tm_dot_prod_gap_3x3x1(mtype_t* sptr, mtype_t* kptr, uint32_t* k_oft, sumtype_t* result)
    implement the platform's 3x3 gap dot product.

...
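As a reference point, the portable fallback is conceptually just a MAC loop; an accelerated port replaces it with SIMD/MAC instructions. The self-contained sketch below assumes the typedefs of an int8 build (the actual arch_cpu.h implementation may differ in detail):

```c
#include <stdint.h>

typedef int8_t  mtype_t;    // assumed: int8 quantized data (TM_MDL_INT8 build)
typedef int32_t sumtype_t;  // assumed: 32-bit accumulator

// Portable dot product: multiply-accumulate over `size` elements.
static inline void tm_dot_prod(mtype_t* sptr, mtype_t* kptr, uint32_t size, sumtype_t* result)
{
    sumtype_t sum = 0;
    for (uint32_t i = 0; i < size; i++)
        sum += (sumtype_t)sptr[i] * (sumtype_t)kptr[i];
    *result = sum;
}

// Fixed 3x3 variant: the same math with the loop fully unrolled.
static inline void tm_dot_prod_3x3x1(mtype_t* sptr, mtype_t* kptr, sumtype_t* result)
{
    *result = (sumtype_t)sptr[0]*kptr[0] + sptr[1]*kptr[1] + sptr[2]*kptr[2]
            + sptr[3]*kptr[3] + sptr[4]*kptr[4] + sptr[5]*kptr[5]
            + sptr[6]*kptr[6] + sptr[7]*kptr[7] + sptr[8]*kptr[8];
}
```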

Contribution & Contacts

If you want to contribute functionality to TinyMaix, please read the "TinyMaix Design" section first; we only want features listed under "Features in design" and "Features maybe added".

If you want to submit your port test results, please commit them to benchmark.md. You are welcome to port TinyMaix to your chips/boards; it will show how easy it is to run deep learning models on MCUs with TinyMaix~

If you have questions about TinyMaix usage/porting, please open an Issue in this repo.

If you have business project consulting or private questions, you can send mail to support@sipeed.com or zepan@sipeed.com (Caesar Wu).