TinyMaix is a tiny neural network inference library designed specifically for microcontrollers (TinyML).
It is designed around the following priorities: Easy-to-Use > Portable > Speed > Space
Introduction to tinyML: TinyML
See the 48 tested chips and their benchmarks: benchmark
Good News: Rewarded Porting TinyMaix
Highlights
Run the mnist demo on Arduino ATmega328
mnist demo
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
000000000077AFF9500000000000
000000000AFFFFFFD10000000000
00000000AFFFD8BFF70000000000
00000003FFD2000CF80000000000
00000004FD10007FF40000000000
00000000110000DFF40000000000
00000000000007FFC00000000000
0000000000004FFE300000000000
0000000000008FF9000000000000
00000000000BFF90000000000000
00000000001EFE20000000000000
0000000000CFF800000000000000
0000000004FFB000000000000000
000000001CFF8000000000000000
000000008FFA0000000000000000
00000000FFF10000000000000000
00000000FFF21111000112999900
00000000FFFFFFFFA8AFFFFFFF70
00000000AFFFFFFFFFFFFFFA7730
0000000007777AFFF97720000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
===use 49912us
0: 0
1: 0
2: 89
3: 0
4: 1
5: 6
6: 1
7: 0
8: 0
9: 0
### Predict output is: Number 2, prob=89
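The final line of the log is the arg-max of the ten class scores printed above it. A minimal sketch of that step in C, assuming the scores are already available as an array of small integers; the helper name `argmax_u8` is hypothetical and is not a TinyMaix function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of TinyMaix): return the index of the
 * largest score. Ties resolve to the lowest index. */
static size_t argmax_u8(const uint8_t *scores, size_t n)
{
    assert(scores != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (scores[i] > scores[best]) {
            best = i;
        }
    }
    return best;
}
```

With the scores from the log, `{0, 0, 89, 0, 1, 6, 1, 0, 0, 0}`, the helper returns index 2, which matches the predicted digit.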
Do you want to take part in the development of TinyMaix, or to talk with other TinyML hobbyists?
Join our telegram group: https://t.me/tinymaix
TinyMaix is designed for running AI neural network models on resource-limited MCUs, a field usually called TinyML.
There are already many TinyML inference libraries, such as TFLite micro, microTVM, and NNoM, so why do we need TinyMaix?
TinyMaix is a weekend hackathon project, so it is simple enough to read through in 30 minutes, and it helps TinyML newcomers understand how inference works.
TinyMaix aims to be a simple TinyML inference library; it leaves out many newer features and does not use libraries such as CMSIS-NN.
Following this design goal, TinyMaix now needs only 5 files to compile~
We hope TinyMaix can help any MCU run AI neural network models, and that everyone can port it to their own hardware platform~
Note: Although TinyMaix supports acceleration on multiple architectures, it still needs more work to balance size and speed.
MNIST is a handwritten digit recognition task; it is simple enough to run even on an 8-bit MCU like the ATmega328.
Try it on PC:
cd examples/mnist
mkdir build
cd build
cmake ..
make
./mnist
mbnet (mobilenet v1) is a simple classification model for mobile devices, but it is still a little heavy for MCUs.
The model in the demo is mobilenet v1 0.25; it takes a 128x128x3 RGB image as input and outputs predictions for 1000 classes.
It needs at least 128KB SRAM and 512KB Flash; the STM32F411 is a typical minimum configuration for this model.
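As a rough sanity check on the SRAM figure, the input tensor alone is sizable. The sketch below computes the raw size of an H x W x C tensor. The helper name is hypothetical, and one byte per element (an int8 model) is an assumption, since the text does not state the data type used by the demo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: raw byte size of an h x w x c tensor.
 * bytes_per_elem is 1 for int8 data (assumed here) and 4 for float32. */
static size_t tensor_bytes(size_t h, size_t w, size_t c, size_t bytes_per_elem)
{
    assert(bytes_per_elem > 0);
    return h * w * c * bytes_per_elem;
}
```

For the 128x128x3 input, `tensor_bytes(128, 128, 3, 1)` gives 49152 bytes (48 KB) before any intermediate layer outputs are counted.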
Try running mobilenet:
cd examples/mbnet
mkdir build
cd build
cmake ..
make
./mbnet
tm_err_t tm_load (tm_mdl_t* mdl, const uint8_t* bin, uint8_t* buf, tm_cb_t cb, tm_mat_t* in);
mdl: model handle;
bin: model binary buffer;
buf: main buffer for intermediate outputs; if NULL, the main buffer is allocated automatically with malloc; otherwise your static buffer is used.
cb: layer callback;
in: returns the input mat, including its buffer address; //you can ignore it if you use a static buffer
void tm_unload(tm_mdl_t* mdl);
tm_err_t tm_preprocess(tm_mdl_t* mdl, tm_pp_t pp_type, tm_mat_t* in, tm_mat_t* out);
TMPP_FP2INT //user's own fp buf -> int input buf
TMPP_UINT2INT //int8: converted in place; int16: cannot be converted in place
TMPP_UINT2FP01 // u8/255.0
TMPP_UINT2FPN11 // (u8-128)/128
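The two float mappings in the list above are plain arithmetic, so they are easy to check by hand. The sketch below re-implements them for illustration only; the function names are hypothetical and this is not the library's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative re-implementations of the two float mappings listed above.
 * These are NOT TinyMaix functions; the names are hypothetical. */

/* TMPP_UINT2FP01: u8/255.0 maps 0..255 to 0.0..1.0 */
static float u8_to_fp01(uint8_t v)
{
    return (float)v / 255.0f;
}

/* TMPP_UINT2FPN11: (u8-128)/128 maps 0..255 to -1.0..+0.9921875 */
static float u8_to_fpn11(uint8_t v)
{
    return ((float)v - 128.0f) / 128.0f;
}

/* Convert a whole buffer with the 0..1 mapping. */
static void u8_buf_to_fp01(const uint8_t *src, float *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = u8_to_fp01(src[i]);
    }
}
```

Note that the second mapping is not symmetric: 0 maps to exactly -1.0, while 255 maps to 127/128 rather than +1.0.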
tm_err_t tm_run (tm_mdl_t* mdl, tm_mat_t* in, tm_mat_t* out);
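The four calls are meant to be used in the order load, preprocess, run, unload. The sketch below shows that order with basic error handling. So that it compiles on its own, it declares small stand-ins for the TinyMaix types and functions: the struct layouts, the `TM_OK` value, the callback type, and the stub bodies are all assumptions, and the pointer-typed arguments follow the style of `tm_unload`. In a real project you would include tinymaix.h instead. Passing NULL as the layer callback is also assumed to be acceptable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- STAND-INS so this sketch compiles alone. In a real project, delete
 * this section and #include "tinymaix.h". Type layouts, TM_OK and the stub
 * bodies are assumptions made for illustration only. ---- */
typedef int tm_err_t;
enum { TM_OK = 0 };
typedef struct { int loaded; } tm_mdl_t;
typedef struct { void *data; } tm_mat_t;
typedef int tm_pp_t;
enum { TMPP_UINT2INT = 1 };
typedef tm_err_t (*tm_cb_t)(tm_mdl_t *mdl, void *layer);

static tm_err_t tm_load(tm_mdl_t *mdl, const uint8_t *bin, uint8_t *buf,
                        tm_cb_t cb, tm_mat_t *in)
{ (void)bin; (void)cb; in->data = buf; mdl->loaded = 1; return TM_OK; }
static void tm_unload(tm_mdl_t *mdl) { mdl->loaded = 0; }
static tm_err_t tm_preprocess(tm_mdl_t *mdl, tm_pp_t pp_type,
                              tm_mat_t *in, tm_mat_t *out)
{ (void)mdl; (void)pp_type; out->data = in->data; return TM_OK; }
static tm_err_t tm_run(tm_mdl_t *mdl, tm_mat_t *in, tm_mat_t *out)
{ (void)mdl; out->data = in->data; return TM_OK; }
/* ---- end of stand-ins ---- */

/* Load, preprocess, run, unload. Returns 0 on success, -1 on any error. */
static int infer_once(const uint8_t *model_bin, uint8_t *static_buf,
                      tm_mat_t *raw_input, tm_mat_t *output)
{
    assert(model_bin != NULL && raw_input != NULL && output != NULL);

    tm_mdl_t mdl;
    tm_mat_t in;      /* filled by tm_load with the model's input mat */
    tm_err_t res;

    /* Non-NULL buf: use the caller's static buffer instead of malloc. */
    res = tm_load(&mdl, model_bin, static_buf, NULL, &in);
    if (res != TM_OK) {
        return -1;
    }

    /* Convert the raw input into the model's input format, then run. */
    res = tm_preprocess(&mdl, TMPP_UINT2INT, raw_input, &in);
    if (res == TM_OK) {
        res = tm_run(&mdl, &in, output);
    }

    tm_unload(&mdl);  /* release the model once it has been loaded */
    return (res == TM_OK) ? 0 : -1;
}
```

Unloading on every path after a successful load keeps the sketch leak-free when the main buffer is allocated automatically.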
The core consists of these 5 files: tm_model.c, tm_layers.c, tinymaix.h, tm_port.h, arch_xxx.h
If you are using an ordinary MCU without any acceleration instructions, choose arch_cpu.h; otherwise choose the header for your architecture.
Then edit tm_port.h to set your desired configuration; every config macro is followed by a comment explaining it.
Note that TM_MAX_CSIZE, TM_MAX_KSIZE, and TM_MAX_KCSIZE set the size of static buffers, so they consume RAM.
Now just put these files into your project and compile~
The training scripts in examples/mnist show how to train simple mnist models.
Note: you need to install TensorFlow (>=2.7) first.
After training and saving h5 models, you can use the scripts in tools to convert them to tmdl or C header files.
================ pack model head ================
mdl_type =0
out_deq =1
input_cnt =1
output_cnt =1
layer_cnt =6
buf_size =1464
sub_size =0
in_dims = [3, 28, 28, 1]
out_dims = [1, 1, 1, 10]
================ pack layers ================
CONV_2D
[3, 28, 28, 1] [3, 13, 13, 4]
in_oft:0, size:784; out_oft:784, size:680
padding valid
layer_size=152
CONV_2D
[3, 13, 13, 4] [3, 6, 6, 8]
in_oft:784, size:680; out_oft:0, size:288
padding valid
layer_size=432
CONV_2D
[3, 6, 6, 8] [3, 2, 2, 16]
in_oft:0, size:288; out_oft:1400, size:64
padding valid
layer_size=1360
MEAN
[3, 2, 2, 16] [1, 1, 1, 16]
in_oft:1400, size:64; out_oft:0, size:16
layer_size=48
FULLY_CONNECTED
[1, 1, 1, 16] [1, 1, 1, 10]
in_oft:0, size:16; out_oft:1448, size:16
layer_size=304
SOFTMAX
[1, 1, 1, 10] [1, 1, 1, 10]
OUTPUT!
in_oft:1448, size:16; out_oft:0, size:56
layer_size=48
================ pack done! ================
model size 2.4KB (2408 B) FLASH
buffer size 1.4KB (1464 B) RAM
single layer mode subbuff size 1.4KB (64+1360=1424 B) RAM
Saved to tmdl/mnist_q.tmdl, tmdl/mnist_q.h
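Several numbers in the converter log above can be reproduced by hand. The sketch below assumes one byte per element (int8), a 3x3 kernel with stride 2 for the "valid" convolutions, and sizes rounded up to a multiple of 8 bytes. All three are inferred from the logged shapes and sizes, not taken from the tool, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Output width/height of a convolution with "valid" padding.
 * Kernel 3 and stride 2 are assumptions that reproduce the logged shapes. */
static size_t conv_valid_out(size_t in, size_t kernel, size_t stride)
{
    assert(kernel >= 1 && stride >= 1 && in >= kernel);
    return (in - kernel) / stride + 1;
}

/* Round a byte count up to a multiple of 8. The 8-byte granularity is
 * inferred from the log (676 -> 680, 10 -> 16), not read from the tool. */
static size_t align8(size_t bytes)
{
    return (bytes + 7u) & ~(size_t)7u;
}
```

Under these assumptions, `conv_valid_out` gives 28 -> 13 -> 6 -> 2, `align8(13*13*4)` gives 680, `align8(6*6*8)` gives 288, and `align8(10)` gives 16. The first layer's input and output together take 784 + 680 = 1464 bytes, which equals the reported buf_size.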
Now you have the tmdl or C header file; put it into your project to use it~
You can download models from MaixHub, or train your own AI models online with MaixHub; no AI knowledge is needed, and training takes only a few mouse clicks.
- .tmdl file and .h file: use one of them in your code.
- report.json: report information in JSON format. It contains the labels or anchors that we use as parameters in our code. Note that these parameters change with every training run, so copy them into your code whenever you change the model, or the results will be wrong.
- There are two types: classification and detection. For first-time use, classification is recommended.
- There are many backbones; select one according to your MCU's RAM size: the smaller the RAM, the smaller the backbone you should choose.
- To understand more easily how MaixHub works, you can at first choose the tfjs platform instead of tinymaix and run the model on your mobile phone.
Use the maixhub_image_classification demo or the maixhub_image_detection demo to run your model.
For new platforms, you just need to add arch_xxx.h to the src directory and implement the functions inside.
Here are the main functions you need to implement (sorted by importance):
a. TM_INLINE void tm_dot_prod(mtype_t* sptr, mtype_t* kptr,uint32_t size, sumtype_t* result)
implements the platform's dot product function, usually with MAC-related instructions.
b. TM_INLINE void tm_dot_prod_pack2(mtype_t* sptr, mtype_t* kptr, uint32_t size, sumtype_t* result)
implements the platform's dual-channel dot product function
(not 4 or more channels, because some chips do not have enough registers to support more channels)
c. TM_INLINE void tm_postprocess_sum(int n, sumtype_t* sums, btype_t* bs, int act, mtype_t* outp, sctype_t* scales, sctype_t out_s, zptype_t out_zp)
implements the platform's batch postprocess function; note that n is a power of 2.
d. TM_INLINE void tm_dot_prod_3x3x1(mtype_t* sptr, mtype_t* kptr, sumtype_t* result)
implements the platform's 3x3 dot product, mostly with handwritten CPU code.
e. TM_INLINE void tm_dot_prod_gap_3x3x1(mtype_t* sptr, mtype_t* kptr, uint32_t* k_oft, sumtype_t* result)
implements the platform's 3x3 gap dot product.
...
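As a starting point for a new port, the first two functions can be written in portable C and optimized later. The sketch below is a plain reference version, not the code shipped in arch_cpu.h. The typedefs and the `TM_INLINE` definition are stand-ins for an int8 build, and the memory layout used by `tm_dot_prod_pack2` (second kernel stored directly after the first, two sums written to `result[0]` and `result[1]`) is an assumption that should be checked against the existing arch headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* STAND-INS for an int8 build; the real definitions come from TinyMaix's
 * headers and may differ. */
typedef int8_t  mtype_t;
typedef int32_t sumtype_t;
#define TM_INLINE static inline

/* a. Plain C dot product: result = sum(sptr[i] * kptr[i]). */
TM_INLINE void tm_dot_prod(mtype_t *sptr, mtype_t *kptr, uint32_t size,
                           sumtype_t *result)
{
    assert(sptr != NULL && kptr != NULL && result != NULL); /* debug-only check */
    sumtype_t sum = 0;
    for (uint32_t i = 0; i < size; i++) {
        sum += (sumtype_t)sptr[i] * (sumtype_t)kptr[i];
    }
    *result = sum;
}

/* b. Two output channels in one pass over the same input.
 * ASSUMPTION: the second kernel starts at kptr + size, and the two sums go
 * to result[0] and result[1]. */
TM_INLINE void tm_dot_prod_pack2(mtype_t *sptr, mtype_t *kptr, uint32_t size,
                                 sumtype_t *result)
{
    assert(sptr != NULL && kptr != NULL && result != NULL); /* debug-only check */
    const mtype_t *k0 = kptr;
    const mtype_t *k1 = kptr + size;
    sumtype_t sum0 = 0;
    sumtype_t sum1 = 0;
    for (uint32_t i = 0; i < size; i++) {
        sum0 += (sumtype_t)sptr[i] * (sumtype_t)k0[i];
        sum1 += (sumtype_t)sptr[i] * (sumtype_t)k1[i];
    }
    result[0] = sum0;
    result[1] = sum1;
}
```

Reading each input value once and reusing it for both kernels is the reason a dual-channel variant can be faster than calling the single-channel function twice.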
If you want to contribute functions to TinyMaix, please read the "TinyMaix Design" sections; we only accept functions listed in "Features in design" and "Features maybe added".
If you want to submit your port's test results, please commit them to benchmark.md. You are welcome to port TinyMaix to your chips/boards; it will show how easy it is to use TinyMaix to run deep learning models on MCUs~
If you have questions about TinyMaix usage or porting, please open an issue in this repo.
If you have business project consulting or private questions, you can send mail to support@sipeed.com or zepan@sipeed.com (Caesar Wu).