ppogg / YOLOv5-Lite

🍅🍅🍅 YOLOv5-Lite: evolved from YOLOv5. The model is only 900+ KB (int8) or 1.7 MB (fp16) and reaches 15 FPS on the Raspberry Pi 4B.
GNU General Public License v3.0
android-app mnn mobilenet ncnn onnxruntime openvivo picodet pplcnet pytorch repvgg shufflenetv2 tensorrt tflite transformer yolov5

YOLOv5-Lite: Lighter, faster and easier to deploy

YOLOv5-Lite is the result of a series of ablation experiments on YOLOv5 that make it lighter (fewer FLOPs, lower memory use, and fewer parameters), faster (channel shuffle is added and the YOLOv5 head is slimmed by reducing its channels, so it runs at 10+ FPS on the Raspberry Pi 4B with a 320×320 input frame), and easier to deploy (the Focus layer and its four slice operations are removed, which keeps the accuracy loss from model quantization within an acceptable range).
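
To make the two operations named above more concrete, here is a minimal pure-Python sketch of what a channel shuffle and the four Focus slices do to indices. It is an illustration only, not the repository's code: the function names `channel_shuffle_order` and `focus_slices` are hypothetical, and the slice order is assumed to follow the usual YOLOv5 Focus layer.

```python
from typing import List


def channel_shuffle_order(num_channels: int, groups: int) -> List[int]:
    """Return, for each output channel, the input channel it is taken from."""
    if num_channels % groups != 0:
        raise ValueError("num_channels must be divisible by groups")
    per_group = num_channels // groups
    # Same effect as reshaping channels to (groups, per_group),
    # transposing to (per_group, groups), and flattening again.
    return [g * per_group + i for i in range(per_group) for g in range(groups)]


def focus_slices(plane: List[List[int]]) -> List[List[List[int]]]:
    """Split one H x W plane into the four half-resolution planes a Focus layer stacks."""
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (row offset, column offset), assumed order
    return [[row[dx::2] for row in plane[dy::2]] for dy, dx in offsets]


if __name__ == "__main__":
    print(channel_shuffle_order(6, 2))  # channels from the two groups are interleaved
    plane = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    for part in focus_slices(plane):
        print(part)
```

The four strided slices are the operations that YOLOv5-Lite drops; the shuffle is only a fixed permutation of channels, so it adds no parameters.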

Comparison of ablation experiment results

| ID | Model | Input size | FLOPs | Params | Size (MB) | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|---|---|
| 001 | yolo-fastest | 320×320 | 0.25G | 0.35M | 1.4 | 24.4 | - |
| 002 | YOLOv5-Lite-e (ours) | 320×320 | 0.73G | 0.78M | 1.7 | 35.1 | - |
| 003 | NanoDet-m | 320×320 | 0.72G | 0.95M | 1.8 | - | 20.6 |
| 004 | yolo-fastest-xl | 320×320 | 0.72G | 0.92M | 3.5 | 34.3 | - |
| 005 | YOLOX-Nano | 416×416 | 1.08G | 0.91M | 7.3 (fp32) | - | 25.8 |
| 006 | yolov3-tiny | 416×416 | 6.96G | 6.06M | 23.0 | 33.1 | 16.6 |
| 007 | yolov4-tiny | 416×416 | 5.62G | 8.86M | 33.7 | 40.2 | 21.7 |
| 008 | YOLOv5-Lite-s (ours) | 416×416 | 1.66G | 1.64M | 3.4 | 42.0 | 25.2 |
| 009 | YOLOv5-Lite-c (ours) | 512×512 | 5.92G | 4.57M | 9.2 | 50.9 | 32.5 |
| 010 | NanoDet-EfficientLite2 | 512×512 | 7.12G | 4.71M | 18.3 | - | 32.6 |
| 011 | YOLOv5s (6.0) | 640×640 | 16.5G | 7.23M | 14.0 | 56.0 | 37.2 |
| 012 | YOLOv5-Lite-g (ours) | 640×640 | 15.6G | 5.39M | 10.9 | 57.6 | 39.1 |

See the wiki: https://github.com/ppogg/YOLOv5-Lite/wiki/Test-the-map-of-models-about-coco

Comparison on different platforms

| Equipment | Computing backend | System | Input | Framework | v5lite-e | v5lite-s | v5lite-c | v5lite-g | YOLOv5s |
|---|---|---|---|---|---|---|---|---|---|
| Intel | @i5-10210U | Windows (x86) | 640×640 | openvino | - | - | 46ms | - | 131ms |
| Nvidia | @RTX 2080Ti | Linux (x86) | 640×640 | torch | - | - | - | 15ms | 14ms |
| Redmi K30 | @Snapdragon 730G | Android (armv8) | 320×320 | ncnn | 27ms | 38ms | - | - | 163ms |
| Xiaomi 10 | @Snapdragon 865 | Android (armv8) | 320×320 | ncnn | 10ms | 14ms | - | - | 163ms |
| Raspberrypi 4B | @ARM Cortex-A72 | Linux (arm64) | 320×320 | ncnn | - | 84ms | - | - | 371ms |
| Raspberrypi 4B | @ARM Cortex-A72 | Linux (arm64) | 320×320 | mnn | - | 71ms | - | - | 356ms |
| AXera-Pi | Cortex A7 @CPU, 3.6TOPs @NPU | Linux (arm64) | 640×640 | axpi | - | - | - | 22ms | 22ms |
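
The latencies above can be turned into frame rates and speedups with simple arithmetic. The sketch below is only an illustration: it assumes the table reports per-frame inference time, so capture, pre-processing, and post-processing costs would lower the real FPS. The helper names are hypothetical.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def speedup(baseline_ms: float, model_ms: float) -> float:
    """How many times faster the model is than the baseline."""
    return baseline_ms / model_ms


if __name__ == "__main__":
    # Raspberry Pi 4B, mnn, 320x320: v5lite-s 71 ms, YOLOv5s 356 ms (from the table).
    print(f"v5lite-s: {fps_from_latency_ms(71):.2f} FPS")
    print(f"speedup over YOLOv5s: {speedup(356, 71):.2f}x")
```

With these numbers, v5lite-s on mnn works out to about 14 FPS, roughly five times faster than YOLOv5s on the same board.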

Tutorial for reaching 15 FPS on the Raspberry Pi 4B:

https://zhuanlan.zhihu.com/p/672633849

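QQ discussion group: 993965802

Answer required to join the group: 剪枝 (pruning), 蒸馏 (distillation), 量化 (quantization), or 低秩分解 (low-rank decomposition); any one of them works.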

·Model Zoo·

@v5lite-e:

| Model | Size | Backbone | Head | Framework | Designed for |
|---|---|---|---|---|---|
| v5Lite-e.pt | 1.7 MB | shufflenetv2 (Megvii) | v5Lite-e head | PyTorch | arm-cpu |
| v5Lite-e.bin, v5Lite-e.param | 1.7 MB | shufflenetv2 | v5Lite-e head | ncnn | arm-cpu |
| v5Lite-e-int8.bin, v5Lite-e-int8.param | 0.9 MB | shufflenetv2 | v5Lite-e head | ncnn | arm-cpu |
| v5Lite-e-fp32.mnn | 3.0 MB | shufflenetv2 | v5Lite-e head | mnn | arm-cpu |
| v5Lite-e-fp32.tnnmodel, v5Lite-e-fp32.tnnproto | 2.9 MB | shufflenetv2 | v5Lite-e head | tnn | arm-cpu |
| v5Lite-e-320.onnx | 3.1 MB | shufflenetv2 | v5Lite-e head | onnxruntime | x86-cpu |

@v5lite-s:

| Model | Size | Backbone | Head | Framework | Designed for |
|---|---|---|---|---|---|
| v5Lite-s.pt | 3.4 MB | shufflenetv2 (Megvii) | v5Lite-s head | PyTorch | arm-cpu |
| v5Lite-s.bin, v5Lite-s.param | 3.3 MB | shufflenetv2 | v5Lite-s head | ncnn | arm-cpu |
| v5Lite-s-int8.bin, v5Lite-s-int8.param | 1.7 MB | shufflenetv2 | v5Lite-s head | ncnn | arm-cpu |
| v5Lite-s.mnn | 3.3 MB | shufflenetv2 | v5Lite-s head | mnn | arm-cpu |
| v5Lite-s-int4.mnn | 987 KB | shufflenetv2 | v5Lite-s head | mnn | arm-cpu |
| v5Lite-s-fp16.bin, v5Lite-s-fp16.xml | 3.4 MB | shufflenetv2 | v5Lite-s head | openvino | x86-cpu |
| v5Lite-s-fp32.bin, v5Lite-s-fp32.xml | 6.8 MB | shufflenetv2 | v5Lite-s head | openvino | x86-cpu |
| v5Lite-s-fp16.tflite | 3.3 MB | shufflenetv2 | v5Lite-s head | tflite | arm-cpu |
| v5Lite-s-fp32.tflite | 6.7 MB | shufflenetv2 | v5Lite-s head | tflite | arm-cpu |
| v5Lite-s-int8.tflite | 1.8 MB | shufflenetv2 | v5Lite-s head | tflite | arm-cpu |
| v5Lite-s-416.onnx | 6.4 MB | shufflenetv2 | v5Lite-s head | onnxruntime | x86-cpu |
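
The file sizes listed for v5lite-e and v5lite-s follow roughly from the parameter counts in the ablation table (0.78M and 1.64M) multiplied by the bytes stored per weight. The sketch below is a back-of-the-envelope estimate under that assumption, taking 1 MB as 10^6 bytes; real files also carry graph metadata and may keep some layers at higher precision, so the listed sizes differ somewhat. The names `BYTES_PER_WEIGHT` and `estimated_size_mb` are hypothetical.

```python
# Assumed storage cost per weight for each precision, in bytes.
BYTES_PER_WEIGHT = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def estimated_size_mb(params_millions: float, precision: str) -> float:
    """Estimate the weight storage in MB (10^6 bytes) for a parameter count given in millions."""
    return params_millions * BYTES_PER_WEIGHT[precision]


if __name__ == "__main__":
    for name, params in [("v5lite-e", 0.78), ("v5lite-s", 1.64)]:
        for precision in ("fp32", "fp16", "int8", "int4"):
            size = estimated_size_mb(params, precision)
            print(f"{name} {precision}: ~{size:.2f} MB")
```

For v5lite-s, for example, the estimate gives about 6.56 MB at fp32 and 1.64 MB at int8, close to the 6.4 to 6.8 MB and 1.7 to 1.8 MB files listed above.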

@v5lite-c:

| Model | Size | Backbone | Head | Framework | Designed for |
|---|---|---|---|---|---|
| v5Lite-c.pt | 9 MB | PPLcnet (Baidu) | v5s-head | PyTorch | x86-cpu / x86-vpu |
| v5Lite-c.bin, v5Lite-c.xml | 8.7 MB | PPLcnet | v5s-head | openvino | x86-cpu / x86-vpu |
| v5Lite-c-512.onnx | 18 MB | PPLcnet | v5s-head | onnxruntime | x86-cpu |

@v5lite-g:

| Model | Size | Backbone | Head | Framework | Designed for |
|---|---|---|---|---|---|
| v5Lite-g.pt | 10.9 MB | Repvgg (Tsinghua) | v5Lite-g head | PyTorch | x86-gpu / arm-gpu / arm-npu |
| v5Lite-g-int8.engine | 8.5 MB | Repvgg-yolov5 | v5Lite-g head | TensorRT | x86-gpu / arm-gpu / arm-npu |
| v5lite-g-int8.tmfile | 8.7 MB | Repvgg-yolov5 | v5Lite-g head | Tengine | arm-npu |
| v5Lite-g-640.onnx | 21 MB | Repvgg-yolov5 | yolov5-head | onnxruntime | x86-cpu |
| v5Lite-g-640.joint | 7.1 MB | Repvgg-yolov5 | yolov5-head | axpi | arm-npu |

Download Link:

Baidu Drive Password: pogg

v5lite-s model: TFLite Float32, Float16, INT8, Dynamic range quantization, ONNX, TFJS, TensorRT, OpenVINO IR FP32/FP16, Myriad Inference Engine Blob, CoreML

https://github.com/PINTO0309/PINTO_model_zoo/tree/main/180_YOLOv5-Lite

Thanks to PINTO0309: https://github.com/PINTO0309/PINTO_model_zoo/tree/main/180_YOLOv5-Lite

How to use

Install

[**Python>=3.6.0**](https://www.python.org/) is required, with all [requirements.txt](https://github.com/ppogg/YOLOv5-Lite/blob/master/requirements.txt) dependencies installed, including [**PyTorch>=1.7**](https://pytorch.org/get-started/locally/):

```bash
$ git clone https://github.com/ppogg/YOLOv5-Lite
$ cd YOLOv5-Lite
$ pip install -r requirements.txt
```

Inference with detect.py

`detect.py` runs inference on a variety of sources, downloading models automatically from the [latest YOLOv5-Lite release](https://github.com/ppogg/YOLOv5-Lite/releases) and saving results to `runs/detect`.

```bash
$ python detect.py --source 0  # webcam
                            file.jpg  # image
                            file.mp4  # video
                            path/  # directory
                            path/*.jpg  # glob
                            'https://youtu.be/NUsoVlDFqZg'  # YouTube
                            'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream
```

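
The `--source` values above fall into a few kinds of input. As a rough sketch of how such strings might be told apart, consider the helper below. It is hypothetical and not `detect.py`'s actual logic: the function name `classify_source`, the extension lists, and the category names are all assumptions, and YouTube links are simply grouped with other network streams.

```python
import os

STREAM_PREFIXES = ("rtsp://", "rtmp://", "http://", "https://")
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed, not taken from detect.py
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}  # assumed, not taken from detect.py


def classify_source(source: str) -> str:
    """Guess what kind of input a --source string refers to."""
    if source.isdigit():
        return "webcam"  # a camera index such as 0
    if source.lower().startswith(STREAM_PREFIXES):
        return "stream"  # checked before extensions, since URLs may end in .mp4
    if "*" in source:
        return "glob"
    if source.endswith("/"):
        return "directory"
    ext = os.path.splitext(source)[1].lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    return "unknown"


if __name__ == "__main__":
    for src in ["0", "file.jpg", "file.mp4", "path/", "path/*.jpg",
                "rtsp://example.com/media.mp4"]:
        print(src, "->", classify_source(src))
```
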
Training

```bash
$ python train.py --data coco.yaml --cfg v5lite-e.yaml --weights v5lite-e.pt --batch-size 128
                                         v5lite-s.yaml           v5lite-s.pt              128
                                         v5lite-c.yaml           v5lite-c.pt               96
                                         v5lite-g.yaml           v5lite-g.pt               64
```

Multi-GPU training is several times faster:

```bash
$ python -m torch.distributed.launch --nproc_per_node 2 train.py
```
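
The training block above compresses four commands into one, with a different config, weights file, and batch size per variant. A small sketch that expands them might look like this; the `VARIANTS` list and the `train_command` helper are hypothetical, and only the flags and values shown above are used.

```python
from typing import List

# (variant, batch size) pairs taken from the training commands above.
VARIANTS = [("v5lite-e", 128), ("v5lite-s", 128), ("v5lite-c", 96), ("v5lite-g", 64)]


def train_command(variant: str, batch_size: int, data: str = "coco.yaml") -> List[str]:
    """Build the argument list for training one YOLOv5-Lite variant."""
    return [
        "python", "train.py",
        "--data", data,
        "--cfg", f"{variant}.yaml",
        "--weights", f"{variant}.pt",
        "--batch-size", str(batch_size),
    ]


if __name__ == "__main__":
    for variant, batch_size in VARIANTS:
        # Print only; pass the list to subprocess.run(...) to actually train.
        print(" ".join(train_command(variant, batch_size)))
```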

DataSet

Training set and test set layout (the paths point to the directories that hold the xx.jpg images):

```bash
train: ../coco/images/train2017/
val: ../coco/images/val2017/
```

```bash
├── images            # xx.jpg example
│   ├── train2017
│   │   ├── 000001.jpg
│   │   ├── 000002.jpg
│   │   └── 000003.jpg
│   └── val2017
│       ├── 100001.jpg
│       ├── 100002.jpg
│       └── 100003.jpg
└── labels            # xx.txt example
    ├── train2017
    │   ├── 000001.txt
    │   ├── 000002.txt
    │   └── 000003.txt
    └── val2017
        ├── 100001.txt
        ├── 100002.txt
        └── 100003.txt
```

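
In the layout above, each image under `images/` has a label file with the same name under `labels/`. The following sketch shows that mapping, assuming exactly this parallel directory structure; the helper name `label_path_for` is hypothetical and is not a function from the repository.

```python
from pathlib import PurePosixPath


def label_path_for(image_path: str) -> str:
    """Map an image path to its label path: images/ -> labels/, suffix -> .txt."""
    parts = list(PurePosixPath(image_path).parts)
    # Replace the last "images" component; raises ValueError if there is none.
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    return str(PurePosixPath(*parts).with_suffix(".txt"))


if __name__ == "__main__":
    print(label_path_for("../coco/images/train2017/000001.jpg"))
    print(label_path_for("../coco/images/val2017/100003.jpg"))
```
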
Auto LabelImg

[**Link**: https://github.com/ppogg/AutoLabelImg](https://github.com/ppogg/AutoLabelImg)

You can use this LabelImg variant, which is based on YOLOv5 5.0 and YOLOv5-Lite, to annotate images automatically, biubiubiu 🚀 🚀 🚀

Model Hub

Here, the original components of YOLOv5 and the reproduced components of YOLOv5-Lite are organized and stored in the [model hub](https://github.com/ppogg/YOLOv5-Lite/tree/master/models/hub):

![modelhub](https://user-images.githubusercontent.com/82716366/146787562-e2c1c4c1-726e-4efc-9eae-d92f34333e8d.jpg)

Heatmap Analysis

```bash
$ python main.py --type all
```

![paper figure 2](https://user-images.githubusercontent.com/82716366/167449474-3689c2bf-197a-4403-849c-b85db6bcc476.png)

Updating ...

## How to deploy

[**ncnn**](https://github.com/ppogg/YOLOv5-Lite/blob/master/cpp_demo/ncnn/README.md) for arm-cpu

[**mnn**](https://github.com/ppogg/YOLOv5-Lite/blob/master/cpp_demo/mnn/README.md) for arm-cpu

[**openvino**](https://github.com/ppogg/YOLOv5-Lite/blob/master/python_demo/openvino/README.md) for x86-cpu or x86-vpu

[**tensorrt(C++)**](https://github.com/ppogg/YOLOv5-Lite/blob/master/cpp_demo/tensorrt/README.md) for arm-gpu, arm-npu, or x86-gpu

[**tensorrt(Python)**](https://github.com/ppogg/YOLOv5-Lite/tree/master/python_demo/tensorrt) for arm-gpu, arm-npu, or x86-gpu

[**Android**](https://github.com/ppogg/YOLOv5-Lite/blob/master/android_demo/ncnn-android-v5lite/README.md) for arm-cpu

## Android_demo

This demo runs YOLOv5-Lite detection on a Redmi phone with a Snapdragon 730G processor.

Link: https://github.com/ppogg/YOLOv5-Lite/tree/master/android_demo/ncnn-android-v5lite

Android_v5Lite-s: https://drive.google.com/file/d/1CtohY68N2B9XYuqFLiTp-Nd2kuFWgAUR/view?usp=sharing

Android_v5Lite-g: https://drive.google.com/file/d/1FnvkWxxP_aZwhi000xjIuhJ_OhqOUJcj/view?usp=sharing

New Android app: https://pan.baidu.com/s/1PRhW4fI1jq8VboPyishcIQ (keyword: pogg)

## More detailed explanation

#### Detailed model link:

What is YOLOv5-Lite S/E model: zhihu link (Chinese): [https://zhuanlan.zhihu.com/p/400545131](https://zhuanlan.zhihu.com/p/400545131)

What is YOLOv5-Lite C model: zhihu link (Chinese): [https://zhuanlan.zhihu.com/p/420737659](https://zhuanlan.zhihu.com/p/420737659)

What is YOLOv5-Lite G model: zhihu link (Chinese): [https://zhuanlan.zhihu.com/p/410874403](https://zhuanlan.zhihu.com/p/410874403)

How to deploy on ncnn with fp16 or int8: csdn link (Chinese): [https://blog.csdn.net/weixin_45829462/article/details/119787840](https://blog.csdn.net/weixin_45829462/article/details/119787840)

How to deploy on mnn with fp16 or int8: zhihu link (Chinese): [https://zhuanlan.zhihu.com/p/672633849](https://zhuanlan.zhihu.com/p/672633849)

How to deploy on onnxruntime: zhihu link (Chinese): [https://zhuanlan.zhihu.com/p/476533259](https://zhuanlan.zhihu.com/p/476533259) (old version)

How to deploy on tensorrt: zhihu link (Chinese): [https://zhuanlan.zhihu.com/p/478630138](https://zhuanlan.zhihu.com/p/478630138)

How to optimize on tensorrt: zhihu link (Chinese): [https://zhuanlan.zhihu.com/p/463074494](https://zhuanlan.zhihu.com/p/463074494)

## Reference

https://github.com/ultralytics/yolov5

https://github.com/megvii-model/ShuffleNet-Series

https://github.com/Tencent/ncnn

## Citing YOLOv5-Lite

If you use YOLOv5-Lite in your research, please cite our work and give a star ⭐:

```
@misc{yolov5lite2021,
  title = {YOLOv5-Lite: Lighter, faster and easier to deploy},
  author = {Xiangrong Chen and Ziman Gong},
  doi = {10.5281/zenodo.5241425},
  year = {2021}
}
```