C++ and Python implementations of YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, YOLOv11 inference.
Supported inference backends include Libtorch/PyTorch, ONNXRuntime, OpenCV, OpenVINO and TensorRT.
Supported task types include Classify, Detect and Segment.
Supported model types include FP32, FP16 and INT8.
Dependencies(tested on Ubuntu22.04):
You can test C++ code with:
# Windows
mkdir build ; cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release
./run.bat
or
# Linux
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make
./run.sh
C++ test in Docker(CPU i7-12700, GPU RTX3070): | Model | Task | Device | Precision | LibTorch | ONNXRuntime | OpenCV | OpenVINO | TensorRT |
---|---|---|---|---|---|---|---|---|---|
YOLOv5n | Classify | CPU | FP32 | 15.3ms | 12.2ms | 20.6ms | 14.1ms | × | |
YOLOv5n | Classify | GPU | FP32 | 4.9ms | 5.1ms | 5.1ms | ? | 4.1ms | |
YOLOv5n | Classify | CPU | FP16 | × | 21.7ms | 20.1ms | 14.0ms | × | |
YOLOv5n | Classify | GPU | FP16 | 4.6ms | 8.1ms | 4.9ms | ? | 3.2ms | |
YOLOv5n | Classify | CPU | INT8 | × | 18.3ms | × | ? | × | |
YOLOv5n | Classify | GPU | INT8 | × | 34.2ms | × | ? | 3.0ms | |
YOLOv5n | Detect | CPU | FP32 | 23.3ms | 20.2ms | 57.3ms | 20.0ms | × | |
YOLOv5n | Detect | GPU | FP32 | 7.2ms | 6.4ms | 8.2ms | ? | 4.4ms | |
YOLOv5n | Detect | CPU | FP16 | × | 41.8ms | 57.3ms | 19.8ms | × | |
YOLOv5n | Detect | GPU | FP16 | 6.8ms | 18.8ms | 7.9ms | ? | 3.9ms | |
YOLOv5n | Detect | CPU | INT8 | × | 26.7ms | × | 18.1ms | × | |
YOLOv5n | Detect | GPU | INT8 | × | 49.3ms | × | ? | 3.5ms | |
YOLOv5n | Segment | CPU | FP32 | × | 28.2ms | 75.8ms | 27.2ms | × | |
YOLOv5n | Segment | GPU | FP32 | 10.6ms | 10.6ms | 10.8ms | ? | 6.3ms | |
YOLOv5n | Segment | CPU | FP16 | × | 55.0ms | 75.9ms | 27.2ms | × | |
YOLOv5n | Segment | GPU | FP16 | 9.8ms | 29.0ms | 10.0ms | ? | 5.0ms | |
YOLOv5n | Segment | CPU | INT8 | × | 34.5ms | × | ? | × | |
YOLOv5n | Segment | GPU | INT8 | × | 62.1ms | × | ? | 4.2ms | |
YOLOv6n | Detect | CPU | FP32 | ? | 28.1ms | 29.7ms | 29.3ms | × | |
YOLOv6n | Detect | GPU | FP32 | ? | 6.4ms | 6.5ms | ? | 5.0ms | |
YOLOv6n | Detect | CPU | FP16 | × | 47.1ms | 27.4ms | 29.3ms | × | |
YOLOv6n | Detect | GPU | FP16 | ? | 13.1ms | 6.2ms | ? | 3.5ms | |
YOLOv6n | Detect | CPU | INT8 | × | 38.5ms | × | 23.4ms | × | |
YOLOv6n | Detect | GPU | INT8 | × | 95.7ms | × | ? | 4.1ms | |
YOLOv7t | Detect | CPU | FP32 | 50.5ms | 33.6ms | 59.9ms | 34.8ms | × | |
YOLOv7t | Detect | GPU | FP32 | 8.0ms | 7.7ms | 8.7ms | ? | 5.5ms | |
YOLOv7t | Detect | CPU | FP16 | × | 71.7ms | 63.7ms | 34.7ms | × | |
YOLOv7t | Detect | GPU | FP16 | ? | 21.3ms | 7.0ms | ? | 3.9ms | |
YOLOv7t | Detect | CPU | INT8 | × | 50.7ms | × | 27.8ms | × | |
YOLOv7t | Detect | GPU | INT8 | × | 85.6ms | × | ? | 3.7ms | |
YOLOv8n | Classify | CPU | FP32 | 3.5ms | 2.2ms | 4.0ms | 2.4ms | × | |
YOLOv8n | Classify | GPU | FP32 | 2.3ms | 1.5ms | 1.9ms | ? | 1.2ms | |
YOLOv8n | Classify | CPU | FP16 | × | 6.3ms | 4.0ms | 2.4ms | × | |
YOLOv8n | Classify | GPU | FP16 | ? | 1.7ms | 1.7ms | ? | 1.0ms | |
YOLOv8n | Classify | CPU | INT8 | × | 3.4ms | × | ? | × | |
YOLOv8n | Classify | GPU | INT8 | × | 7.8ms | × | ? | 1.0ms | |
YOLOv8n | Detect | CPU | FP32 | 33.3ms | 27.9ms | 42.2ms | 28.6ms | × | |
YOLOv8n | Detect | GPU | FP32 | 6.4ms | 6.9ms | 6.8ms | ? | 6.0ms | |
YOLOv8n | Detect | CPU | FP16 | × | 57.2ms | 41.9ms | 28.6ms | × | |
YOLOv8n | Detect | GPU | FP16 | ? | 19.4ms | 5.7ms | ? | 3.7ms | |
YOLOv8n | Detect | CPU | INT8 | × | 37.3ms | × | 24.5ms | × | |
YOLOv8n | Detect | GPU | INT8 | × | 85.5ms | × | ? | 4.7ms | |
YOLOv8n | Segment | CPU | FP32 | × | 42.9ms | 54.7ms | 37.5ms | × | |
YOLOv8n | Segment | GPU | FP32 | 9.5ms | 10.5ms | × | ? | 8.1ms | |
YOLOv8n | Segment | CPU | FP16 | × | 73.1ms | 54.9ms | 37.4ms | × | |
YOLOv8n | Segment | GPU | FP16 | ? | 27.3ms | × | ? | 5.9ms | |
YOLOv8n | Segment | CPU | INT8 | × | 51.0ms | × | ? | × | |
YOLOv8n | Segment | GPU | INT8 | × | 101.1ms | × | ? | 5.6ms | |
YOLOv9t | Detect | CPU | FP32 | 40.8ms | 34.6ms | 54.1ms | 29.0ms | × | |
YOLOv9t | Detect | GPU | FP32 | 8.1ms | 9.4m | 9.7ms | ? | 7.1ms | |
YOLOv9t | Detect | CPU | FP16 | × | 60.6ms | 55.0ms | 29.0ms | × | |
YOLOv9t | Detect | GPU | FP16 | ? | 17.9ms | 9.0ms | ? | 4.9ms | |
YOLOv9t | Detect | CPU | INT8 | × | 48.0ms | × | 27.0ms | × | |
YOLOv9t | Detect | GPU | INT8 | × | 135.2ms | × | ? | 5.6ms | |
YOLOv10n | Detect | CPU | FP32 | 30.4ms | 27.9ms | × | 26.1ms | × | |
YOLOv10n | Detect | GPU | FP32 | 6.0ms | 6.5m | × | ? | × | |
YOLOv10n | Detect | CPU | FP16 | × | 56.4ms | × | 26.0ms | × | |
YOLOv10n | Detect | GPU | FP16 | ? | 10.9ms | × | ? | × | |
YOLOv10n | Detect | CPU | INT8 | × | 40.7ms | × | 23.5ms | × | |
YOLOv10n | Detect | GPU | INT8 | × | 83.9ms | × | ? | × | |
YOLOv11n | Classify | CPU | FP32 | 4.1ms | 2.4ms | 4.4ms | 2.6ms | × | |
YOLOv11n | Classify | GPU | FP32 | 2.7ms | 1.7ms | × | ? | 1.4ms | |
YOLOv11n | Classify | CPU | FP16 | × | 6.3ms | 4.5ms | 2.6ms | × | |
YOLOv11n | Classify | GPU | FP16 | ? | 2.1ms | × | ? | 1.1ms | |
YOLOv11n | Classify | CPU | INT8 | × | ? | × | ? | × | |
YOLOv11n | Classify | GPU | INT8 | × | ? | × | ? | 1.3ms | |
YOLOv11n | Detect | CPU | FP32 | 35.0ms | 26.9ms | 44.4ms | 25.0ms | × | |
YOLOv11n | Detect | GPU | FP32 | 7.2ms | 7.2ms | × | ? | 6.0ms | |
YOLOv11n | Detect | CPU | FP16 | × | 61.3ms | 44.8ms | 25.0ms | × | |
YOLOv11n | Detect | GPU | FP16 | ? | 20.0ms | × | ? | 3.9ms | |
YOLOv11n | Detect | CPU | INT8 | × | ? | × | 22.8ms | × | |
YOLOv11n | Detect | GPU | INT8 | × | ? | × | ? | 4.7ms | |
YOLOv11n | Segment | CPU | FP32 | × | 38.8ms | 56.9ms | 34.0ms | × | |
YOLOv11n | Segment | GPU | FP32 | × | 10.9ms | × | ? | 7.5ms | |
YOLOv11n | Segment | CPU | FP16 | × | 78.3ms | 58.1ms | 33.8ms | × | |
YOLOv11n | Segment | GPU | FP16 | × | 27.9ms | × | ? | 6.2ms | |
YOLOv11n | Segment | CPU | INT8 | × | ? | × | ? | × | |
YOLOv11n | Segment | GPU | INT8 | × | ? | × | ? | 4.9ms |
You can test Python code with:
# Windows
pip install -r requirements.txt
./run.bat
or
# Linux
pip install -r requirements.txt
./run.sh
Python test in Docker(CPU i7-12700, GPU RTX3070): | Model | Task | Device | Precision | PyTorch | ONNXRuntime | OpenCV | OpenVINO | TensorRT |
---|---|---|---|---|---|---|---|---|---|
YOLOv5n | Classify | CPU | FP32 | 26.3ms | 21.4ms | 33.2ms | 21.8ms | × | |
YOLOv5n | Classify | GPU | FP32 | 15.6ms | 16.1ms | 16.6ms | ? | 17.0ms | |
YOLOv5n | Classify | CPU | FP16 | × | 30.3ms | 31.5ms | 21.7ms | × | |
YOLOv5n | Classify | GPU | FP16 | 14.5ms | 18.6ms | 17.4ms | ? | 19.8ms | |
YOLOv5n | Classify | CPU | INT8 | × | 28.9ms | × | ? | × | |
YOLOv5n | Classify | GPU | INT8 | × | 54.8ms | × | ? | 18.9ms | |
YOLOv5n | Detect | CPU | FP32 | 30.6ms | 27.0ms | 60.0ms | 24.8ms | × | |
YOLOv5n | Detect | GPU | FP32 | 10.4ms | 14.9ms | 10.7ms | ? | 14.3ms | |
YOLOv5n | Detect | CPU | FP16 | × | 40.7ms | 59.8ms | 24.8ms | × | |
YOLOv5n | Detect | GPU | FP16 | 12.3ms | 19.6ms | 10.3ms | ? | 12.8ms | |
YOLOv5n | Detect | CPU | INT8 | × | 33.7ms | × | 23.1ms | × | |
YOLOv5n | Detect | GPU | INT8 | × | 72.9ms | × | ? | 13.8ms | |
YOLOv5n | Segment | CPU | FP32 | 159.2ms | 116.1ms | 147.2ms | 47.8ms | × | |
YOLOv5n | Segment | GPU | FP32 | 34.6ms | 49.1ms | 38.0ms | ? | 70.7ms | |
YOLOv5n | Segment | CPU | FP16 | × | 138.8ms | 142.2ms | 48.2ms | × | |
YOLOv5n | Segment | GPU | FP16 | 50.9ms | 78.9ms | 52.4ms | ? | 72.6ms | |
YOLOv5n | Segment | CPU | INT8 | × | 127.6ms | × | ? | × | |
YOLOv5n | Segment | GPU | INT8 | × | 191.8ms | × | ? | 13.3ms | |
YOLOv6n | Detect | CPU | FP32 | ? | 54.0ms | 48.1ms | 52.0ms | × | |
YOLOv6n | Detect | GPU | FP32 | ? | 40.0ms | 34.2ms | ? | 43.0ms | |
YOLOv6n | Detect | CPU | FP16 | × | 66.4ms | 48.1ms | 51.8ms | × | |
YOLOv6n | Detect | GPU | FP16 | ? | 49.9ms | 36.3ms | ? | 40.5ms | |
YOLOv6n | Detect | CPU | INT8 | × | 67.1ms | × | 44.9ms | × | |
YOLOv6n | Detect | GPU | INT8 | × | 241.4ms | × | ? | 61.7ms | |
YOLOv7t | Detect | CPU | FP32 | 53.3ms | 41.1ms | 62.9ms | 39.4ms | × | |
YOLOv7t | Detect | GPU | FP32 | 10.6ms | 16.5ms | 10.4ms | ? | 14.0ms | |
YOLOv7t | Detect | CPU | FP16 | × | 72.2ms | 62.9ms | 39.4ms | × | |
YOLOv7t | Detect | GPU | FP16 | ? | 24.3ms | 9.1ms | ? | 12.7ms | |
YOLOv7t | Detect | CPU | INT8 | × | 58.2ms | × | 32.4ms | × | |
YOLOv7t | Detect | GPU | INT8 | × | 101.8ms | × | ? | 12.9ms | |
YOLOv8n | Classify | CPU | FP32 | 3.5ms | 2.2ms | 4.1ms | 2.3ms | × | |
YOLOv8n | Classify | GPU | FP32 | 2.5ms | 1.6ms | 1.8ms | ? | 3.5ms | |
YOLOv8n | Classify | CPU | FP16 | × | 6.3ms | 4.1s | 2.3ms | × | |
YOLOv8n | Classify | GPU | FP16 | ? | 1.7ms | 1.7ms | ? | 2.8ms | |
YOLOv8n | Classify | CPU | INT8 | × | 3.7ms | × | ? | × | |
YOLOv8n | Classify | GPU | INT8 | × | 8.2ms | × | ? | 3.0ms | |
YOLOv8n | Detect | CPU | FP32 | 59.2ms | 57.8ms | 60.3s | 49.4ms | × | |
YOLOv8n | Detect | GPU | FP32 | 35.5ms | 40.5ms | 29.4ms | ? | 39.1ms | |
YOLOv8n | Detect | CPU | FP16 | × | 77.1ms | 61.3ms | 49.6ms | × | |
YOLOv8n | Detect | GPU | FP16 | ? | 60.4ms | 30.8ms | ? | 38.1ms | |
YOLOv8n | Detect | CPU | INT8 | × | 64.1ms | × | 44.1ms | × | |
YOLOv8n | Detect | GPU | INT8 | × | 138.7ms | × | ? | 40.9ms | |
YOLOv8n | Segment | CPU | FP32 | 184.7ms | 157.8ms | 142.3ms | 100.0ms | × | |
YOLOv8n | Segment | GPU | FP32 | 94.3ms | 104.2ms | 88.5ms | ? | 116.6ms | |
YOLOv8n | Segment | CPU | FP16 | × | 180.4ms | 144.8s | 99.3ms | × | |
YOLOv8n | Segment | GPU | FP16 | ? | 122.2ms | 108.7ms | ? | 118.7ms | |
YOLOv8n | Segment | CPU | INT8 | × | 166.4ms | × | ? | × | |
YOLOv8n | Segment | GPU | INT8 | × | 275.3ms | × | ? | 40.9ms | |
YOLOv9t | Detect | CPU | FP32 | 61.0ms | 61.0ms | 74.9ms | 49.7ms | × | |
YOLOv9t | Detect | GPU | FP32 | 33.6ms | 41.4m | 31.2ms | ? | 40.2ms | |
YOLOv9t | Detect | CPU | FP16 | × | 81.0ms | 75.4ms | 49.6ms | × | |
YOLOv9t | Detect | GPU | FP16 | ? | 45.9ms | 33.5ms | ? | 41.5ms | |
YOLOv9t | Detect | CPU | INT8 | × | 74.4ms | × | 46.8ms | × | |
YOLOv9t | Detect | GPU | INT8 | × | 384.5ms | × | ? | 47.5ms | |
YOLOv10n | Detect | CPU | FP32 | 33.7ms | 34.7ms | × | 28.6ms | × | |
YOLOv10n | Detect | GPU | FP32 | 8.3ms | 13.0m | × | ? | × | |
YOLOv10n | Detect | CPU | FP16 | × | 57.8ms | × | 28.6ms | × | |
YOLOv10n | Detect | GPU | FP16 | ? | 14.4ms | × | ? | × | |
YOLOv10n | Detect | CPU | INT8 | × | 49.8ms | × | 26.1ms | × | |
YOLOv10n | Detect | GPU | INT8 | × | 103.0ms | × | ? | × | |
YOLOv11n | Classify | CPU | FP32 | 4.1ms | 2.3ms | 4.6ms | 2.5ms | × | |
YOLOv11n | Classify | GPU | FP32 | 2.8ms | 1.7ms | × | ? | 3.7ms | |
YOLOv11n | Classify | CPU | FP16 | × | 6.1ms | 4.5ms | 2.5ms | × | |
YOLOv11n | Classify | GPU | FP16 | ? | 1.9ms | × | ? | 3.3ms | |
YOLOv11n | Classify | CPU | INT8 | × | ? | × | ? | × | |
YOLOv11n | Classify | GPU | INT8 | × | ? | × | ? | 3.6ms | |
YOLOv11n | Detect | CPU | FP32 | 62.2ms | 52.9ms | 66.2ms | 45.2ms | × | |
YOLOv11n | Detect | GPU | FP32 | 38.7ms | 41.2ms | × | ? | 36.6ms | |
YOLOv11n | Detect | CPU | FP16 | × | 82.5ms | 63.0ms | 45.1ms | × | |
YOLOv11n | Detect | GPU | FP16 | ? | 58.2ms | × | ? | 38.2ms | |
YOLOv11n | Detect | CPU | INT8 | × | ? | × | 50.0ms | × | |
YOLOv11n | Detect | GPU | INT8 | × | ? | × | ? | 39.1ms | |
YOLOv11n | Segment | CPU | FP32 | 183.5ms | 152.7ms | 144.1ms | 91.9ms | × | |
YOLOv11n | Segment | GPU | FP32 | 98.2ms | 116.2ms | × | ? | 114.9ms | |
YOLOv11n | Segment | CPU | FP16 | × | 185.4ms | 155.2ms | 92.3ms | × | |
YOLOv11n | Segment | GPU | FP16 | ?ms | 130.4ms | × | ? | 120.2ms | |
YOLOv11n | Segment | CPU | INT8 | × | ? | × | ? | × | |
YOLOv11n | Segment | GPU | INT8 | × | ? | × | ? | 39.0ms |
You can get a docker image with:
docker pull taify/yolo_inference:latest
You Can download some model weights in: https://pan.baidu.com/s/1L8EyTa59qu_eEb3lKRnPQA?pwd=itda
For your own model, you should convert onnx model with following scirpt to transpose output dims for YOLOv8, YOLOv9, YOLOv11 detection and segmentation:
import onnx
import onnx.helper as helper
import sys
import os
def main():
if len(sys.argv) < 2:
print("Usage:\n python transpose.py yolov8n.onnx")
return 1
file = sys.argv[1]
if not os.path.exists(file):
print(f"Not exist path: {file}")
return 1
prefix, suffix = os.path.splitext(file)
dst = prefix + ".trans" + suffix
model = onnx.load(file)
node = model.graph.node[-1]
old_output = node.output[0]
node.output[0] = "pre_transpose"
for specout in model.graph.output:
if specout.name == old_output:
shape0 = specout.type.tensor_type.shape.dim[0]
shape1 = specout.type.tensor_type.shape.dim[1]
shape2 = specout.type.tensor_type.shape.dim[2]
new_out = helper.make_tensor_value_info(
specout.name,
specout.type.tensor_type.elem_type,
[0, 0, 0]
)
new_out.type.tensor_type.shape.dim[0].CopyFrom(shape0)
new_out.type.tensor_type.shape.dim[2].CopyFrom(shape1)
new_out.type.tensor_type.shape.dim[1].CopyFrom(shape2)
specout.CopyFrom(new_out)
model.graph.node.append(
helper.make_node("Transpose", ["pre_transpose"], [old_output], perm=[0, 2, 1])
)
print(f"Model save to {dst}")
onnx.save(model, dst)
return 0
if __name__ == "__main__":
sys.exit(main())