ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

TFLite, ONNX, CoreML, TensorRT Export #251

glenn-jocher opened this issue 4 years ago

glenn-jocher commented 4 years ago

📚 This guide explains how to export a trained YOLOv5 🚀 model from PyTorch to ONNX and TorchScript formats. UPDATED 8 December 2022.

Before You Start

Clone repo and install requirements.txt in a Python>=3.7.0 environment, including PyTorch>=1.7. Models and datasets download automatically from the latest YOLOv5 release.

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

For a TensorRT export example (requires GPU) see the appendix section of our Colab notebook.

Formats

YOLOv5 inference is officially supported in 12 formats (PyTorch plus 11 export formats):

💡 ProTip: Export to ONNX or OpenVINO for up to 3x CPU speedup. See CPU Benchmarks.
💡 ProTip: Export to TensorRT for up to 5x GPU speedup. See GPU Benchmarks.

Format                   export.py --include   Model
PyTorch                  -                     yolov5s.pt
TorchScript              torchscript           yolov5s.torchscript
ONNX                     onnx                  yolov5s.onnx
OpenVINO                 openvino              yolov5s_openvino_model/
TensorRT                 engine                yolov5s.engine
CoreML                   coreml                yolov5s.mlmodel
TensorFlow SavedModel    saved_model           yolov5s_saved_model/
TensorFlow GraphDef      pb                    yolov5s.pb
TensorFlow Lite          tflite                yolov5s.tflite
TensorFlow Edge TPU      edgetpu               yolov5s_edgetpu.tflite
TensorFlow.js            tfjs                  yolov5s_web_model/
PaddlePaddle             paddle                yolov5s_paddle_model/
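
As a quick illustration of the --include values above (a sketch; the TensorRT engine export requires a CUDA device):

python export.py --weights yolov5s.pt --include onnx openvino      # CPU-oriented formats
python export.py --weights yolov5s.pt --include engine --device 0  # TensorRT for GPU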

Benchmarks

Benchmarks below run on Colab Pro with the YOLOv5 tutorial notebook. To reproduce:

python benchmarks.py --weights yolov5s.pt --imgsz 640 --device 0

Colab Pro V100 GPU

benchmarks: weights=/content/yolov5/yolov5s.pt, imgsz=640, batch_size=1, data=/content/yolov5/data/coco128.yaml, device=0, half=False, test=False
Checking setup...
YOLOv5 🚀 v6.1-135-g7926afc torch 1.10.0+cu111 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)
Setup complete ✅ (8 CPUs, 51.0 GB RAM, 46.7/166.8 GB disk)

Benchmarks complete (458.07s)
                   Format  mAP@0.5:0.95  Inference time (ms)
0                 PyTorch        0.4623                10.19
1             TorchScript        0.4623                 6.85
2                    ONNX        0.4623                14.63
3                OpenVINO           NaN                  NaN
4                TensorRT        0.4617                 1.89
5                  CoreML           NaN                  NaN
6   TensorFlow SavedModel        0.4623                21.28
7     TensorFlow GraphDef        0.4623                21.22
8         TensorFlow Lite           NaN                  NaN
9     TensorFlow Edge TPU           NaN                  NaN
10          TensorFlow.js           NaN                  NaN

Colab Pro CPU

benchmarks: weights=/content/yolov5/yolov5s.pt, imgsz=640, batch_size=1, data=/content/yolov5/data/coco128.yaml, device=cpu, half=False, test=False
Checking setup...
YOLOv5 🚀 v6.1-135-g7926afc torch 1.10.0+cu111 CPU
Setup complete ✅ (8 CPUs, 51.0 GB RAM, 41.5/166.8 GB disk)

Benchmarks complete (241.20s)
                   Format  mAP@0.5:0.95  Inference time (ms)
0                 PyTorch        0.4623               127.61
1             TorchScript        0.4623               131.23
2                    ONNX        0.4623                69.34
3                OpenVINO        0.4623                66.52
4                TensorRT           NaN                  NaN
5                  CoreML           NaN                  NaN
6   TensorFlow SavedModel        0.4623               123.79
7     TensorFlow GraphDef        0.4623               121.57
8         TensorFlow Lite        0.4623               316.61
9     TensorFlow Edge TPU           NaN                  NaN
10          TensorFlow.js           NaN                  NaN

Export a Trained YOLOv5 Model

This command exports a pretrained YOLOv5s model to TorchScript and ONNX formats. yolov5s.pt is the 'small' model, the second-smallest model available. Other options are yolov5n.pt, yolov5m.pt, yolov5l.pt and yolov5x.pt, along with their P6 counterparts, i.e. yolov5s6.pt, or your own custom training checkpoint, i.e. runs/exp/weights/best.pt. For details on all available models please see our README table.

python export.py --weights yolov5s.pt --include torchscript onnx

๐Ÿ’ก ProTip: Add --half to export models at FP16 half precision for smaller file sizes

Output:

export: data=data/coco128.yaml, weights=['yolov5s.pt'], imgsz=[640, 640], batch_size=1, device=cpu, half=False, inplace=False, train=False, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['torchscript', 'onnx']
YOLOv5 🚀 v6.2-104-ge3e5122 Python-3.7.13 torch-1.12.1+cu113 CPU

Downloading https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5s.pt to yolov5s.pt...
100% 14.1M/14.1M [00:00<00:00, 274MB/s]

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients

PyTorch: starting from yolov5s.pt with output shape (1, 25200, 85) (14.1 MB)

TorchScript: starting export with torch 1.12.1+cu113...
TorchScript: export success ✅ 1.7s, saved as yolov5s.torchscript (28.1 MB)

ONNX: starting export with onnx 1.12.0...
ONNX: export success ✅ 2.3s, saved as yolov5s.onnx (28.0 MB)

Export complete (5.5s)
Results saved to /content/yolov5
Detect:          python detect.py --weights yolov5s.onnx 
Validate:        python val.py --weights yolov5s.onnx 
PyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.onnx')
Visualize:       https://netron.app/

The 2 exported models will be saved alongside the original PyTorch model:

Netron Viewer is recommended for visualizing exported models:
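
Netron can also be run locally; a minimal sketch, assuming the netron pip package:

pip install netron
netron yolov5s.onnx  # opens a browser-based model viewer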

Exported Model Usage Examples

detect.py runs inference on exported models:

python detect.py --weights yolov5s.pt                 # PyTorch
                           yolov5s.torchscript        # TorchScript
                           yolov5s.onnx               # ONNX Runtime or OpenCV DNN with --dnn
                           yolov5s_openvino_model     # OpenVINO
                           yolov5s.engine             # TensorRT
                           yolov5s.mlmodel            # CoreML (macOS only)
                           yolov5s_saved_model        # TensorFlow SavedModel
                           yolov5s.pb                 # TensorFlow GraphDef
                           yolov5s.tflite             # TensorFlow Lite
                           yolov5s_edgetpu.tflite     # TensorFlow Edge TPU
                           yolov5s_paddle_model       # PaddlePaddle

val.py runs validation on exported models:

python val.py --weights yolov5s.pt                 # PyTorch
                        yolov5s.torchscript        # TorchScript
                        yolov5s.onnx               # ONNX Runtime or OpenCV DNN with --dnn
                        yolov5s_openvino_model     # OpenVINO
                        yolov5s.engine             # TensorRT
                        yolov5s.mlmodel            # CoreML (macOS only)
                        yolov5s_saved_model        # TensorFlow SavedModel
                        yolov5s.pb                 # TensorFlow GraphDef
                        yolov5s.tflite             # TensorFlow Lite
                        yolov5s_edgetpu.tflite     # TensorFlow Edge TPU
                        yolov5s_paddle_model       # PaddlePaddle

Use PyTorch Hub with exported YOLOv5 models:

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.pt')
                                                       'yolov5s.torchscript')        # TorchScript
                                                       'yolov5s.onnx')               # ONNX Runtime
                                                       'yolov5s_openvino_model')     # OpenVINO
                                                       'yolov5s.engine')             # TensorRT
                                                       'yolov5s.mlmodel')            # CoreML (macOS only)
                                                       'yolov5s_saved_model')        # TensorFlow SavedModel
                                                       'yolov5s.pb')                 # TensorFlow GraphDef
                                                       'yolov5s.tflite')             # TensorFlow Lite
                                                       'yolov5s_edgetpu.tflite')     # TensorFlow Edge TPU
                                                       'yolov5s_paddle_model')       # PaddlePaddle

# Images
img = 'https://ultralytics.com/images/zidane.jpg'  # or file, Path, PIL, OpenCV, numpy, list

# Inference
results = model(img)

# Results
results.print()  # or .show(), .save(), .crop(), .pandas(), etc.

OpenCV DNN inference

OpenCV inference with ONNX models:

python export.py --weights yolov5s.pt --include onnx

python detect.py --weights yolov5s.onnx --dnn  # detect
python val.py --weights yolov5s.onnx --dnn  # validate
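
A minimal Python sketch of the same OpenCV DNN path (assuming a fixed 640x640 ONNX export; note that detect.py additionally letterboxes the input and applies NMS, which this sketch omits):

import cv2

net = cv2.dnn.readNetFromONNX('yolov5s.onnx')
img = cv2.imread('data/images/zidane.jpg')  # any test image
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (640, 640), swapRB=True, crop=False)
net.setInput(blob)
pred = net.forward()  # (1, 25200, 85): xywh, objectness, 80 class scores
print(pred.shape)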

C++ Inference

YOLOv5 OpenCV DNN C++ inference on exported ONNX model examples:

YOLOv5 OpenVINO C++ inference examples:

TensorFlow.js Web Browser Inference

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

pfeatherstone commented 3 years ago

I posted an issue on the pytorch github page too. They are working on a fix.

pfeatherstone commented 3 years ago

Here is the issue https://github.com/pytorch/pytorch/issues/45816. Looks like a PR is imminent.

pfeatherstone commented 3 years ago

@glenn-jocher Is there a reason why the input size is fixed when doing an ONNX export if the anchors and grid offsets aren't applied? I can understand either:

* fixed input size + apply anchors + apply grid offsets + permute dimensions + concat dimensions, OR
* dynamic input size + output conv layers immediately preceding yolo layers

In other words I would expect either:

* input dim == [1,3,640,640], output dim == [1,25200,85], OR
* input dim == [1,3,height,width], output dims == [1,3,height/32,width/32,85], [1,3,height/16,width/16,85], [1,3,height/8,width/8,85]

This is what I do for yolov3 models defined in pytorch and it works a dream.

Edwardmark commented 3 years ago

@glenn-jocher Hi, I tried to convert the ONNX model to TensorRT, but it complains: No importer registered for op: ScatterND. Could you tell me where the ScatterND op comes from in YOLO and how to replace it with an op that TensorRT supports?

pfeatherstone commented 3 years ago

@glenn-jocher Is there a reason why the input size is fixed when doing an ONNX export if the anchors and grid offsets aren't applied? I can understand either:

* fixed input size + apply anchors + apply grid offsets + permute dimensions + concat dimensions OR

* dynamic input size + output conv layers immediately preceding yolo layers

In other words i would expect either:

* input dim == [1,3,640,640] output dim == [1,25200,85] OR

* input dim == [1,3, height, width] output dims == [1,3,height / 32, width / 32,85] , [1,3, height / 16, width / 16,85] , [1,3,height / 8, width / 8,85]

This is what I do for yolov3 models defined in pytorch and it works a dream.

I changed the export line to:

torch.onnx.export(model, img, f, verbose=False, export_params=True, opset_version=12, 
                          input_names=['img'],
                          output_names=['out1', 'out2', 'out3'],
                          dynamic_axes={'img': [0,2,3], 'out1': [0,2,3], 'out2': [0,2,3], 'out3': [0,2,3]})

pfeatherstone commented 3 years ago

@glenn-jocher What was the motivation behind this:

y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i].to(x[i].device)) * self.stride[i]  # xy
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh

? That's very different from yolov3 and yolov4.

dgsbicak commented 3 years ago

@Ezra-Yu yes that is correct. You are free to set it to False if that suits you better.

But setting this to False throws an error for the CoreML conversion. @dlawrences can you help me out with this?

@dlawrences I get the same empty error as well. I changed nothing other than setting export to False.

model.model[-1].export = False

Printed the traceback of the error:

Converting Frontend ==> MIL Ops:  89%|█████████████████████████████████████▊          | 970/1084 [00:00<00:00, 1678.45 ops/s]
CoreML export failure: 
Traceback (most recent call last):
  File "models/export.py", line 86, in <module>
    model = ct.convert(ts, inputs=[ct.ImageType(name='image', shape=img.shape, scale=1 / 255.0, bias=[0, 0, 0])])
  File "/home/dogus/environments/work3.8/lib/python3.8/site-packages/coremltools/converters/_converters_entry.py", line 176, in convert
    mlmodel = mil_convert(
  File "/home/dogus/environments/work3.8/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 128, in mil_convert
    proto = mil_convert_to_proto(model, convert_from, convert_to,
  File "/home/dogus/environments/work3.8/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 171, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
  File "/home/dogus/environments/work3.8/lib/python3.8/site-packages/coremltools/converters/mil/converter.py", line 85, in __call__
    return load(*args, **kwargs)
  File "/home/dogus/environments/work3.8/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 85, in load
    raise e
  File "/home/dogus/environments/work3.8/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 75, in load
    prog = converter.convert()
  File "/home/dogus/environments/work3.8/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 224, in convert
    convert_nodes(self.context, self.graph)
  File "/home/dogus/environments/work3.8/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 56, in convert_nodes
    _add_op(context, node)
  File "/home/dogus/environments/work3.8/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 1612, in select
    assert _input.val is None
AssertionError

Ubuntu 18.04, CUDA 10.0. I installed PyTorch for GPU: torch==1.6.0+cu101, torchvision==0.7.0+cu101.

Installed packages in the Python3.8.0 environment that I use:

attr==0.3.1
attrs==20.2.0
certifi==2020.6.20
coremltools==4.0
cycler==0.10.0
Cython==0.29.21
future==0.18.2
joblib==0.17.0
kiwisolver==1.2.0
matplotlib==3.3.2
mpmath==1.1.0
numpy==1.19.2
onnx==1.7.0
opencv-python==4.4.0.44
packaging==20.4
Pillow==8.0.1
pkg-resources==0.0.0
protobuf==3.13.0
pyparsing==2.4.7
python-dateutil==2.8.1
PyYAML==5.3.1
scikit-learn==0.23.2
scipy==1.5.3
six==1.15.0
sympy==1.6.2
threadpoolctl==2.1.0
torch==1.6.0+cu101
torchvision==0.7.0+cu101
tqdm==4.51.0
typing-extensions==3.7.4.3

Wuuyoo commented 3 years ago

There are many warnings when converting to ONNX:

Warning: ATen was a removed experimental ops. In the future, we may directly reject this operator. Please update your model as soon as possible.

but the ONNX export succeeds.

However, when converting the ONNX model to OpenVINO, this warning becomes an error:

[ ERROR ] Cannot infer shapes or values for node "ATen_470".
[ ERROR ] There is no registered "infer" function for node "ATen_470" with op = "ATen". Please implement this function in the extensions.
For more information please refer to Model Optimizer FAQ (https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html), question #37.
[ ERROR ]
[ ERROR ] It can happen due to bug in custom shape infer function .
[ ERROR ] Or because the node inputs have incorrect values/shapes.
[ ERROR ] Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ] Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ] Exception occurred during running replacer "REPLACEMENT_ID" (<class 'extensions.middle.PartialInfer.PartialInfer'>): Stopped shape/value propagation at "ATen_470" node.
For more information please refer to Model Optimizer FAQ (https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html), question #38.

Can anyone help me? Thank you very much!

FantasyJXF commented 3 years ago

Set self.training = False in Detect before you export the model; then you can get the same output as the original model.
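
For reference, a minimal sketch of that suggestion, assuming the export.py layout of that era (attempt_load from models/experimental.py, with the Detect head as the last module in model.model):

from models.experimental import attempt_load

model = attempt_load('yolov5s.pt', map_location='cpu')  # load FP32 model
model.eval()                      # sets training=False on every submodule
model.model[-1].training = False  # explicit, per the suggestion above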

FantasyJXF commented 3 years ago

You can work around it by manually editing the yolo.py file inside the exported torchscript archive

Same here: how do I modify the torchscript file, since it's a binary file?

waicool20 commented 3 years ago

You can work around it by manually editing the yolo.py file inside the exported torchscript archive

Same here: how do I modify the torchscript file, since it's a binary file?

The torchscript file is just a zip archive; open it with 7-Zip or WinRAR.

FantasyJXF commented 3 years ago

@waicool20 Just exporting the torchscript with map_location=torch.device('cuda') solves the problem:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

But this would create another problem: the torchscript can only run on a CUDA device, not on CPU. Maybe I should figure out why there are constant values while tracing the model, or whether torchscript could be made device-compatible. 👻

HoangTienDuc commented 3 years ago

I changed the code to export with dynamic batching, but it doesn't work:

dynamic_axes = {'images': {0: 'batch_size'}, 'output': {0: 'batch_size'}, '781': {0: 'batch_size'}, '801': {0: 'batch_size'}}
f = opt.weights.replace('.pt', '.onnx')  # filename
# torch.onnx.export(model, img, f, verbose=False, opset_version=12, input_names=['images'],
#                   output_names=['classes', 'boxes'], dynamic_axes=dynamic_axes if y is None else ['output'])
torch.onnx.export(model, img, f, verbose=False, opset_version=12, input_names=['images'],
                  output_names=['output'], dynamic_axes=dynamic_axes)

Can anyone help me get a batched model?

zhiqwang commented 3 years ago

Hi @HoangTienDuc, I added a dynamic batching inference example in the notebooks.

HoangTienDuc commented 3 years ago

@zhiqwang Thanks, I got it.

smartinellimarco commented 3 years ago

If you are getting an error such as this during inference (device mismatch)

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/models/yolo.py", line 47, in <fused code>
    _38 = (_5).forward(_37, )
    _39 = (_3).forward((_4).forward(_37, ), _30, )
    _40 = (_0).forward((_1).forward((_2).forward(_39, ), ), _38, _35, )
           ~~~~~~~~~~~ <--- HERE
    _41, _42, _43, _44, = _40
    return (_44, [_41, _42, _43])
  File "code/__torch__/models/yolo.py", line 73, in forward
    _52 = torch.sub(_51, CONSTANTS.c2, alpha=1)
    _53 = torch.to(CONSTANTS.c3, dtype=6, layout=0, device=torch.device("cpu"), pin_memory=False, non_blocking=False, copy=False, memory_format=None)
    _54 = torch.mul(torch.add(_52, _53, alpha=1), torch.select(CONSTANTS.c4, 0, 0))
                    ~~~~~~~~~ <--- HERE
    _55 = torch.slice(y, 4, 0, 2, 1)
    _56 = torch.expand(torch.view(_54, [3, 20, 20, 2]), [1, 3, 20, 20, 2], implicit=True)

Traceback of TorchScript, original code (most recent call last):
C:\Users\waicool20\Programming\python\yolov5\models\yolo.py(34): forward
C:\Python38\lib\site-packages\torch\nn\modules\module.py(534): _slow_forward
C:\Python38\lib\site-packages\torch\nn\modules\module.py(548): __call__
C:\Users\waicool20\Programming\python\yolov5\models\yolo.py(117): forward_once
C:\Users\waicool20\Programming\python\yolov5\models\yolo.py(97): forward
C:\Python38\lib\site-packages\torch\nn\modules\module.py(534): _slow_forward
C:\Python38\lib\site-packages\torch\nn\modules\module.py(548): __call__
C:\Python38\lib\site-packages\torch\jit\__init__.py(1027): trace_module
C:\Python38\lib\site-packages\torch\jit\__init__.py(873): trace
./models/export.py(35): <module>
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

You can work around it by manually editing the yolo.py file inside the exported torchscript archive

  def forward(self: __torch__.models.yolo.Detect,
    argument_1: Tensor,
    argument_2: Tensor,
    argument_3: Tensor) -> Tuple[Tensor, Tensor, Tensor, Tensor]:
    dev = argument_1.device                        <--- Add this line
    _45 = self.anchor_grid
    bs = ops.prim.NumToTensor(torch.size(argument_1, 0))

Then replace all references to torch.device("cpu") with dev.

Not sure if there's a better way to export it so it does something like this by default :/

This worked perfectly! The torch-traced model has the same issue with model = model.half(), as some constants remain in FP32. Have you tried solving it?

smartinellimarco commented 3 years ago

Exporting the model in FP16 is not possible because some constants remain in FP32.

missbook520 commented 3 years ago

I just changed the code like this:

model = attempt_load(opt.weights, map_location=torch.device('cuda:0'))  # load FP32 model
img = torch.zeros(opt.batch_size, 3, *opt.img_size).to(device='cuda:0')

then ran:

python models/export.py --weights ./weights/yolov5s.pt --img 640 --batch 1

and the error is:

RuntimeError: CUDA error: out of memory

Could you help me?

missbook520 commented 3 years ago

@waicool20 Just exporting the torchscript with map_location=torch.device('cuda') solves the problem:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

But this brings another problem: the torchscript can only run on a CUDA device, not on CPU. Maybe I should figure out why there are constant values while tracing the model, or whether torchscript could be made device-compatible. 👻

When I export the GPU model, the following error is reported. May I ask why?

(pytorch1_6) F:\Pytorch_Project\yolov5_11_27\yolov5-master>python models/export.py --weights ./weights/yolov5s.pt --img 640 --batch 1
Namespace(batch_size=1, img_size=[640, 640], weights='./weights/yolov5s.pt')
Traceback (most recent call last):
  File "models/export.py", line 37, in <module>
    model = attempt_load(opt.weights, map_location=torch.device('cuda:0'))  # load FP32 model
  File ".\models\experimental.py", line 137, in attempt_load
    model.append(torch.load(w, map_location=map_location)['model'].float().fuse().eval())  # load FP32 model
  File "D:\Anaconda3\envs\pytorch1_6\lib\site-packages\torch\serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "D:\Anaconda3\envs\pytorch1_6\lib\site-packages\torch\serialization.py", line 842, in _load
    result = unpickler.load()
  File "D:\Anaconda3\envs\pytorch1_6\lib\site-packages\torch\serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "D:\Anaconda3\envs\pytorch1_6\lib\site-packages\torch\serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "D:\Anaconda3\envs\pytorch1_6\lib\site-packages\torch\serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "D:\Anaconda3\envs\pytorch1_6\lib\site-packages\torch\serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "D:\Anaconda3\envs\pytorch1_6\lib\site-packages\torch\serialization.py", line 156, in _cuda_deserialize
    return obj.cuda(device)
  File "D:\Anaconda3\envs\pytorch1_6\lib\site-packages\torch\_utils.py", line 77, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
RuntimeError: CUDA error: out of memory

Yasin40 commented 3 years ago

I ran export.py to export .pt to torchscript, but I got this error:

Namespace(batch_size=1, img_size=[640, 640], weights='weights/yolov5s.pt')
Fusing layers... 
Model Summary: 232 layers, 7459581 parameters, 0 gradients

Starting TorchScript export with torch 1.7.0...
./models/yolo.py:53: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4]:
/home/yasin/yolo/lib/python3.8/site-packages/torch/jit/_trace.py:934: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  module._c._create_method_from_trace(
TorchScript export success, saved as weights/yolov5s.torchscript.pt

and using this saved torchscript in my C++ program with libtorch produces this error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/models/yolo.py", line 45, in forward
    _35 = (_4).forward(_34, )
    _36 = (_2).forward((_3).forward(_35, ), _29, )
    _37 = (_0).forward(_33, _35, (_1).forward(_36, ), )
           ~~~~~~~~~~~ <--- HERE
    _38, _39, _40, _41, = _37
    return (_41, [_38, _39, _40])
  File "code/__torch__/models/yolo.py", line 75, in forward
    _52 = torch.sub(_51, CONSTANTS.c3, alpha=1)
    _53 = torch.to(CONSTANTS.c4, dtype=6, layout=0, device=torch.device("cpu"), pin_memory=None, non_blocking=False, copy=False, memory_format=None)
    _54 = torch.mul(torch.add(_52, _53, alpha=1), torch.select(CONSTANTS.c5, 0, 0))
                    ~~~~~~~~~ <--- HERE
    _55 = torch.slice(y, 4, 0, 2, 1)
    _56 = torch.expand(torch.view(_54, [3, 80, 80, 2]), [1, 3, 80, 80, 2], implicit=True)

Traceback of TorchScript, original code (most recent call last):
./models/yolo.py(57): forward
/home/yasin/yolo/lib/python3.8/site-packages/torch/nn/modules/module.py(709): _slow_forward
/home/yasin/yolo/lib/python3.8/site-packages/torch/nn/modules/module.py(725): _call_impl
./models/yolo.py(137): forward_once
./models/yolo.py(121): forward
/home/yasin/yolo/lib/python3.8/site-packages/torch/nn/modules/module.py(709): _slow_forward
/home/yasin/yolo/lib/python3.8/site-packages/torch/nn/modules/module.py(725): _call_impl
/home/yasin/yolo/lib/python3.8/site-packages/torch/jit/_trace.py(934): trace_module
/home/yasin/yolo/lib/python3.8/site-packages/torch/jit/_trace.py(733): trace
models/export.py(57): <module>
RuntimeError: The size of tensor a (48) must match the size of tensor b (80) at non-singleton dimension 2

Can anyone help me?

HoangTienDuc commented 3 years ago

I already exported the yolov5l model to ONNX with dynamic batching, but when I run the ONNX model it does not use the GPU, only the CPU. Can anyone help me solve this problem?

Yasin40 commented 3 years ago

I have got some error with this export.py script. Please Guide me. https://github.com/ultralytics/yolov5/issues/1554#issue-753020980

smartinellimarco commented 3 years ago

I already exported the yolov5l model to ONNX with dynamic batching, but when I run the ONNX model it does not use the GPU, only the CPU. Can anyone help me solve this problem?

Use onnxruntime-gpu maybe?

HoangTienDuc commented 3 years ago

@MarcoCBA I have tried onnxruntime-gpu many times. I think it is not easy; see #1559.

agorskih commented 3 years ago

I get an error on both Ubuntu and OS X when exporting: CoreML export failure: unexpected number of inputs for node x.2 (_convolution): 13. What part of the code should I modify to make the CoreML export work?

FantasyJXF commented 3 years ago

@nobody-cheng, @luvwinnie , dynamic batch size is working for you?. I tried doing the same.I used batch_size=16 while exporting, trying to infer on batch_size=32. But I am getting the following error at the time of inference.

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'Reshape_931' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:43 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, std::vector<long int>&) gsl::narrow_cast<int64_t>(input_shape.Size()) == size was false.
The input tensor cannot be reshaped to the requested shape. Input shape:{32,3,64,64,2}, requested shape:{16,3,64,64,2}

For me personally it works:

def inference(devices, img_paths, index, batchsize):
    os.environ['CUDA_VISIBLE_DEVICES'] = str(devices)
    features = list()
    ort_sess = onnxruntime.InferenceSession(args.model_path)
    input_name = ort_sess.get_inputs()[0].name
    images = list()
    for img_path in img_paths:
        image = preprocess(img_path, args.height, args.width)
        if len(images) < batchsize:
            images.append(image)
            continue
        else:
            intput_image = np.concatenate(images)
            feat = ort_sess.run(None, {input_name: intput_image})[0]
            feat = normalize(feat, axis=1)
            for i in feat:
                features.append(i)
            images.clear()

That's really a FAKE batch inference: your model can only accept a specific batch size. If the batch is 16 but you have only 1 image, you have to append 15 padding images to the input tensor, which is unnecessary.

zhiqwang commented 3 years ago

One method to make yolov5 support dynamic batch inference is through the autoshape function; with some modification it can also be exported to ONNX.

https://github.com/ultralytics/yolov5/blob/84f9bb5d92dd8ae453df3c712d2092344d29ad90/models/common.py#L120
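
For example, models loaded through PyTorch Hub are AutoShape-wrapped by default, so a list of images runs as a single batch (a small sketch; the image URL is illustrative):

import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # AutoShape-wrapped model
imgs = ['https://ultralytics.com/images/zidane.jpg'] * 4  # a batch of 4 images
results = model(imgs)  # one batched forward pass
results.print()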

MaddyThakker commented 3 years ago

@agorskih any updates? I am facing a similar error.

hfzarslan commented 3 years ago

I am getting this error while converting (coremltools==4.0, onnx==1.7.0):

Adding op '178' of type const
Converting op 179 : listconstruct
Adding op '179' of type const
Converting op x.2 : _convolution
Converting Frontend ==> MIL Ops:   2%|▋         | 23/932 [00:00<00:00, 983.84 ops/s]
CoreML export failure: unexpected number of inputs for node x.2 (_convolution): 13

Export complete (9.95s). Visualize with https://github.com/lutzroeder/netron.
(yolo) arslan@MacBook-Pro yolov5 %

jayer95 commented 3 years ago

@hfzarslan Please use torch 1.6.0; it works:

pip uninstall torch
pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

ROBYER1 commented 3 years ago

Does this still work with YOLOv3-tiny?

lujiazho commented 3 years ago

I get an error both in Ubuntu and OS X when exporting: CoreML export failure: unexpected number of inputs for node x.2 (_convolution): 13 What part of code should I modify to make CoreML export work?

There may be a problem with your version of torch.

lengxia commented 3 years ago

The ONNX export passes, but onnx-sim gives a segmentation fault.

luvwinnie commented 3 years ago

Has anyone tried to support variable input size for the torchscript model? I'm currently trying to make variable inputs and outputs for NVIDIA's Triton Inference Server torchscript model usage. The code seems to use torch.jit.trace to trace the outputs. It seems we could use torch.jit.script for variable inputs/outputs in a torchscript model, but it shows the errors below. Can anyone help?

TorchScript export failure:
Tried to access nonexistent attribute or method '__add__' of type '__torch__.utils.activations.Hardswish'. Did you forget to initialize an attribute in __init__()?:
  File "./utils/activations.py", line 17
    def forward(x):
        # return x * F.hardsigmoid(x)  # for torchscript and CoreML
        return x * F.hardtanh(x + 3, 0., 6.) / 6.  # for torchscript, CoreML and ONNX
                              ~~~~~ <--- HERE
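
A likely cause of this error is that the quoted forward(x) is defined without self, so under torch.jit.script the first argument is typed as the Hardswish module itself and x + 3 looks up __add__ on the module. A minimal scriptable sketch of the same activation (an assumption-level fix, not the repo's official patch):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Hardswish(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:  # explicit self and type hints
        return x * F.hardtanh(x + 3.0, 0.0, 6.0) / 6.0  # export-friendly Hardswish

scripted = torch.jit.script(Hardswish())  # scripts without the '__add__' error
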
zhiqwang commented 3 years ago

Hi @luvwinnie, I've supported dynamic batch inference with torchscript (torch.jit.script) and onnxruntime in my own repo; maybe you could refer to that.

luvwinnie commented 3 years ago

@zhiqwang Thank you so much! I would like to test with your repo on Triton Inference Server (TRT Server). However, I'm facing a fixed-size model problem with TRT Server; I have opened an issue on the triton-inference-server GitHub so that we can solve the deployment problem for yolov5!

zhiqwang commented 3 years ago

Hi @luvwinnie, I didn't test it on TRT Server. My yolov5rt follows the structure of torchvision's faster-rcnn and retinanet; if you can deploy torchvision's models successfully, yolov5rt should also work. I will track the issue you mentioned and check what I can do there.

luvwinnie commented 3 years ago

@zhiqwang Thank you so much. Currently I'm trying to export a torchscript model with your code. Do you have an example for a custom model? I modified your yolov5s.yaml nc parameter to my model's nc, and changed your trace_model.py script to the following.

model = yolov5s(pretrained=False)
model.eval()
model = model.load_state_dict(torch.load("custom.pt", map_location="cpu"))
traced_model = torch.jit.script(model)
traced_model.save("./yolov5s.torchscript.pt")

However, it shows the following error:

AttributeError: Can't get attribute 'Model' on <module 'models.yolo' from '/Users/test_user/yolov5-rt-stack/models/yolo.py'>

zhiqwang commented 3 years ago

@luvwinnie Sure, there are minor differences between ultralytics's yolov5 and my yolov5rt; here is a guide to convert ultralytics/yolov5 to yolov5rt.

luvwinnie commented 3 years ago

@zhiqwang It seems I get some errors with your repo and my model; it seems it can't convert the weights properly:

RuntimeError: The size of tensor a (32) must match the size of tensor b (64) at non-singleton dimension 0

zhiqwang commented 3 years ago

@luvwinnie, is it convenient for you to send me your model weights so that I can reproduce this bug? My email address is me@zhiqwang.com.

luvwinnie commented 3 years ago

Environments

CUDA: 10.2
CUDNN: 7.6.5
Triton Inference Server (Docker): nvcr.io/nvidia/tritonserver:20.12-py3

Folder Structure

The model file must be named model.onnx and the config file must be named config.pbtxt.

models/
└── model_onnx
    ├── 1
    │   └── model.onnx
    └── config.pbtxt

Docker

Run with following command.

$ docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v`pwd`/models:/models nvcr.io/nvidia/tritonserver:20.12-py3 tritonserver --model-repository=/models --strict-model-config=false --log-verbose 5

Python Packages

torch==1.6.0
torchvision==0.7.0
onnx==1.7.0
onnxruntime-gpu==1.4.0
tritonclient==2.6.0

The Triton inference client package can be installed with the following command:

$ pip install tritonclient[all]

For people who want to export an ONNX model with variable input length and variable outputs for Triton Inference Server, you can export the model by changing the following lines in export.py to export a GPU FP16 ONNX model.

    # export.py
    ...
    model.model[-1].export = True  # set Detect() layer export=True
    model.cuda()
    model.half()
    y = model(img.cuda().half())  # dry run
    # y = model(img)  # dry run
    # [print(x.shape) for x in y]

    # TorchScript export
    ...
    # ONNX export
    try:
        import onnx

        print('\nStarting ONNX export with onnx %s...' % onnx.__version__)
        f = opt.weights.replace('.pt', '.onnx')  # filename
        torch.onnx.export(model, img.cuda().half(), f, verbose=False, opset_version=12, input_names=['input'],
                          output_names=['head1', 'head2',"head3"],dynamic_axes={
                              'input':{0:"batch_size",2:"height",3:"width"},
                              'head1':{0:"batch_size",2:"a",3:"b",4:"c"},
                              'head2':{0:"batch_size",2:"a",3:"b",4:"c"},
                              'head3':{0:"batch_size",2:"a",3:"b",4:"c"}
                          })
     ...

And you can use the model on Triton Inference Server with this config file. For more about config.pbtxt, check the Triton Inference Server documentation.

# config.pbtxt
name: "model_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 8

input {
    name: "input"
    data_type: TYPE_FP16
    format: FORMAT_NCHW
    dims: [3,-1,-1]
}
output [
    {
        name: "head1"
        data_type: TYPE_FP16
        dims: [3,-1,-1,-1]
    },
    {
        name: "head2"
        data_type: TYPE_FP16
        dims: [3,-1,-1,-1]
    },
    {
        name: "head3"
        data_type: TYPE_FP16
        dims: [3,-1,-1,-1]
    }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]

An example Triton Inference Server client, which needs NMS post-processing, is in this issue: demo.zip, including demo_onnx.py. In demo_onnx.py the anchors are wrong; you need to modify the anchor order as follows. This exported ONNX model (variable inputs and variable outputs) has been tested with demo_onnx.py and works as expected.

# demo_onnx.py
...
# anchors = [[116, 90, 156, 198, 373, 326], [30, 61, 62, 45, 59, 119], [10, 13, 16, 30, 33, 23]]  # 5s <-- remove this
anchors = [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]]  # modify to this
...
# trt_inference_client.py
import tritonclient.http as httpclient
from PIL import Image

image_src = Image.open("0.png")
resized = letterbox_image(image_src, (img_size_w, img_size_h)) # letterbox_image from demo_onnx.py
img_in = np.transpose(resized, (2, 0, 1)).astype(np.float16)  # HWC -> CHW
img_in = np.expand_dims(img_in, axis=0)
img_in /= 255.0

triton_client = httpclient.InferenceServerClient(
        url="localhost:8000", verbose=False)
input_name = 'input'
model_name="model_onnx"
model_version="1"
model_metadata = triton_client.get_model_metadata(
        model_name=model_name, model_version=model_version)
model_config = triton_client.get_model_config(
    model_name=model_name, model_version=model_version)

print(model_config)
inputs = []
inputs.append(httpclient.InferInput(input_name, img_in.shape, 'FP16'))
inputs[0].set_data_from_numpy(img_in, binary_data=True) #binary_data need to be True when using FP16
outputs = []
outputs.append(httpclient.InferRequestedOutput("head1"))
outputs.append(httpclient.InferRequestedOutput("head2"))
outputs.append(httpclient.InferRequestedOutput("head3"))

response = triton_client.infer(model_name,inputs=inputs, outputs=outputs)
print(response.as_numpy("head1").shape)
print(response.as_numpy("head2").shape)
print(response.as_numpy("head3").shape)

If I have time I would like to make a PR adding Triton Inference Server usage.

leeyunhome commented 3 years ago

Thank you so much! I will deploy onnx model on mobile devices!

Hello,

Can I see the code for deploying the model converted to .onnx on a mobile device?

Did you use onnx runtime?

Thank you.

luvwinnie commented 3 years ago

Hello everyone! For the ONNX model, it outputs the following shapes for a (320,320) image, for example. I'm trying to reproduce exactly the same result as detect.py. Can anyone help me understand how the Detect() layer postprocesses these three heads?

out_shape: torch.Size([1, 3, 40, 40, 6])
out_shape: torch.Size([1, 3, 20, 20, 6])
out_shape: torch.Size([1, 3, 10, 10, 6])

Let's say that an image of shape (1, 192, 320, 3) outputs a ([1, 3780, 6]) tensor from the Detect() layer. How do I postprocess the three heads into the correct ([1, 3780, 6])? (See the sketch below.)
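
A hedged sketch of the Detect() decode for one head, following the xy/wh equations quoted earlier in this thread; anchors are the per-head pairs from yolov5s.yaml (in input-image pixels) and all names here are illustrative:

import numpy as np

def decode_head(out, anchors, stride):
    # out: raw head output of shape (bs, 3, ny, nx, no), where no = 5 + num_classes
    bs, na, ny, nx, no = out.shape
    y = 1.0 / (1.0 + np.exp(-out))                      # sigmoid on everything
    xv, yv = np.meshgrid(np.arange(nx), np.arange(ny))  # grid cell offsets
    grid = np.stack((xv, yv), -1).reshape(1, 1, ny, nx, 2)
    anchor = np.asarray(anchors, dtype=np.float32).reshape(1, na, 1, 1, 2)
    y[..., 0:2] = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride  # xy in pixels
    y[..., 2:4] = (y[..., 2:4] * 2.0) ** 2 * anchor          # wh in pixels
    return y.reshape(bs, -1, no)                             # (bs, na*ny*nx, no)

# Concatenating the three decoded heads (strides 8, 16, 32) along axis 1 yields
# the (1, N, no) tensor that detect.py feeds into non_max_suppression(); for a
# 192x320 input, N = 3*(24*40 + 12*20 + 6*10) = 3780, matching the shape above.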

FahriBilici commented 3 years ago

Is it working with onnx.js?

zhiqwang commented 3 years ago

Is it working with onnx.js?

Hi @FahriBilici , it could be working with onnx.js, but I didn't find a good example :(

Xuan-1998 commented 3 years ago

When I tried to run:

python models/export.py --weights yolov5s.pt --img 640 --batch 1  # export at 640x640 with batch size 1

I got the following message:

AttributeError: Can't get attribute 'SiLU' on <module 'torch.nn.modules.activation' from '/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/activation.py'>

How can I solve it?

Xuan-1998 commented 3 years ago

When I use the following code to export a model:

python models/export.py --weights yolov5s.pt --img 640 --batch 1  # export at 640x640 with batch size 1

I get:

Converting Frontend ==> MIL Ops:   3%|▏         | 21/620 [00:00<00:00, 796.96 ops/s]
CoreML export failure: unexpected number of inputs for node x.2 (_convolution): 13

Does anyone know how to solve this problem?

Olalaye commented 3 years ago

@glenn-jocher I have a question; can you help me explain it? Why is there twice as much storage after the model export?


zhiqwang commented 3 years ago

Hi @Olalaye

The model is saved in half precision (FP16) by default, so its storage is half that of the FP32 exports.
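
For illustration, the size difference is easy to reproduce with any module (a sketch, not the repo's exact save code):

import torch
import torch.nn as nn

m = nn.Conv2d(3, 64, 3)
torch.save(m.state_dict(), 'fp32.pt')         # FP32 weights
torch.save(m.half().state_dict(), 'fp16.pt')  # FP16 weights, roughly half the file size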