shouxieai / tensorRT_Pro

C++ library based on tensorrt integration
MIT License

Exception when exporting ONNX #134

Closed Trytoz closed 2 years ago

Trytoz commented 2 years ago

Local environment: CUDA 11.3, cuDNN 8.4.1.50, TensorRT-8.4.1.5, python 3.8, torch 1.12.0+cu113, torchaudio 0.12.0+cu113, torchvision 0.13.0+cu113, onnxruntime 1.12.0, onnxruntime-gpu 1.12.0, opencv-contrib-python 4.6.0.66, opencv-python 4.1.2.30. I have successfully built libpytrtc.pyd with VS, and the provided demo runs and performs inference correctly.

--------------

Following the tutorial in the README, I modified the YOLOv5 6.0 code. The modified snippets are:

    def forward(self, x):
        z = []  # inference output
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            # bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            # x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            bs = -1
            ny = int(ny)
            nx = int(nx)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

            if not self.training:  # inference
                if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
                    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)

                # disconnect for pytorch trace
                anchor_grid = (self.anchors[i].clone() * self.stride[i]).view(1, -1, 1, 1, 2)

                y = x[i].sigmoid()
                if self.inplace:
                    y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
                    # y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh
                else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
                    xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
                    # wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                    wh = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh
                    y = torch.cat((xy, wh, y[..., 4:]), -1)
                # z.append(y.view(bs, -1, self.no))
                z.append(y.view(bs, self.na * ny * nx, self.no))

        # export.py
        torch.onnx.export(model, im, f, verbose=False, opset_version=opset,
                          training=torch.onnx.TrainingMode.TRAINING if train else torch.onnx.TrainingMode.EVAL,
                          do_constant_folding=not train,
                          input_names=['images'],
                          output_names=['output'],
                          dynamic_axes={'images': {0: 'batch'},  # shape(1,3,640,640)
                                        'output': {0: 'batch'}  # shape(1,25200,85)
                                        } if dynamic else None)
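The shape comments in the export call can be checked by hand. The following is a minimal sketch, assuming the standard YOLOv5 strides of 8, 16, and 32, three anchors per detection level, and the 80-class COCO model (so `no = 85`). The function name `yolov5_output_shape` is illustrative and not part of YOLOv5.

```python
def yolov5_output_shape(batch, img_h, img_w, strides=(8, 16, 32), na=3, num_classes=80):
    """Return the shape of the concatenated Detect output for one input size."""
    no = num_classes + 5  # x, y, w, h, objectness, then one score per class
    rows = 0
    for s in strides:
        ny, nx = img_h // s, img_w // s  # feature-map size at this detection level
        rows += na * ny * nx             # same product used in y.view(bs, self.na * ny * nx, self.no)
    return (batch, rows, no)


print(yolov5_output_shape(1, 640, 640))  # (1, 25200, 85)
```

For a 640x640 input the three levels are 80x80, 40x40, and 20x20, giving 3 * (6400 + 1600 + 400) = 25200 rows. Only the batch axis is marked dynamic in the export call, so the row count is fixed by the input size used at export time.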

Exporting the model with python export.py --weights=yolov5s.pt --dynamic --include=onnx --opset=11 produces the following output:

(cuda11) PS H:\main2\tensorRT_Pro-main\yolov5-6.0> python export.py --weights=yolov5s.pt --dynamic --include=onnx --opset=11
export: data=data\coco128.yaml, weights=yolov5s.pt, imgsz=[640, 640], batch_size=1, device=cpu, half=False, inplace=False, train=False, optimize=False, int8=False, dynamic=True, simplify=False, opset=11, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['onnx']
YOLOv5  2021-10-12 torch 1.12.0+cu113 CPU

Fusing layers...
Model Summary: 213 layers, 7225885 parameters, 0 gradients
H:\anaconda3\envs\cuda11\lib\site-packages\torch\functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:2895.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]

PyTorch: starting from yolov5s.pt (14.8 MB)

ONNX: starting export with onnx 1.12.0...
H:\main2\tensorRT_Pro-main\yolov5-6.0\models\yolo.py:136: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if augment:
H:\main2\tensorRT_Pro-main\yolov5-6.0\models\yolo.py:159: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if profile:
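H:\main2\tensorRT_Pro-main\yolov5-6.0\models\yolo.py:163: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!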
  if visualize:
H:\main2\tensorRT_Pro-main\yolov5-6.0\models\yolo.py:163: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if visualize:
H:\main2\tensorRT_Pro-main\yolov5-6.0\models\yolo.py:159: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if profile:
H:\main2\tensorRT_Pro-main\yolov5-6.0\models\yolo.py:61: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ny = int(ny)
H:\main2\tensorRT_Pro-main\yolov5-6.0\models\yolo.py:62: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  nx = int(nx)
H:\main2\tensorRT_Pro-main\yolov5-6.0\models\yolo.py:66: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
ONNX: export success, saved as yolov5s.onnx (29.1 MB)
ONNX: run --dynamic ONNX model inference with: 'python detect.py --weights yolov5s.onnx'

Running the TRT conversion then fails with the following error:

[2022-08-15 12:54:19][error][trt_builder.cpp:30]:NVInfer: H:\main2\tensorRT_Pro-main\src\tensorRT\onnx_parser\ModelImporter.cpp:736: While parsing node number 139 [Resize -> "onnx::Concat_271"]:
[2022-08-15 12:54:19][error][trt_builder.cpp:30]:NVInfer: H:\main2\tensorRT_Pro-main\src\tensorRT\onnx_parser\ModelImporter.cpp:737: --- Begin node ---
[2022-08-15 12:54:19][error][trt_builder.cpp:30]:NVInfer: H:\main2\tensorRT_Pro-main\src\tensorRT\onnx_parser\ModelImporter.cpp:738: input: "input.140"
input: ""
input: "onnx::Resize_457"
output: "onnx::Concat_271"
name: "Resize_139"
op_type: "Resize"
attribute {
  name: "coordinate_transformation_mode"
  s: "asymmetric"
  type: STRING
}
attribute {
  name: "cubic_coeff_a"
  f: -0.75
  type: FLOAT
}
attribute {
  name: "mode"
  s: "nearest"
  type: STRING
}
attribute {
  name: "nearest_mode"
  s: "floor"
  type: STRING
}

[2022-08-15 12:54:19][error][trt_builder.cpp:30]:NVInfer: H:\main2\tensorRT_Pro-main\src\tensorRT\onnx_parser\ModelImporter.cpp:739: --- End node ---
[2022-08-15 12:54:19][error][trt_builder.cpp:30]:NVInfer: H:\main2\tensorRT_Pro-main\src\tensorRT\onnx_parser\ModelImporter.cpp:742: ERROR: H:\main2\tensorRT_Pro-main\src\tensorRT\onnx_parser\builtin_op_importers.cpp:3500 In function importResize:
[8] Assertion failed: scales.is_weights() && "Resize scales must be an initializer!"
[2022-08-15 12:54:19][error][trt_builder.cpp:519]:Can not parse OnnX file: yolov5s.onnx
[2022-08-15 12:54:19][error][yolo_gpuptr.cpp:188]:Engine yolov5s2.fp32.trtmodel load failed
Traceback (most recent call last):
  File "H:/main2/tensorRT_Pro-main/example-python/test_yolov5.py", line 17, in <module>
    bboxes = yolo.commit(image).get()
BufferError: Invalid engine instance, please makesure your construct
[2022-08-15 12:54:18][info][trt_builder.cpp:474]:Compile FP32 Onnx Model 'yolov5s.onnx'.

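I have gone through this myself several times, but the problem persists and I am at my wits' end. I hope the author can help. yolov5-6.0-edit.zip (the attachment is the modified yolov5 6.0 code)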

shouxieai commented 2 years ago

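This happens because your PyTorch version is too new: the scales input of the exported Resize node is no longer an initializer. You can either run the following code to patch the model, or downgrade PyTorch to 1.9, 1.10, or similar, which also works.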

    import onnx

    model = onnx.load("yolov5s.onnx")

    def find_node_for_output(nodes, name):
        # Return (index, node) of the node that produces tensor `name`.
        for i, n in enumerate(nodes):
            if name in n.output:
                return i, n
        return None, None

    nodes = model.graph.node
    remove_nodes = set()  # a set, so an Identity shared by two Resize nodes is deleted once

    for node in nodes:
        # Resize inputs are (X, roi, scales, sizes); scales is input[2].
        if node.op_type == "Resize" and len(node.input) > 2:
            idx, identity = find_node_for_output(nodes, node.input[2])
            # Bypass the Identity so scales points straight at the initializer.
            if identity is not None and identity.op_type == "Identity":
                remove_nodes.add(idx)
                node.input[2] = identity.input[0]

    # Delete from the highest index down so the remaining indices stay valid.
    for i in sorted(remove_nodes, reverse=True):
        del nodes[i]

    onnx.save(model, "output.onnx")
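
To confirm whether a model is affected before or after patching, the Resize nodes can be inspected directly. This is a hedged sketch rather than part of tensorRT_Pro: the names `find_dynamic_resize_scales` and `check_file` are hypothetical, and it assumes the graph follows the `onnx.GraphProto` layout (`.node`, `.initializer`), with scales at input index 2 of Resize as in opset 11.

```python
def find_dynamic_resize_scales(graph):
    """List Resize nodes whose scales input (input index 2) is not an initializer.

    `graph` only needs `.node` and `.initializer` attributes shaped like
    onnx.GraphProto, so the check itself does not depend on the onnx package.
    """
    init_names = {init.name for init in graph.initializer}
    # Map every tensor name to the op type of the node that produces it.
    producers = {out: n.op_type for n in graph.node for out in n.output}
    report = []
    for node in graph.node:
        if node.op_type != "Resize" or len(node.input) < 3:
            continue
        scales = node.input[2]
        # An empty name means scales is unused (the node resizes by `sizes` instead).
        if scales and scales not in init_names:
            report.append((node.name, scales, producers.get(scales, "unknown")))
    return report


def check_file(path):
    import onnx  # imported lazily so the helper above has no hard dependency

    return find_dynamic_resize_scales(onnx.load(path).graph)
```

Calling `check_file("yolov5s.onnx")` returns one `(node name, scales tensor, producer op type)` tuple per Resize node whose scales do not come straight from an initializer. An empty list for `output.onnx` would suggest the patch removed every intermediate node, although whether a given parser version then accepts the model is not something this check can guarantee.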
Trytoz commented 2 years ago

Thank you. After downgrading torch to 1.8.2, the export succeeded.