microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai

TensorRT EP's inference results are abnormal. #21457

Open c1aude opened 1 month ago

c1aude commented 1 month ago

Describe the issue

Inference results are abnormal when running YOLOv7 models with the TensorRT EP.

We have confirmed that the results are normal when using the CPU and CUDA execution providers.

The issue is reproducible in versions 1.18.0 and 1.18.1 with TensorRT 10, and does not occur in versions 1.17.3 and earlier with TensorRT 8.6.1.6.

When using TensorRT 10, are there any additional steps required when converting PyTorch models to ONNX, compared with TensorRT 8?
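For context, a typical PyTorch-to-ONNX export for this kind of model is just the standard torch.onnx.export call; the sketch below is purely illustrative (the stand-in module, input size, opset version, and tensor names are placeholders, not the exact settings used for the failing model):

import torch

# Purely illustrative: a tiny stand-in module replaces the real YOLOv7 network so
# the snippet runs on its own; input size, opset and tensor names are placeholders.
model = torch.nn.Conv2d(3, 7, kernel_size=1).eval()
dummy = torch.zeros(1, 3, 640, 640)
torch.onnx.export(
    model,
    dummy,
    "export_sketch.onnx",
    opset_version=17,
    input_names=["images"],
    output_names=["output"],
)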

TensorRT result:

[screenshot: detections produced with the TensorRT EP]

CPU or CUDA result:

[screenshot: detections produced with the CPU or CUDA EP]
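To help localize whether the difference comes from the EP outputs themselves rather than from the Python post-processing, the raw session outputs of the CPU and TensorRT EPs can be compared numerically. A minimal sketch follows; the model path and input shape are taken from the repro script below, the input here is random data, and feeding the real preprocessed image instead gives a more meaningful comparison:

import numpy as np
import onnxruntime as ort

# Placeholders: point these at the actual model and a real preprocessed image.
model_path = r"E:\YOLOv7\best_562_0712.onnx"
dummy_input = np.random.rand(1, 3, 640, 640).astype(np.float32)

cpu_sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
trt_sess = ort.InferenceSession(
    model_path,
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_name = cpu_sess.get_inputs()[0].name
cpu_out = cpu_sess.run(None, {input_name: dummy_input})
trt_out = trt_sess.run(None, {input_name: dummy_input})

# Large max-abs differences (or shape mismatches) point at the EP rather than
# at the post-processing code.
for i, (a, b) in enumerate(zip(cpu_out, trt_out)):
    if a.shape == b.shape and a.size > 0:
        print(i, a.shape, float(np.max(np.abs(a - b))))
    else:
        print(i, "shapes:", a.shape, "vs", b.shape)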

To reproduce

The code we used for testing is shown below.


import numpy as np
import onnxruntime as ort
import cv2
import matplotlib.pyplot as plt
import torch

class YOLOv7:

    def __init__(self, onnx_model, input_image, confidence_thresh, iou_thresh):
        self.onnx_model = onnx_model
        self.input_image = input_image
        self.confidence_thresh = confidence_thresh
        self.iou_thresh = iou_thresh
        self.classes = ["0","1", "2", "3", "4", "5", "6", "7", "8", "9",
                        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
                        "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
                        "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
                        "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z",
                        "#", ".", "_", "carve", "print", "*", "+", "cut", "double", "copper",
                        ",", "/", "(", ")", "&", ":", "_", "~", "%", "=", "<", ">"]
        self.color_palette = np.random.uniform(0, 255, size = (len(self.classes),3))

    def draw_detections(self, img, box, score, class_id):
        x1, y1, w, h = box

        color = self.color_palette[class_id]
        cv2.rectangle(img,(int(x1), int(y1)), (int(x1+w), int(y1+h)), color, 2)
        label = self.classes[class_id]
        (label_width, label_height), _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
        label_x = x1
        label_y = y1 - 10 if y1 -10 > label_height else y1 + 10
        cv2.rectangle(
            img, (label_x, label_y - label_height), (label_x + label_width, label_y + label_height), color, cv2.FILLED
        )
        cv2.putText(img, label, (label_x, label_y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)

    def preprocess(self):
        self.img = cv2.imread(self.input_image)
        self.img_height, self.img_width = self.img.shape[:2]

        img = cv2.cvtColor(self.img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (self.input_width, self.input_height))
        image_data = np.array(img) / 255.0
        image_data = np.transpose(image_data, (2, 0 ,1))
        image_data = np.expand_dims(image_data, axis=0).astype(np.float32)

        return image_data

    def postprocess(self, input_image, output):
        output = output[0]
        print(output.shape)
        rows = output.shape[0]
        boxes = []
        scores = []
        class_ids = []

        x_factor = self.img_width / self.input_width
        y_factor = self.img_height / self.input_height
        print(output[0])
        for i in range(rows):
            classes_scores = output[i][6]

            if classes_scores >= self.confidence_thresh:
                class_id = output[i][5]

                x1, y1, x2, y2 = output[i][1], output[i][2], output[i][3], output[i][4]

                left = int(x1 * x_factor)
                top = int(y1 * y_factor)
                width = int((x2 - x1) * x_factor)
                height = int((y2 - y1) * y_factor)

                class_ids.append(class_id)
                scores.append(classes_scores)
                boxes.append([left, top, width, height])

        indices = cv2.dnn.NMSBoxes(boxes, scores, self.confidence_thresh, self.iou_thresh)

        for i in indices:
            box = boxes[i]
            score = scores[i]
            class_id = int(class_ids[i])
            self.draw_detections(input_image, box, score, class_id)
        return input_image

    def inference(self):
        providers = [
            ('CPUExecutionProvider', {
            'intra_op_num_threads': 4,  # max number of threads for a single operator
            'inter_op_num_threads': 1   # max number of threads across operators
            }),

            # ('CUDAExecutionProvider', {
            # 'device_id': 0,            # GPU id to use
            # 'arena_extend_strategy': 'kNextPowerOfTwo',
            # 'gpu_mem_limit': 2 * 1024 * 1024 * 1024,  # GPU memory limit (2 GB)
            # 'cudnn_conv_algo_search': 'EXHAUSTIVE',
            # 'do_copy_in_default_stream': True,
            # 'enable_cuda_graph': False,  # disable CUDA graph optimization
            # }),

            # ('TensorrtExecutionProvider', {
            #     'device_id': 0,            # GPU id to use
            #     'trt_max_partition_iterations': 10, # max partition iterations for optimization
            #     'trt_max_workspace_size': 2 * 1024 * 1024 * 1024,  # GPU memory limit (2 GB)
            #     'trt_min_subgraph_size': 1, # minimum subgraph size
            #     'trt_engine_cache_enable': False,   # whether to cache built engines
            #     'trt_fp16_enable': True  # enable FP16
            #     #'trt_int8_enable': True   # enable INT8
            #     })
            ]
        session = ort.InferenceSession(self.onnx_model, providers=providers)
        model_inputs = session.get_inputs()

        input_shape = model_inputs[0].shape
        self.input_width = input_shape[2]
        self.input_height = input_shape[3]

        img_data = self.preprocess()
        outputs = session.run(None, {model_inputs[0].name: img_data})
        return self.postprocess(self.img, outputs)

detection = YOLOv7(r"E:\YOLOv7\best_562_0712.onnx", r"E:\TEST\test.png", 0.9, 0.5)
output_image = detection.inference()

plt.imshow(output_image)
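To reproduce with the TensorRT EP, the providers list is replaced with the commented-out TensorrtExecutionProvider entry above. Since that entry sets trt_fp16_enable to True, re-running with it set to False is a quick check of whether FP16 alone explains the difference; this is only a sketch and uses just the option names already present in the script:

providers = [
    ('TensorrtExecutionProvider', {
        'device_id': 0,
        'trt_max_workspace_size': 2 * 1024 * 1024 * 1024,
        'trt_engine_cache_enable': False,
        'trt_fp16_enable': False,  # try FP32 first; FP16 can change detection outputs
    }),
    ('CUDAExecutionProvider', {'device_id': 0}),
    'CPUExecutionProvider',
]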

Urgency

No response

Platform

Windows

OS Version

Windows 11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

TensorRT

Execution Provider Library Version

CUDA 11.8, cuDNN 8.9.7, TensorRT 10.2.0.19
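For completeness, the build and the execution providers actually available in this environment can be confirmed with standard onnxruntime calls:

import onnxruntime as ort

# Reports the installed ORT build and the EPs it can load in this environment.
print(ort.__version__)
print(ort.get_available_providers())
print(ort.get_device())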

yf711 commented 1 month ago

Hi @c1aude, could you also share the model, the requirements.txt of your Python env, and the test image so we can repro this issue?

c1aude commented 1 month ago

Hello @yf711, here are the environment and model files used for inference.

Image used for inference: test

env requirements.txt: https://drive.google.com/file/d/16wj3sa0JFyBOTpPit_2iqBU0trZtITSw/view?usp=sharing

Test model: https://drive.google.com/file/d/12GZKzMf5Pq1_qgKiwPF9HOCBwWM_jvsz/view?usp=sharing

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

jywu-msft commented 1 week ago

adding a note to further debug this issue