Closed IamNaQi closed 2 years ago
Hi @IamNaQi ,
Because we use PyTorch to do the data binding in the TensorRT Python interface, this involves pointer manipulation, and that approach may have some cross-platform limitations.
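A minimal sketch of the binding idea described above: TensorRT's Python API takes a flat list of raw pointers, one per input/output binding, and yolort's torch path obtains these with `tensor.data_ptr()`. The sketch below uses NumPy host buffers (shapes and names are illustrative, not yolort's actual ones) so it runs without a GPU; a real run would hand the list to `context.execute_v2`.

```python
import numpy as np

# Illustrative input/output buffers standing in for CUDA tensors.
input_buf = np.zeros((1, 3, 640, 640), dtype=np.float32)
output_buf = np.zeros((1, 100, 6), dtype=np.float32)

# Each binding is just the integer address of the underlying buffer --
# this is the "pointer manipulation" that can behave differently per platform.
bindings = [input_buf.ctypes.data, output_buf.ctypes.data]

# A real TensorRT run would then call: context.execute_v2(bindings=bindings)
print(all(isinstance(p, int) and p != 0 for p in bindings))  # True
```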
We have verified the accuracy of the C++ example on a Windows system in #389; we should add more tests and more docs for this.
The C++ example on a Windows system is working very smoothly. I tested it without copying DLLs into the debug folder, it builds perfectly with the new CMakeLists, and the results are awesome. My earlier problem was a mistake on my side: I have an RTX 3060, but the installed CUDA version was 10.2, which is not compatible with the RTX 3060. I have now updated to CUDA 11.6 and built with the new CMakeLists using Visual Studio 2019.
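The compatibility check behind the fix above can be sketched briefly: the RTX 3060 is an Ampere GPU (compute capability 8.6), and sm_86 support only arrived with the CUDA 11.x toolkits, so CUDA 10.2 cannot target it. The mapping table below is an assumed, simplified illustration, not an exhaustive one.

```python
# Assumed, simplified mapping from compute capability to the minimum
# CUDA toolkit that can target it (illustration only).
MIN_CUDA_FOR_CC = {
    (7, 5): (10, 0),   # Turing, e.g. RTX 20xx
    (8, 6): (11, 1),   # Ampere, e.g. RTX 3060
}

def toolkit_can_target(cc, toolkit):
    """Return True if the given CUDA toolkit version can compile for cc."""
    return toolkit >= MIN_CUDA_FOR_CC[cc]

print(toolkit_can_target((8, 6), (10, 2)))  # False: the original setup
print(toolkit_can_target((8, 6), (11, 6)))  # True: after the upgrade
```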
PyTorch version: 1.11.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Home
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.23.0
Libc version: N/A
Python version: 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19041-SP0
Is CUDA available: True
CUDA runtime version: 11.6.124
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 511.65
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\cudnn_ops_train64_8.dll
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.6
[pip3] torch==1.11.0+cu113
[pip3] torchaudio==0.11.0+cu113
[pip3] torchvision==0.12.0+cu113
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h59b6b97_2
[conda] libblas 3.9.0 12_win64_mkl conda-forge
[conda] libcblas 3.9.0 12_win64_mkl conda-forge
[conda] liblapack 3.9.0 12_win64_mkl conda-forge
[conda] mkl 2021.4.0 h0e2418a_729 conda-forge
[conda] mkl-service 2.4.0 py39h6b0492b_0 conda-forge
[conda] mkl_fft 1.3.1 py39h0cb33c3_1 conda-forge
[conda] mkl_random 1.2.2 py39h2e25243_0 conda-forge
[conda] mypy_extensions 0.4.3 py39hcbf5309_5 conda-forge
[conda] numpy 1.22.3 pypi_0 pypi
[conda] numpy-base 1.20.3 py39hc2deb75_0
[conda] numpydoc 1.2.1 pyhd8ed1ab_2 conda-forge
[conda] pytorch 1.11.0 py3.9_cuda11.3_cudnn8_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 0.11.0 py39_cu113 pytorch
[conda] torchvision 0.12.0 py39_cu113 pytorch
Python inference is still giving errors because of an environment issue on my side. I am working on it and will update this thread when it is solved.
The C++ inference results are perfect! It also seems that you're using a TensorRT EA version. EA stands for early access (a pre-release build), while GA stands for general availability; TensorRT GA is the stable version, completely tested by the NVIDIA team. So could you try the latest GA release, TensorRT 8.2 GA Update 3 for the x86_64 architecture?
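Whether a build is EA or GA is not encoded in the version number itself (that has to be checked against the download page), but a quick numeric comparison tells you which release line you are on. A small stdlib sketch, using the version reported in this thread:

```python
def parse_version(v: str):
    """Parse a dotted version string like '8.4.0.6' into an int tuple."""
    return tuple(int(part) for part in v.split("."))

# The reporter's build, taken from the environment dump above.
installed = parse_version("8.4.0.6")

# Tuple comparison orders versions correctly component by component,
# so the installed 8.4.0.x build is newer than the 8.2 GA line.
print(installed > parse_version("8.2"))  # True
```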
Where is the ppl.nn forward?
@IamNaQi , since the C++ TensorRT inference can be reproducibly verified, I guess TensorRT's Python interface does not support Windows well. I consider this issue solved, so I'll close this thread for now.
@xinsuinizhuan , thanks for your interest here; we don't support ppl.nn yet. We did have a pplnn branch before, but we found that the ONNX exported by yolort did not work properly on pplnn (#147). I'm not sure how well pplnn supports yolov5 (or yolort) now. I will create a new ticket for pplnn support later to keep this thread clean, or you can open one if that's convenient for you.
As described in https://github.com/NVIDIA/TensorRT/issues/1945#issuecomment-1108325943 , TensorRT's Python interface on Windows has a compatibility issue with PyTorch. Reopening this ticket because we should make yolort compatible with Windows.
🐛 Describe the bug
Hi, I ran your notebook https://github.com/zhiqwang/yolov5-rt-stack/blob/main/notebooks/onnx-graphsurgeon-inference-tensorrt.ipynb for Python inference on Windows 10, but I could not get a good result. Here is the code sample I used from your notebook. I tried different thresholds, but did not try other precisions since it only supports fp32 for now.
Output: the model is saved and the input shape can be shown.
During prediction:
Error: it seems to detect something but returns empty tensors of size (0, 4).
Here is the output image.
Please help me out; I hope I have explained my issue clearly.
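For context on the shape in the error above: a tensor of size (0, 4) is what detection postprocessing returns when no box survives filtering, i.e. zero detections with four coordinates each. A hypothetical NumPy sketch (the boxes, scores, and thresholds below are made up for illustration) shows how an over-strict confidence threshold yields exactly this shape:

```python
import numpy as np

# Hypothetical candidate boxes in (x1, y1, x2, y2) format with scores.
boxes = np.array([[10.0, 10.0, 50.0, 50.0],
                  [20.0, 20.0, 80.0, 80.0]], dtype=np.float32)
scores = np.array([0.30, 0.45], dtype=np.float32)

def filter_by_confidence(boxes, scores, conf_thres):
    """Keep only boxes whose score reaches the confidence threshold."""
    keep = scores >= conf_thres
    return boxes[keep], scores[keep]

# A threshold above every score leaves an empty (0, 4) box array --
# the same shape reported in the error above.
empty_boxes, _ = filter_by_confidence(boxes, scores, 0.9)
print(empty_boxes.shape)   # (0, 4)

# A lower threshold keeps both detections.
kept_boxes, _ = filter_by_confidence(boxes, scores, 0.25)
print(kept_boxes.shape)    # (2, 4)
```

So empty (0, 4) output is not necessarily a crash: it can simply mean every prediction was filtered out, which is worth distinguishing from the Windows pointer-binding issue discussed earlier in this thread.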
Versions
PyTorch version: 1.8.2+cu111
CUDA used to build PyTorch: 11.1
TensorRT version: 8.4.0.6, on CUDA device 0
OS: Microsoft Windows 10 Home
CMake version: version 3.23.0