open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

On Jetson platform with older TRT7, output bbox and mask appear distorted. #127

Closed tehkillerbee closed 2 years ago

tehkillerbee commented 2 years ago

I have previously used mmdeploy to deploy a model to the TensorRT backend using MMDeploy/tools/deploy.py, and ran inference based on the mmdeploy SDK object_detection.cpp example.

On the PC, the test output images generated by deploy.py contain identical detections in both PyTorch and TensorRT.

However, when I repeat the steps on a Jetson platform, the results are not identical. Instead, the detections appear shifted and the masks are distorted. I have used identical mmdeploy configurations and identical model input on both platforms.
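For anyone who wants to quantify a shift like this rather than judge it from the output images, here is a minimal sketch. It assumes the detections from both backends have already been exported as `[x1, y1, x2, y2]` boxes in the same order. The helper names `iou` and `compare_detections` and the 0.95 threshold are illustrative and are not part of mmdeploy.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # assumed layout: [x1, y1, x2, y2] in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def compare_detections(
    ref: List[Box], test: List[Box], min_iou: float = 0.95
) -> List[Tuple[int, float]]:
    """Return (index, IoU) for each pair of boxes that overlaps less than min_iou.

    Assumes ref[i] and test[i] describe the same detection (hypothetical
    pairing; real outputs may need to be matched by score or class first).
    """
    if len(ref) != len(test):
        raise ValueError("both backends must report the same number of boxes")
    mismatches = []
    for i, (r, t) in enumerate(zip(ref, test)):
        overlap = iou(r, t)
        if overlap < min_iou:
            mismatches.append((i, overlap))
    return mismatches
```

An empty result means every box pair overlaps by at least `min_iou`; any entries point at the detections that moved.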

I am not sure how to proceed, but I suspect there is an incompatibility with the version of TensorRT installed on my Jetson AGX. Before I upgrade to the latest JetPack, I would like to know if you have seen this issue before.

SingleZombie commented 2 years ago

Hi, we have tested detection models on Jetson TX2 and Xavier. We haven't seen the issue before. Could you please provide your model_cfg, deploy_cfg, deploy command as well as your TensorRT version on both platforms? We can try to reproduce this issue.

tehkillerbee commented 2 years ago

@SingleZombie Thank you. Which versions of JetPack have you tested with? I can also test with a TX2 or a Xavier NX if necessary.

I have upgraded our Jetson AGX platform to the latest JetPack 4.6, which comes with the later TensorRT version 8.0.1.6. My initial tests show that the inference results appear correct with both PyTorch and the TensorRT engine. MMDeploy works as expected with the following configuration:

nvidia@ax720_01:~/git/jetsonUtilities$ ./jetsonInfo.py 
NVIDIA Jetson UNKNOWN
 L4T 32.6.1 [ JetPack 4.6 ]
   Ubuntu 18.04.6 LTS
   Kernel Version: 4.9.253-tegra
 CUDA 10.2.300
   CUDA Architecture: NONE
 OpenCV version: 4.1.1
   OpenCV Cuda: NO
 CUDNN: 8.2.1.32
 TensorRT: 8.0.1.6
 Vision Works: 1.6.0.501
 VPI: ii libnvvpi1 1.1.12 arm64 NVIDIA Vision Programming Interface library
 Vulcan: 1.2.70

2022-02-07 09:53:09,334 - mmdeploy - INFO - TorchVision: 0.11.1
2022-02-07 09:53:09,334 - mmdeploy - INFO - OpenCV: 4.5.5
2022-02-07 09:53:09,334 - mmdeploy - INFO - MMCV: 1.4.1
2022-02-07 09:53:09,335 - mmdeploy - INFO - MMCV Compiler: GCC 7.5
2022-02-07 09:53:09,335 - mmdeploy - INFO - MMCV CUDA Compiler: 10.2
2022-02-07 09:53:09,335 - mmdeploy - INFO - MMDeployment: 0.1.0+7a88eae
2022-02-07 09:53:09,908 - mmdeploy - INFO - onnxruntime: None ops_is_avaliable : False
2022-02-07 09:53:09,909 - mmdeploy - INFO - tensorrt: 8.0.1.6 ops_is_avaliable : True
2022-02-07 09:53:09,912 - mmdeploy - INFO - ncnn: None ops_is_avaliable : False
2022-02-07 09:53:09,914 - mmdeploy - INFO - pplnn_is_avaliable: False
2022-02-07 09:53:09,916 - mmdeploy - INFO - openvino_is_avaliable: False


I have used the following deploy command. The model is based on `mask_rcnn_r50`, trained on a custom dataset.

* deploy command

python MMDeploy/tools/deploy.py \
    MMDeploy/configs/mmdet/instance-seg/instance-seg_tensorrt-fp16_dynamic-320x320-1344x1344.py \
    mmdetection/work_dirs/mask_rcnn_r50_fpn_1x_coco_768_768_scale4_n1000/mask_rcnn_r50_fpn_1x_coco_logend_batch12_768_768_scale4_n1000.py \
    mmdetection/work_dirs/mask_rcnn_r50_fpn_1x_coco_768_768_scale4_n1000/latest.pth \
    testdata/testimg1/imageA009.png \
    --work-dir mmdeploy_workdir \
    --device cuda:0 \
    --dump-info

SingleZombie commented 2 years ago

Hi, I tested Mask R-CNN on our Jetson TX2:

NVIDIA Jetson TX2
 L4T 32.6.1 [ JetPack 4.6 ]
   Ubuntu 18.04.5 LTS
   Kernel Version: 4.9.253-tegra
 CUDA 10.2.300
   CUDA Architecture: 6.2
 OpenCV version: 4.1.1
   OpenCV Cuda: NO
 CUDNN: 8.2.1.32
 TensorRT: 8.0.1.6
 Vision Works: 1.6.0.501
 VPI: ii libnvvpi1 1.1.12 arm64 NVIDIA Vision Programming Interface library
 Vulcan: 1.2.70

I used the following command. The checkpoint is downloaded from MMDetection.

python ~/mmdeploy/tools/deploy.py \
    ~/mmdeploy/configs/mmdet/instance-seg/instance-seg_tensorrt_dynamic-320x320-1344x1344.py \
    /home/pjlab/mmdetection/configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_1x_coco.py \
    /home/pjlab/mmdetection/checkpoints/mask_rcnn_r50_caffe_fpn_1x_coco_bbox_mAP-0.38__segm_mAP-0.344_20200504_231812-0ebd1859.pth \
    work_dirs/demo2.jpg \
    --test-img work_dirs/demo2.jpg \
    --work-dir work_dirs/output/mmdet-is-trt-8-0/ \
    --device cuda:0 \
    --log-level INFO

The results of PyTorch and TensorRT were identical. I am sorry to say I cannot reproduce the bug.

tehkillerbee commented 2 years ago

@SingleZombie I should clarify that I get identical/correct results on JetPack 4.6 but not on JetPack 4.5, which confirms your results. Both platforms use the same versions of mmdeploy and its dependencies. The main difference between JetPack 4.5 and 4.6 is the TensorRT version, so maybe this is the cause of the issue?
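One way to confirm that the TensorRT version is the only difference is to diff the component versions reported on each board. The sketch below is illustrative only: `diff_environments` is a hypothetical helper, the dictionaries would be filled in by hand from the environment dumps, and the JetPack 4.5 entry uses "7.1" as a placeholder because the exact patch version is not given in this thread.

```python
from typing import Dict, Optional, Tuple


def diff_environments(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {component: (version_in_a, version_in_b)} for every component
    whose version differs, or that is missing on one side (reported as None)."""
    components = sorted(set(a) | set(b))
    return {
        name: (a.get(name), b.get(name))
        for name in components
        if a.get(name) != b.get(name)
    }


# Illustrative values: the JetPack 4.6 entries come from the dumps in this
# thread; "7.1" for JetPack 4.5 is a placeholder, not a measured value.
jetpack_45 = {"MMDeploy": "0.1.0+7a88eae", "TensorRT": "7.1"}
jetpack_46 = {"MMDeploy": "0.1.0+7a88eae", "TensorRT": "8.0.1.6"}

for name, (old, new) in diff_environments(jetpack_45, jetpack_46).items():
    print(f"{name}: {old} -> {new}")
```

Components that match on both sides are dropped, so only the candidates for the regression remain in the result.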

SingleZombie commented 2 years ago

Yes, we think the issue is caused by TensorRT 7.1. Thank you for reporting this bug. We don't plan to support TensorRT <= 7.1 now. Maybe we will support it in the future.
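Until older versions are supported, a deployment script could refuse to run on them. The following is a minimal sketch under stated assumptions: `parse_version` and `tensorrt_is_supported` are hypothetical helpers, not mmdeploy functions, and the `(7, 1)` cut-off simply encodes the "TensorRT <= 7.1" statement above. Versions between 7.1 and 8.0 were not discussed in this thread, so treating them as supported is an assumption.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '8.0.1.6' into a tuple of ints.

    Parsing stops at the first non-numeric component.
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def tensorrt_is_supported(
    version: str, unsupported_max: Tuple[int, int] = (7, 1)
) -> bool:
    """Return True if major.minor is strictly newer than unsupported_max."""
    return parse_version(version)[:2] > unsupported_max


# In a real script the string would typically come from tensorrt.__version__;
# a literal is used here so the sketch runs without TensorRT installed.
for candidate in ("7.1.3", "8.0.1.6"):
    status = "supported" if tensorrt_is_supported(candidate) else "not supported"
    print(f"TensorRT {candidate}: {status}")
```

Comparing only major and minor keeps every 7.1.x patch release on the unsupported side of the check.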

tehkillerbee commented 2 years ago

@SingleZombie That makes sense. In that case, this issue, #114, and PR #133 can probably be closed.

tehkillerbee commented 2 years ago

Closing this issue, as TRT7 will not be supported at this point.