grimoire / mmdetection-to-tensorrt

convert mmdetection model to tensorrt, support fp16, int8, batch input, dynamic shape etc.
Apache License 2.0

MASKRCNN Encountered known unsupported method torch.Tensor.new_tensor #105

Status: Open. azuryl opened this issue 2 years ago.

azuryl commented 2 years ago

Describe the bug
Encountered known unsupported method torch.Tensor.new_tensor

To Reproduce
Config from https://github.com/open-mmlab/mmdetection/tree/master/configs/mask_rcnn

Pretrained model (mmdetection Mask R-CNN model zoo entry): R-50-FPN backbone, pytorch style, 3x schedule, 4.1 GB memory.

```python
import torch
from mmdet.apis import init_detector, inference_detector
from mmdet2trt import mmdet2trt
from mmdet2trt.apis import create_wrap_detector, inference_trt_model

opt_shape_param = [
    [
        [1, 3, 320, 320],    # min shape
        [1, 3, 800, 1344],   # optimize shape
        [1, 3, 1344, 1344],  # max shape
    ]
]

cfg_path = "/home/azuryl/data2/mmdetection/configs/mask_rcnn/mask_rcnn_r50_fpn_mstrain-poly_3x_coco.py"
weight_path = "/home/azuryl/data2/mmdetection-to-tensorrt/models/mask_rcnn_r50_fpn_mstrain-poly_3x_coco_20210524_201154-21b550bb.pth"
save_model_path = "/home/azuryl/data2/mmdetection-to-tensorrt/models/mask_rcnn_r50_fpn_mstrain-poly_3x_coco32.trt"
save_engine_path = "/home/azuryl/data2/mmdetection-to-tensorrt/models/mask_rcnn_r50_fpn_mstrain-poly_3x_coco32.engine"
device = 'cuda:0'
score_thr = 0.1

max_workspace_size = 1 << 30  # some modules and tactics need a large workspace
trt_model = mmdet2trt(cfg_path, weight_path, fp16_mode=False, device='cuda:0')
# opt_shape_param=opt_shape_param, max_workspace_size=max_workspace_size

torch.save(trt_model.state_dict(), save_model_path)
with open(save_engine_path, mode='wb') as f:
    f.write(trt_model.state_dict()['engine'])
```
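
For completeness, reloading the saved model for inference would look roughly like the sketch below. This is only a minimal example based on the usage shown in this repo's README; `image_path` is a placeholder for a real test image.

```python
from mmdet.apis import inference_detector
from mmdet2trt.apis import create_wrap_detector

# Wrap the saved TensorRT model as an mmdet-compatible detector
trt_detector = create_wrap_detector(save_model_path, cfg_path, device)
result = inference_detector(trt_detector, image_path)  # image_path: any test image
```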

Environment: Ubuntu 18.04, Python 3.8 (collected with python tools/collect_env.py).

```
module mmdet.models.backbones.CSPDarknet not exist.
module mmdet.models.dense_heads.YOLOXHead not exist.
module mmdet.models.necks.YOLOXPAFPN not exist.
module mmdet.models.YOLOX not exist.
##############configs/commm/3x_coco_instance.py /home/jliu/coco/
mmdet inference.py mmcv.Config.fromfile
/home/jliu/anaconda3/envs/mmdet/lib/python3.8/site-packages/mmdet/core/anchor/builder.py:15: UserWarning: build_anchor_generator would be deprecated soon, please use build_prior_generator
  warnings.warn(
Use load_from_local loader
```

mmdet inference.py Class names are saved in the checkpoint.............. ('person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush')

/home/jliu/anaconda3/envs/mmdet/lib/python3.8/site-packages/mmdet/core/anchor/anchor_generator.py:359: UserWarning: single_level_grid_anchors would be deprecated soon. Please use single_level_grid_priors warnings.warn( [01/28/2022-01:23:25] [TRT] [I] [MemUsageChange] Init CUDA: CPU +286, GPU +0, now: CPU 2432, GPU 1764 (MiB) [01/28/2022-01:23:26] [TRT] [I] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 2432 MiB, GPU 1764 MiB [01/28/2022-01:23:26] [TRT] [I] [MemUsageSnapshot] End constructing builder kernel library: CPU 2453 MiB, GPU 1764 MiB Warning: Encountered known unsupported method torch.Tensor.new_tensor Warning: Encountered known unsupported method torch.Tensor.new_tensor [01/28/2022-01:23:27] [TRT] [W] IElementWiseLayer with inputs (Unnamed Layer 1135) [ElementWise]_output and (Unnamed Layer 1139) [Shuffle]_output: first input has type Float but second input has type Int32. [01/28/2022-01:23:27] [TRT] [W] IElementWiseLayer with inputs (Unnamed Layer 1144) [ElementWise]_output and (Unnamed Layer 1148) [Shuffle]_output: first input has type Float but second input has type Int32. [01/28/2022-01:23:27] [TRT] [W] IElementWiseLayer with inputs (Unnamed Layer 1153) [ElementWise]_output and (Unnamed Layer 1157) [Shuffle]_output: first input has type Float but second input has type Int32. [01/28/2022-01:23:27] [TRT] [W] IElementWiseLayer with inputs (Unnamed Layer 1162) [ElementWise]_output and (Unnamed Layer 1166) [Shuffle]_output: first input has type Float but second input has type Int32. [01/28/2022-01:23:27] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.3 but loaded cuBLAS/cuBLAS LT 10.2.2 [01/28/2022-01:23:27] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2726, GPU 1422 (MiB) [01/28/2022-01:23:27] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored. [01/28/2022-01:24:15] [TRT] [I] Detected 1 inputs and 4 output network tensors. [01/28/2022-01:24:15] [TRT] [I] Total Host Persistent Memory: 199504 [01/28/2022-01:24:15] [TRT] [I] Total Device Persistent Memory: 178691072 [01/28/2022-01:24:15] [TRT] [I] Total Scratch Memory: 5185792 [01/28/2022-01:24:15] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 140 MiB, GPU 384 MiB [01/28/2022-01:24:15] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 237.453ms to assign 34 blocks to 250 nodes requiring 462737927 bytes. [01/28/2022-01:24:15] [TRT] [I] Total Activation Memory: 462737927 [01/28/2022-01:24:15] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.3 but loaded cuBLAS/cuBLAS LT 10.2.2 [01/28/2022-01:24:15] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2742, GPU 1686 (MiB) [01/28/2022-01:24:15] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +91, GPU +256, now: CPU 91, GPU 256 (MiB) [01/28/2022-01:24:15] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.3 but loaded cuBLAS/cuBLAS LT 10.2.2 [01/28/2022-01:24:15] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2741, GPU 1678 (MiB) [01/28/2022-01:24:15] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +611, now: CPU 91, GPU 867 (MiB)

Additional context Add any other context about the problem here.

grimoire commented 2 years ago

Hi, which mmdet version are you using? Note that the master branch of this repo only supports mmdet 1.18+.

You can check your environment with collect_env.py.

If you insist on using an old mmdet, you can switch to an old release.

azuryl commented 2 years ago

> Hi, which mmdet version are you using? Note that the master branch of this repo only supports mmdet 1.18+.
>
> You can check your environment with collect_env.py.
>
> If you insist on using an old mmdet, you can switch to an old release.

@grimoire Dear grimoire, it seems the requirement is mmdet==2.14.0 ("mim install mmdet==2.14.0") per https://github.com/grimoire/mmdetection-to-tensorrt

```
Collecting environment information...
PyTorch version: 1.10.1
Is debug build: False
CUDA used to build PyTorch: 10.2
OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.20.0
Libc version: glibc-2.27
Python version: 3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:38) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.15.0-163-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 10.2.89
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: Quadro P620

Nvidia driver version: 440.100
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
Versions of relevant libraries:
[pip3] mmcv-full==1.3.9
[pip3] mmdet==2.14.0
[pip3] mmdet2trt==0.5.0
[pip3] tensorrt==8.2.2.1
[pip3] torch==1.10.1
[pip3] torch2trt-dynamic==0.5.0
[pip3] torchaudio==0.10.1
[pip3] torchvision==0.11.2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mmcv-full 1.3.9 pypi_0 pypi
[conda] mmdet 2.14.0 pypi_0 pypi
[conda] mmdet2trt 0.5.0 dev_0
[conda] pytorch 1.10.1 py3.8_cuda10.2_cudnn7.6.5_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] tensorrt 8.2.2.1 pypi_0 pypi
[conda] torch2trt-dynamic 0.5.0 dev_0
[conda] torchaudio 0.10.1 py38_cu102 pytorch
[conda] torchvision 0.11.2 py38_cu102 pytorch
```

grimoire commented 2 years ago

Oh, err... sorry, the master branch requires mmdet 2.18.0+; I forgot to update the Readme.md. Would you mind updating mmdetection and trying again?
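
A quick way to confirm which versions the environment actually picks up (a trivial sketch, nothing mmdet2trt-specific):

```python
import mmcv
import mmdet

# The master branch of mmdetection-to-tensorrt expects mmdet 2.18.0 or newer
print("mmdet:", mmdet.__version__)
print("mmcv :", mmcv.__version__)
```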

azuryl commented 2 years ago

> Oh, err... sorry, the master branch requires mmdet 2.18.0+; I forgot to update the Readme.md. Would you mind updating mmdetection and trying again?

OK, I will try it.

azuryl commented 2 years ago

> Oh, err... sorry, the master branch requires mmdet 2.18.0+; I forgot to update the Readme.md. Would you mind updating mmdetection and trying again?

@grimoire Dear grimoire, I have updated mmdetection to 2.18.0, but the outputs of the TRT and torch models are still all zeros.

python tools/collect_env.py

```
Collecting environment information...
PyTorch version: 1.10.1
Is debug build: False
CUDA used to build PyTorch: 10.2
OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.20.0
Libc version: glibc-2.27
Python version: 3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:38) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.15.0-163-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 10.2.89
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: Quadro P620

Nvidia driver version: 440.100
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
Versions of relevant libraries:
[pip3] mmcv-full==1.3.9
[pip3] mmdet==2.18.0
[pip3] mmdet2trt==0.5.0
[pip3] tensorrt==8.2.2.1
[pip3] torch==1.10.1
[pip3] torch2trt-dynamic==0.5.0
[pip3] torchaudio==0.10.1
[pip3] torchvision==0.11.2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mmcv-full 1.3.9 pypi_0 pypi
[conda] mmdet 2.18.0 pypi_0 pypi
[conda] mmdet2trt 0.5.0 dev_0
[conda] pytorch 1.10.1 py3.8_cuda10.2_cudnn7.6.5_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] tensorrt 8.2.2.1 pypi_0 pypi
[conda] torch2trt-dynamic 0.5.0 dev_0
[conda] torchaudio 0.10.1 py38_cu102 pytorch
[conda] torchvision 0.11.2 py38_cu102 pytorch
```

```python
opt_shape_param = [
    [
        [1, 3, 320, 320],    # min shape
        [1, 3, 1280, 1280],  # optimize shape
        [1, 3, 1344, 1344],  # max shape
    ]
]
max_workspace_size = 1 << 30  # some modules need a large workspace, add workspace size when OOM
trt_model, torch_model = mmdet2trt(cfg_path, weight_path,
                                   opt_shape_param=opt_shape_param,
                                   fp16_mode=False,
                                   max_workspace_size=1 << 30,
                                   log_level=logging.debug,
                                   return_warp_model=True)
x = torch.ones([1, 3, 320, 320])
x = x.cuda()
y1 = trt_model(x)
y2 = torch_model(x)
```

Use load_from_local loader /home/azuryl/anaconda3/envs/mmdet/lib/python3.8/site-packages/mmdet/models/dense_heads/anchor_head.py:123: UserWarning: DeprecationWarning: anchor_generator is deprecated, please use "prior_generator" instead warnings.warn('DeprecationWarning: anchor_generator is deprecated, ' /home/azuryl/anaconda3/envs/mmdet/lib/python3.8/site-packages/mmdet/core/anchor/anchor_generator.py:369: UserWarning: single_level_grid_anchors would be deprecated soon. Please use single_level_grid_priors warnings.warn( [01/30/2022-00:36:20] [TRT] [I] [MemUsageChange] Init CUDA: CPU +283, GPU +0, now: CPU 2432, GPU 1764 (MiB) [01/30/2022-00:36:20] [TRT] [I] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 2432 MiB, GPU 1764 MiB [01/30/2022-00:36:20] [TRT] [I] [MemUsageSnapshot] End constructing builder kernel library: CPU 2451 MiB, GPU 1764 MiB Warning: Encountered known unsupported method torch.Tensor.new_tensor Warning: Encountered known unsupported method torch.Tensor.new_tensor [01/30/2022-00:36:21] [TRT] [W] IElementWiseLayer with inputs (Unnamed Layer 1135) [ElementWise]_output and (Unnamed Layer 1139) [Shuffle]_output: first input has type Float but second input has type Int32. [01/30/2022-00:36:21] [TRT] [W] IElementWiseLayer with inputs (Unnamed Layer 1144) [ElementWise]_output and (Unnamed Layer 1148) [Shuffle]_output: first input has type Float but second input has type Int32. [01/30/2022-00:36:21] [TRT] [W] IElementWiseLayer with inputs (Unnamed Layer 1153) [ElementWise]_output and (Unnamed Layer 1157) [Shuffle]_output: first input has type Float but second input has type Int32. [01/30/2022-00:36:21] [TRT] [W] IElementWiseLayer with inputs (Unnamed Layer 1162) [ElementWise]_output and (Unnamed Layer 1166) [Shuffle]_output: first input has type Float but second input has type Int32. [01/30/2022-00:36:21] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.3 but loaded cuBLAS/cuBLAS LT 10.2.2 [01/30/2022-00:36:21] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2726, GPU 1422 (MiB) [01/30/2022-00:36:21] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored. [01/30/2022-00:37:09] [TRT] [I] Detected 1 inputs and 4 output network tensors. [01/30/2022-00:37:09] [TRT] [I] Total Host Persistent Memory: 197968 [01/30/2022-00:37:09] [TRT] [I] Total Device Persistent Memory: 178572288 [01/30/2022-00:37:09] [TRT] [I] Total Scratch Memory: 5185792 [01/30/2022-00:37:09] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 140 MiB, GPU 384 MiB [01/30/2022-00:37:09] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 237.174ms to assign 34 blocks to 251 nodes requiring 446192135 bytes. 
[01/30/2022-00:37:09] [TRT] [I] Total Activation Memory: 446192135 [01/30/2022-00:37:09] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.3 but loaded cuBLAS/cuBLAS LT 10.2.2 [01/30/2022-00:37:09] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 2743, GPU 1686 (MiB) [01/30/2022-00:37:09] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +91, GPU +256, now: CPU 91, GPU 256 (MiB) [01/30/2022-00:37:09] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.3 but loaded cuBLAS/cuBLAS LT 10.2.2 [01/30/2022-00:37:09] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2742, GPU 1678 (MiB) [01/30/2022-00:37:09] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +595, now: CPU 91, GPU 851 (MiB) ##############torch2trt_dynamic.py inital TRTModule input_names,output_names: ['input_0'] ['num_detections', 'boxes', 'scores', 'classes'] trt_model: (tensor([[0]], device='cuda:0', dtype=torch.int32), tensor([[[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.]]], device='cuda:0'), tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], device='cuda:0'), tensor([[-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., 
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.]], device='cuda:0')) torch_model: [tensor([[0]], device='cuda:0', dtype=torch.int32), tensor([[[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.]]], device='cuda:0'), tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], device='cuda:0'), tensor([[-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,

grimoire commented 2 years ago

You need to enable the flag to convert a model with instance segmentation. Please read this for more detail. And since your input is x = torch.ones([1,3,320,320]), I think the result is expected.
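
For reference, a minimal sketch of what that suggestion looks like in code: convert with `enable_mask=True` and run a real image through the wrapped detector instead of `torch.ones`. The flag name follows the later comments in this thread, and `image_path` is a placeholder.

```python
from mmdet.apis import inference_detector
from mmdet2trt import mmdet2trt
from mmdet2trt.apis import create_wrap_detector

# enable_mask=True keeps the mask head when converting Mask R-CNN
trt_model = mmdet2trt(cfg_path, weight_path,
                      opt_shape_param=opt_shape_param,
                      fp16_mode=False,
                      enable_mask=True)

trt_detector = create_wrap_detector(trt_model, cfg_path, 'cuda:0')
result = inference_detector(trt_detector, image_path)  # use a real image, not torch.ones
```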

azuryl commented 2 years ago

> You need to enable the flag to convert a model with instance segmentation. Please read this for more detail. And since your input is x = torch.ones([1,3,320,320]), I think the result is expected.

@grimoire Dear grimoire, do you mean enable_mask=True? The results still seem to be all zeros.

```python
max_workspace_size = 1 << 30  # some modules and tactics need a large workspace
trt_model, torch_model = mmdet2trt(cfg_path, weight_path,
                                   opt_shape_param=opt_shape_param,
                                   fp16_mode=False,
                                   int8_mode=False,
                                   device='cuda:0',
                                   trt_log_level='INFO',
                                   max_workspace_size=1 << 30,
                                   log_level=logging.debug,
                                   return_wrap_model=True,
                                   enable_mask=True,
                                   output_names=['num_detections', 'boxes', 'scores', 'classes'])
# opt_shape_param=opt_shape_param, max_workspace_size=max_workspace_size
x = torch.ones([1, 3, 320, 320])
x = x.cuda()

y1 = trt_model(x)
y2 = torch_model(x)
print("trt_model:\n", y1)
print("torch_model:\n", y2)
```

Use load_from_local loader mask mode require len(output_names)==5 but get output_names=['num_detections', 'boxes', 'scores', 'classes'] /home/jliu/anaconda3/envs/mmdet/lib/python3.8/site-packages/mmdet/models/dense_heads/anchor_head.py:123: UserWarning: DeprecationWarning: anchor_generator is deprecated, please use "prior_generator" instead warnings.warn('DeprecationWarning: anchor_generator is deprecated, ' /home/jliu/anaconda3/envs/mmdet/lib/python3.8/site-packages/mmdet/core/anchor/anchor_generator.py:369: UserWarning: single_level_grid_anchors would be deprecated soon. Please use single_level_grid_priors warnings.warn( [01/30/2022-01:40:10] [TRT] [I] [MemUsageChange] Init CUDA: CPU +285, GPU +0, now: CPU 2432, GPU 1900 (MiB) [01/30/2022-01:40:10] [TRT] [I] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 2432 MiB, GPU 1900 MiB [01/30/2022-01:40:10] [TRT] [I] [MemUsageSnapshot] End constructing builder kernel library: CPU 2451 MiB, GPU 1900 MiB Warning: Encountered known unsupported method torch.Tensor.new_tensor Warning: Encountered known unsupported method torch.Tensor.new_tensor [01/30/2022-01:40:11] [TRT] [W] IElementWiseLayer with inputs (Unnamed Layer 1135) [ElementWise]_output and (Unnamed Layer 1139) [Shuffle]_output: first input has type Float but second input has type Int32. [01/30/2022-01:40:11] [TRT] [W] IElementWiseLayer with inputs (Unnamed Layer 1144) [ElementWise]_output and (Unnamed Layer 1148) [Shuffle]_output: first input has type Float but second input has type Int32. [01/30/2022-01:40:11] [TRT] [W] IElementWiseLayer with inputs (Unnamed Layer 1153) [ElementWise]_output and (Unnamed Layer 1157) [Shuffle]_output: first input has type Float but second input has type Int32. [01/30/2022-01:40:11] [TRT] [W] IElementWiseLayer with inputs (Unnamed Layer 1162) [ElementWise]_output and (Unnamed Layer 1166) [Shuffle]_output: first input has type Float but second input has type Int32. [01/30/2022-01:40:12] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.3 but loaded cuBLAS/cuBLAS LT 10.2.2 [01/30/2022-01:40:12] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2737, GPU 1424 (MiB) [01/30/2022-01:40:12] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored. [01/30/2022-01:41:08] [TRT] [I] Detected 1 inputs and 5 output network tensors. [01/30/2022-01:41:08] [TRT] [I] Total Host Persistent Memory: 211648 [01/30/2022-01:41:08] [TRT] [I] Total Device Persistent Memory: 188109312 [01/30/2022-01:41:08] [TRT] [I] Total Scratch Memory: 100352000 [01/30/2022-01:41:08] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 140 MiB, GPU 384 MiB [01/30/2022-01:41:09] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 298.77ms to assign 34 blocks to 280 nodes requiring 424986119 bytes. 
[01/30/2022-01:41:09] [TRT] [I] Total Activation Memory: 424986119 [01/30/2022-01:41:09] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.3 but loaded cuBLAS/cuBLAS LT 10.2.2 [01/30/2022-01:41:09] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2754, GPU 1688 (MiB) [01/30/2022-01:41:09] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +91, GPU +256, now: CPU 91, GPU 256 (MiB) [01/30/2022-01:41:09] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.3 but loaded cuBLAS/cuBLAS LT 10.2.2 [01/30/2022-01:41:09] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2753, GPU 1680 (MiB) [01/30/2022-01:41:09] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +584, now: CPU 91, GPU 840 (MiB) ##############torch2trt_dynamic.py inital TRTModule input_names,output_names: ['input_0'] ['output_0', 'output_1', 'output_2', 'output_3', 'output_4'] trt_model: (tensor([[0]], device='cuda:0', dtype=torch.int32), tensor([[[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.]]], device='cuda:0'), tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], device='cuda:0'), tensor([[-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., 
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.]], device='cuda:0'), tensor([[[[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.]],

     [[0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      ...,
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.]],

     [[0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      ...,
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.]],

     ...,

     [[0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      ...,
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.]],

     [[0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      ...,
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.]],

     [[0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      ...,
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.]]]], device='cuda:0'))

torch_model: [tensor([[0]], device='cuda:0', dtype=torch.int32), tensor([[[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.]]], device='cuda:0'), tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], device='cuda:0'), tensor([[-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.]], device='cuda:0'), tensor([[[[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.]],

     [[0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      ...,
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.]],

     [[0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      ...,
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.]],

     ...,

     [[0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      ...,
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.]],

     [[0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      ...,
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.]],

     [[0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      ...,
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.]]]], device='cuda:0',
   grad_fn=<SqueezeBackward1>)]
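
Note that the log above also warns "mask mode require len(output_names)==5 but get output_names=['num_detections', 'boxes', 'scores', 'classes']", so the four-name list is being rejected. A hedged sketch of a call that satisfies that check is below; the fifth name 'masks' is an assumption, not confirmed in this thread, and simply dropping output_names to fall back on the defaults should also work.

```python
# Sketch only: five output names are required when enable_mask=True.
# 'masks' as the fifth name is an assumption; omitting output_names uses the defaults.
trt_model, torch_model = mmdet2trt(cfg_path, weight_path,
                                   opt_shape_param=opt_shape_param,
                                   fp16_mode=False,
                                   enable_mask=True,
                                   return_wrap_model=True,
                                   output_names=['num_detections', 'boxes',
                                                 'scores', 'classes', 'masks'])
```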