grimoire / mmdetection-to-tensorrt

Convert mmdetection models to TensorRT, with support for fp16, int8, batch input, dynamic shape, etc.

CUDA error: an illegal memory access was encountered #30

Open davidas1 opened 3 years ago

davidas1 commented 3 years ago

Describe the bug: First of all, thank you for a great project! I installed this repo on the nvcr.io/nvidia/pytorch:20.10-py3 image (together with all prerequisites).

I'm trying to convert the LVIS model from mmdetection (https://github.com/open-mmlab/mmdetection/blob/master/configs/lvis/mask_rcnn_r101_fpn_sample1e-3_mstrain_1x_lvis_v1.py). The conversion itself seems to go OK, but when I run inference I get RuntimeError: CUDA error: an illegal memory access was encountered as soon as I access the result tensors. I tried both with and without mask support.

Please let me know if I'm doing something wrong. Thanks again!

To Reproduce: Convert the model with this command line:

mmdet2trt --fp16 1 --enable-mask 0 --save-engine 1 --max-workspace-gb 4 \
mask_rcnn_r101_fpn_sample1e-3_mstrain_1x_lvis_v1.py \
mask_rcnn_r101_fpn_sample1e-3_mstrain_1x_lvis_v1-ec55ce32.pth \
faster_rcnn_r101_lvis

Inference code:

import mmdet2trt.apis
import mmdet.apis
import imageio
from skimage.transform import resize

config_path = 'mask_rcnn_r101_fpn_sample1e-3_mstrain_1x_lvis_v1.py'
checkpoint_path = 'mask_rcnn_r101_fpn_sample1e-3_mstrain_1x_lvis_v1-ec55ce32.pth'

# Load the test image and resize it to the model's input resolution
img = imageio.imread('test_image_1.jpg')
img = resize(img, (800, 1333), order=1, anti_aliasing=True, preserve_range=True)

# Reference inference with the original mmdetection model (works)
mmdet_model = mmdet.apis.init_detector(config_path, checkpoint_path, 'cuda:0')
mmdet_res = mmdet.apis.inference_detector(mmdet_model, img)

# Inference with the converted TensorRT engine
# (the illegal memory access is raised when the results are accessed)
trt_model = mmdet2trt.apis.init_detector('faster_rcnn_r101_lvis')
trt_res = mmdet2trt.apis.inference_detector(trt_model, img, config_path, 'cuda:0')
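
One way to localize failures like this (not from the original thread, just a debugging sketch): illegal-access errors are reported asynchronously, so they often surface only at the first later tensor access. Forcing synchronous launches pins the error to the offending call:

import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'  # must be set before the first CUDA call

import torch
# ... run the TensorRT inference as above ...
torch.cuda.synchronize()  # any pending kernel failure is raised at this point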

Environment: the nvcr.io/nvidia/pytorch:20.10-py3 image, as above.


grimoire commented 3 years ago

Hi, thanks for the bug report. There are some problems with CUDA (or the NVIDIA driver) 11.1. I will fix it after my business trip. In the meantime, please try another docker image with CUDA 11 or 10.2.
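
For reference, switching to an older base image might look like the following (an assumption on my part: nvcr.io/nvidia/pytorch:20.03-py3 ships CUDA 10.2; check NVIDIA's framework support matrix for the exact CUDA version of each tag):

docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:20.03-py3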

buaalingming commented 3 years ago

Hi Grimoire, I have hit the same issue: the model converts, but inference fails when using mmdetection-to-tensorrt/demo/inference.py with https://github.com/open-mmlab/mmdetection/tree/master/configs/faster_rcnn/faster_rcnn_r50_caffe_fpn_1x_coco.py. My environment:

OS: [e.g. Ubuntu 16.04]
python_version: 3.7
pytorch_version: 1.8
cuda_version: 11.0
cudnn_version: [e.g. 8.0.5.39]
mmdetection_version: 2.7.0
Driver_version: 450.57 (must it be < 450.36?)

Thank you.
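
(For completeness, these fields can be read straight off the running interpreter; a quick sketch, assuming both torch and the TensorRT Python bindings are importable:)

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())"
python -c "import tensorrt; print(tensorrt.__version__)"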

grimoire commented 3 years ago

@buaalingming Inference works inside docker (even with host driver 455). You can use a docker image with CUDA 10.2 for now. I am trying to fix this; please allow me some time.

buaalingming commented 3 years ago

@grimoire Hi Grimoire, I have tried your docker image and it works very well. I have also been debugging my own environment for a long time, and I found something:

  1. The key cause is the PyTorch and torchvision versions: torch==1.7.0 + torchvision==0.8.1 leads to the bug, while torch==1.6.0 + torchvision==0.7.0 is fine.

  2. The detailed error message is: [TensorRT] ERROR: ../rtSafe/cuda/cudaConvolutionRunner.cpp (483) - Cudnn Error in executeConv: 8 (CUDNN_STATUS_EXECUTION_FAILED) [TensorRT] ERROR: FAILED_EXECUTION: std::exception

    I hope my findings help, and I look forward to your good news. Thank you.

@davidas1 Hi Davidas1, you can try torch==1.6.0 + torchvision==0.7.0; it may help you (pin command sketch below).
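
A minimal pin for that suggestion (assuming the CUDA 10.2 builds, which are published as +cu102 wheels on PyTorch's stable wheel index):

pip install torch==1.6.0+cu102 torchvision==0.7.0+cu102 -f https://download.pytorch.org/whl/torch_stable.html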

grimoire commented 3 years ago

@buaalingming Thanks! That would be helpful!