open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

ONNX mmdet converted model runtime CUDNN error #549

Closed connor-john closed 2 years ago

connor-john commented 2 years ago

I created an ONNX model for the mmdet faster_rcnn_r50_fpn_1x_coco model, both as onnx_static and onnx_dynamic.

Creating the model works, and testing on CPU works.

When testing the model on GPU with onnxruntime-gpu==1.8.1, both produce a cuDNN error:

[E:onnxruntime:Default, cuda_call.cc:117 CudaCall] CUDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=connor ; expr=cudnnFindConvolutionForwardAlgorithmEx( CudnnHandle(), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize); 
2022-06-02 15:00:34.964696798 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running FusedConv node. Name:'Conv_0_Relu_1' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( CudnnHandle(), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize)
2022-06-02 15:00:34.964740210 [E:onnxruntime:Default, cuda_call.cc:117 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=connor ; expr=cudaEventRecord(current_deferred_release_event, static_cast<cudaStream_t>(GetComputeStream())); 
Traceback (most recent call last):
  File "main.py", line 80, in <module>
    main()
  File "main.py", line 63, in main
    bbox_xyxy, cls_conf, cls_ids = inference_model(model, img)
  File "main.py", line 10, in inference_model
    bbox_result = model([img])
  File "/home/connor/Documents/github/mmdeploy-test-onnx/model.py", line 107, in __call__
    outputs = self._forward({'input': input_img})
  File "/home/connor/Documents/github/mmdeploy-test-onnx/model.py", line 43, in _forward
    self.ort_session.run_with_iobinding(self.io_binding)
  File "/home/connor/anaconda3/envs/onnx/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 229, in run_with_iobinding
    self._sess.run_with_iobinding(iobinding._iobinding, run_options)
RuntimeError: Error in execution: Non-zero status code returned while running FusedConv node. Name:'Conv_0_Relu_1' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( CudnnHandle(), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize)
Aborted (core dumped)

The inference code is the same as in mmdeploy.

Env.

pytorch 1.11.0
cudatoolkit  11.3
cudnn 8.2.1
onnxruntime-gpu  1.8.1

Is this an issue with how the mmdetection model was created, or an onnxruntime-gpu-specific issue? Any help is appreciated, thank you.

tpoisonooo commented 2 years ago

After checking https://github.com/Microsoft/onnxruntime/releases/tag/v1.8.1 , I noticed that ort-gpu 1.8.1 only supports CUDA 10.1~11.1, while torch 1.11 depends on CUDA 11.3.

Hope it helps.
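To double-check the versions from inside the environment, a minimal sketch (the comments show what this thread's env would be expected to print):

import torch
import onnxruntime as ort

# CUDA version PyTorch was built against, e.g. torch 1.11.0 -> '11.3'
print(torch.__version__, torch.version.cuda)
# ORT version and whether the GPU build is active, e.g. '1.8.1' 'GPU'
print(ort.__version__, ort.get_device())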

connor-john commented 2 years ago

Thanks for spotting that @tpoisonooo ,

After creating a new environment with:

pytorch 1.7.1
cudatoolkit 11.0.3
cudnn 8.0.4
onnxruntime-gpu 1.8.1

I still get the same error.

Full traceback:

2022-06-03 09:51:29.663267904 [E:onnxruntime:Default, cuda_call.cc:117 CudaCall] CUDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=connor ; expr=cudnnFindConvolutionForwardAlgorithmEx( CudnnHandle(), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize); 
2022-06-03 09:51:29.663309062 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running FusedConv node. Name:'Conv_6_Relu_7' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( CudnnHandle(), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize)
2022-06-03 09:51:29.663563714 [E:onnxruntime:Default, cuda_call.cc:117 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=connor ; expr=cudaEventRecord(current_deferred_release_event, static_cast<cudaStream_t>(GetComputeStream())); 
Traceback (most recent call last):
  File "main.py", line 80, in <module>
    main()
  File "main.py", line 63, in main
    bbox_xyxy, cls_conf, cls_ids = inference_model(model, img)
  File "main.py", line 10, in inference_model
    bbox_result = model([img])
  File "/home/connor/Documents/github/mmdeploy-test-onnx/model.py", line 107, in __call__
    outputs = self._forward({'input': input_img})
  File "/home/connor/Documents/github/mmdeploy-test-onnx/model.py", line 43, in _forward
    self.ort_session.run_with_iobinding(self.io_binding)
  File "/home/connor/anaconda3/envs/onnx/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 229, in run_with_iobinding
    self._sess.run_with_iobinding(iobinding._iobinding, run_options)
RuntimeError: Error in execution: Non-zero status code returned while running FusedConv node. Name:'Conv_6_Relu_7' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( CudnnHandle(), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), AlgoSearchWorkspaceSize)
Aborted (core dumped)

connor-john commented 2 years ago

I got mmdeploy test.py to run on GPU:

python tools/test.py \
    configs/mmdet/detection/detection_onnxruntime_dynamic.py \
    work_dir/faster_rcnn_r50_fpn_1x_coco.py \
    --model work_dir/end2end.onnx \
    --device cuda:0

env:

pytorch 1.7.1
cudatoolkit 11.0.3
cudnn 8.0.4
onnxruntime-gpu 1.8.1
mmdeploy 0.4.0
mmdet 2.20.0
mmcv-full 1.4.0

Since my torch version is < 1.8.0, I had to copy some of the symbolic helper functions from torch.onnx.symbolic_helper into mmdeploy/pytorch/ops/instance_norm.py just so mmdeploy would run:


import torch

def _is_tensor(x):
    # True if the JIT value is typed as a Tensor.
    return x.type().isSubtypeOf(torch._C.TensorType.get())

def _get_tensor_rank(x):
    # Rank (number of dims) of a traced tensor, or None if unknown.
    if not _is_tensor(x) or x.type() is None:
        return None
    return x.type().dim()

def _get_tensor_sizes(x, allow_nonstatic=True):
    # Shape of a traced tensor; dynamic dims come back as None.
    if not _is_tensor(x) or x.type() is None:
        return None
    if allow_nonstatic:
        # Each individual symbol is returned as None,
        # e.g. [1, 'a', 'b'] -> [1, None, None]
        return x.type().varyingSizes()
    # Returns None if any symbol exists in sizes,
    # e.g. [1, 'a', 'b'] -> None
    return x.type().sizes()

def _get_tensor_dim_size(x, dim):
    # Size of one dimension, or None if it cannot be determined.
    try:
        sizes = _get_tensor_sizes(x)
        return sizes[dim]
    except Exception:
        pass
    return None

Any idea why I always get the error from my previous comment whenever I run inference with the onnx model on GPU outside of mmdeploy's test.py?

Any insight into what solves the cuDNN runtime error is greatly appreciated, thanks.

tpoisonooo commented 2 years ago
  1. What are your host CUDA and driver versions? The PyTorch installation comes with its own CUDA toolkit; does it match your host?

  2. I thought you would upgrade ort-gpu to 1.11.x ... As far as I know, mmdeploy is not fully tested with torch 1.7.

  3. What the cuDNN error means:

    • A deep learning model consists of convolutions, and a conv has multiple implementations (direct/im2col/winograd/implicit... and so on) and endless blocking/format strategies (kernel16x8x16/nc4hw4...)
    • cuDNN tries to pick a concrete implementation with cudnnFindConvolutionForwardAlgorithmEx, but that search fails (a workaround sketch follows this list)
  4. The CUDA error 700:

    • It means an illegal memory access, which is usually hard to troubleshoot. Memory issues are often difficult in C++ projects
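If the exhaustive cudnnFind search itself is what breaks, newer onnxruntime-gpu builds let you ask the CUDA provider for a heuristic algorithm search instead. A hedged sketch (assumes an ORT version that accepts per-provider option dicts; cudnn_conv_algo_search is the relevant option):

import onnxruntime as ort

# Ask the CUDA execution provider to use heuristics instead of the
# exhaustive cudnnFindConvolutionForwardAlgorithmEx search failing above.
session = ort.InferenceSession(
    'work_dir/end2end.onnx',
    providers=[
        ('CUDAExecutionProvider', {'cudnn_conv_algo_search': 'HEURISTIC'}),
        'CPUExecutionProvider',
    ])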

Overall, it looks like a version mismatch.

If it's convenient, give me the onnx model and I will give you an environment configuration that works properly.

tpoisonooo commented 2 years ago

My recommendation:

Open $WORK_DIR; there is an end2end.onnx.

Unit test ort-gpu inference with end2end.onnx
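Something like this minimal sketch (the input name 'input' comes from your traceback; the 1x3x800x1344 shape is an assumption, use whatever your preprocessing produces):

import numpy as np
import onnxruntime as ort

# Run the exported model with the CUDA execution provider.
session = ort.InferenceSession(
    'work_dir/end2end.onnx',
    providers=['CUDAExecutionProvider'])

# Dummy input; real preprocessing should match the deploy config.
img = np.random.rand(1, 3, 800, 1344).astype(np.float32)
outputs = session.run(None, {'input': img})
print([o.shape for o in outputs])

If this plain session.run works on GPU while your io_binding path crashes, the bug is in the binding code rather than in the model.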

connor-john commented 2 years ago

Is running nvcc --version and nvidia-smi sufficient for getting the host CUDA information? If that's the case:

nvcc --version
Cuda compilation tools, release 10.1, V10.1.243
nvidia-smi
NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2   

Would having the CUDA in the conda env out of sync with the host be the likely issue? It still works with tools/test.py, though.

If that's the case, I will re-set up my system with your recommended env of

pytorch 1.11.0
cudatoolkit  11.3
cudnn 8.2.1

I will provide the ONNX model after that if I am still having issues.

Thanks again @tpoisonooo

connor-john commented 2 years ago

Quick question: can I use a higher MMCV version than the 1.4.0 recommended in the docs?

I had issues installing mmcv==1.4.0 with the higher pytorch version 1.11.0. Or should I just compile MMCV from source instead?

tpoisonooo commented 2 years ago

Check mmcv & torch version here.

tpoisonooo commented 2 years ago

[screenshot: CUDA toolkit / minimum driver version compatibility table]

CUDA 11.3 needs driver >= 465.

connor-john commented 2 years ago

Thanks @tpoisonooo for your help,

Using your recommended env did help.

I was able to make it work in my old environment after getting similar failures in the new env.

It turned out the input tensor wasn't being moved to the GPU device in my test inference code. This shouldn't affect anyone else; just noting it down in case someone else hits a similar issue.
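For anyone landing here later: the crash came from binding a host (CPU) buffer as if it were device memory when calling run_with_iobinding. A hedged sketch of the shape of the fix (the input name 'input' matches the tracebacks above; the output names 'dets'/'labels' and the input shape are assumptions, adapt them to your model):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    'work_dir/end2end.onnx',
    providers=['CUDAExecutionProvider'])
io_binding = session.io_binding()

img = np.random.rand(1, 3, 800, 1344).astype(np.float32)

# Copy the input to GPU memory first, then bind the device buffer;
# binding a CPU pointer as CUDA memory causes the illegal access above.
x_ortvalue = ort.OrtValue.ortvalue_from_numpy(img, 'cuda', 0)
io_binding.bind_ortvalue_input('input', x_ortvalue)

# Let ORT allocate the outputs on the device.
io_binding.bind_output('dets', 'cuda')
io_binding.bind_output('labels', 'cuda')

session.run_with_iobinding(io_binding)
dets, labels = io_binding.copy_outputs_to_cpu()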