microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Non-zero status code returned while running ConvTranspose node. #21034

Open Jerry-Master opened 5 months ago

Jerry-Master commented 5 months ago

Describe the issue

When running my ONNX model from C++ on the CPU, everything works perfectly. However, when running it with the CUDA provider, it throws this error:

2024-06-13 16:21:29.9651844 [E:onnxruntime:InferenceEngineORT, cuda_call.cc:118 onnxruntime::CudaCall] CUDNN failure 3: CUDNN_STATUS_BAD_PARAM ; GPU=0 ; hostname=008C00014 ; file=C:\a\_work\1\s\onnxruntime\core\providers\cuda\nn\conv_transpose.cc ; line=318 ; expr=cudnnAddTensor(GetCudnnHandle(context), &alpha, s_.b_tensor, b_data, &alpha, s_.y_tensor, y_data);
2024-06-13 16:21:29.9907653 [E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running ConvTranspose node. Name:'_inlfunc_torch_nn_modules_conv_ConvTranspose2d_p_m_up3_0_1_ConvTranspose_7' Status Message: CUDNN failure 3: CUDNN_STATUS_BAD_PARAM ; GPU=0 ; hostname=008C00014 ; file=C:\a\_work\1\s\onnxruntime\core\providers\cuda\nn\conv_transpose.cc ; line=318 ; expr=cudnnAddTensor(GetCudnnHandle(context), &alpha, s_.b_tensor, b_data, &alpha, s_.y_tensor, y_data);
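
In case it helps triage, more context on the failing cuDNN call can be surfaced with verbose session logging. This is just a diagnostic sketch using the standard onnxruntime.SessionOptions fields; the model file name is the default from the repro script below:

import onnxruntime

# verbose logging prints per-node execution details, including the
# shapes the CUDA EP hands to cuDNN for the failing ConvTranspose node
so = onnxruntime.SessionOptions()
so.log_severity_level = 0  # 0 = VERBOSE
sess = onnxruntime.InferenceSession(
    "usrnet_128x256_32x32.onnx", so, providers=['CUDAExecutionProvider'])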

To reproduce

The model takes two inputs, a tensor l_x_ of shape (1, 3, 128, 256) and a tensor l_k_ of shape (1, 1, 32, 32); the output is named p_8 and has shape (1, 3, 128, 256). The model is uploaded to Drive here. It was exported from the usrnet repo using the dynamo exporter and opset 18.
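
The declared names and shapes can be confirmed by inspecting the graph on the CPU provider (a small sketch; the file name is the default from the script below):

import onnxruntime

# print the graph's declared inputs/outputs to confirm names and shapes
sess = onnxruntime.InferenceSession(
    "usrnet_128x256_32x32.onnx", providers=['CPUExecutionProvider'])
for i in sess.get_inputs():
    print('input ', i.name, i.shape)
for o in sess.get_outputs():
    print('output', o.name, o.shape)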

The following Python code reproduces the issue:

import argparse
import cv2
import onnxruntime
import numpy as np

def _create_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--onnx-path', type=str, default="usrnet_128x256_32x32.onnx")
    return parser

def main():
    parser = _create_parser()
    args = parser.parse_args()

    # dummy inputs matching the model's declared shapes
    x = np.ones((1, 3, 128, 256)).astype(np.float32)
    k = np.ones((1, 1, 32, 32)).astype(np.float32)

    # compute ONNX Runtime output prediction
    ort_session = onnxruntime.InferenceSession(args.onnx_path, providers=['CUDAExecutionProvider'])  
    ort_inputs = {'l_x_': x, 'l_k_': k}
    ort_outs = ort_session.run(['p_8'], ort_inputs)

    # CHW -> HWC so OpenCV can handle the array as an image
    out = ort_outs[0].squeeze(0).transpose(1, 2, 0)
    # min-max normalize to [0, 255] and save as an 8-bit PNG
    out_norm = cv2.normalize(out, None, 0, 255, norm_type=cv2.NORM_MINMAX)
    cv2.imwrite('out.png', out_norm.astype(np.uint8))

if __name__ == '__main__':
    main()
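
As a control, the identical script succeeds when the provider list is swapped to the CPU EP, which is what points at the CUDA execution provider specifically. A sketch of the only change needed:

# same model, same inputs, but on the CPU provider; this path runs cleanly
ort_session = onnxruntime.InferenceSession(
    args.onnx_path, providers=['CPUExecutionProvider'])
ort_outs = ort_session.run(['p_8'], {'l_x_': x, 'l_k_': k})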

Urgency

Yes

Platform

Windows

OS Version

11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.6
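
For completeness, the provider registration on this machine can be double-checked with the standard onnxruntime introspection calls:

import onnxruntime

# confirm the CUDA EP is compiled in and visible to this install
print(onnxruntime.get_available_providers())
print(onnxruntime.get_device())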

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.