microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

1 : Fail : Non-zero status code returned while running FusedConv node. #10894

Open L-Reichardt opened 2 years ago

L-Reichardt commented 2 years ago

**Describe the bug**
onnxruntime-gpu raises a CUDA error and inference breaks when running an ONNX model that was converted from a TensorFlow .pb file. The model works fine on CPU, and the TensorFlow model itself works fine on GPU. I found no related issue or existing solution for this error. Models and logs are available here: Google Drive (file format blocked by GitHub).

**Urgency**
Urgent: needed by Monday, 21.03.2022.

**System information**

**To Reproduce**

```python
....
def(....):
    np_images = image.astype(np.float32, copy=False)
    np_images = np_images[np.newaxis, ...]  # Adds batch dimension
    num_boxes, scores, boxes = self.session.run(None, {self.input_name: np_images})
```
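For context, a minimal self-contained sketch of the same call path, assuming a session created with the CUDA execution provider; the model path, image shape, and three-output signature are placeholders, not details from the original report:

```python
import numpy as np
import onnxruntime as ort

# Placeholder model path; the real model is in the linked Google Drive.
sess = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = sess.get_inputs()[0].name

image = np.zeros((600, 800, 3), dtype=np.uint8)   # placeholder image
np_images = image.astype(np.float32, copy=False)
np_images = np_images[np.newaxis, ...]            # add batch dimension
num_boxes, scores, boxes = sess.run(None, {input_name: np_images})
```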


**Expected behavior**
The inference script fails to run. Error:

```
[E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=MEM-WAS-04 ; expr=cudnnFindConvolutionForwardAlgorithmEx( s_.handle, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), max_ws_size);
2022-03-16 17:42:29.548068687 [E:onnxruntime:, sequential_executor.cc:346 Execute] Non-zero status code returned while running FusedConv node. Name:'FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/conv1/Conv2D_FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/conv1/Relu' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( s_.handle, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), max_ws_size)
Traceback (most recent call last):
  File "anonymize.py", line 186, in <module>
    main(image_extensions=args.image_extensions,
  File "anonymize.py", line 179, in main
    anonymizer.anonymize_images(input_path=input_folder, output_path=output_folder,
  File "anonymize.py", line 144, in anonymize_images
    anonymized_image, detections = self.anonymize_image(image=image, detection_thresholds=detection_thresholds)
  File "anonymize.py", line 121, in anonymize_image
    new_boxes = detector.detect(image, detection_threshold=detection_thresholds[kind])
  File "/home/labor/Documents/Project/face_lisence_plate_anonymization/anonymizer/detection/detector.py", line 78, in detect
    num_boxes, scores, boxes = self.session.run(None, {self.input_name: np_images})
  File "/home/labor/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 192, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedConv node. Name:'FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/conv1/Conv2D_FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/conv1/Relu' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( s_.handle, s_.x_tensor, s_.x_data, s_.w_desc,
```



**Screenshots**
Attached.
![Issue_Conv](https://user-images.githubusercontent.com/72140033/158646710-90c84449-4c35-41c3-8c10-403fef46a8fe.png)

**Additional context**
None

ephr4321 commented 2 years ago

I had a similar problem when running about 12 different models. I got the error only at the 8th model, and then saw that I was actually out of GPU memory... I see you have 12 GB of memory, but maybe...
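If GPU memory is the culprit, one way to test that hypothesis is to cap the CUDA provider's memory arena and switch the cuDNN convolution algorithm search away from the exhaustive mode, which avoids the large workspace that cudnnFindConvolutionForwardAlgorithmEx can try to allocate. A sketch, assuming onnxruntime-gpu 1.8+ (where per-provider option dicts are accepted); the 6 GB limit is only illustrative:

```python
import onnxruntime as ort

cuda_options = {
    "device_id": 0,
    # Cap the memory arena so a single session cannot exhaust the GPU (illustrative value).
    "gpu_mem_limit": 6 * 1024 * 1024 * 1024,
    "arena_extend_strategy": "kSameAsRequested",
    # "DEFAULT" skips the exhaustive cudnnFind* search and its workspace allocation.
    "cudnn_conv_algo_search": "DEFAULT",
}

sess = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=[("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"],
)
```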

tall-josh commented 2 years ago

I got a similar error:

```
RuntimeError: Error in execution: Non-zero status code returned while running FusedConv node. Name:'Conv_0' Status Message: /onnxruntime_src/include/onnxruntime/core/framework/op_kernel_context.h:40 const T* onnxruntime::OpKernelContext::Input(int) const [with T = onnxruntime::Tensor] Missing Input: input
```

Turns out I was checking for CUDA availability before setting the io_binding for my input. I only set the binding if CUDA was available, but used sess.run_with_iobinding(self.io_binding) in both the CUDA and CPU cases. The fix was to remove the conditional around bind_cpu_input, i.e.:

```python
#### BAD ####
if ort.get_device() == "GPU":
    self.io_binding.bind_cpu_input("input", preproc_crops)
self.sess.run_with_iobinding(self.io_binding)
ort_outputs = self.io_binding.copy_outputs_to_cpu()
#############
```

```python
#### GOOD ####
self.io_binding.bind_cpu_input("input", preproc_crops)
self.sess.run_with_iobinding(self.io_binding)
ort_outputs = self.io_binding.copy_outputs_to_cpu()
#############
```

In the docs (which I only read as a last resort :-p) it says io_binding.bind_cpu_input('input', X) will "copy the data over to the CUDA device if 'input' is consumed by nodes on the CUDA device". So if 'input' is not consumed by nodes on the CUDA device, it will not do the copy, but 'input' will still be passed to the CPU.

The error in my original thinking was that the binding was only required if a GPU was available, but that meant 'input' was not bound to any device at all in the case where no GPU was available.
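For anyone hitting the same "Missing Input" error, here is a self-contained sketch of the corrected flow; the model path, input name, and input shape are placeholders rather than details from the snippet above, and the explicit output binding is an assumption so that copy_outputs_to_cpu() has something to return:

```python
import numpy as np
import onnxruntime as ort

# Use whatever providers this build/machine actually offers (CUDA first when present).
sess = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=ort.get_available_providers(),
)

preproc_crops = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input

io_binding = sess.io_binding()
# Always bind the input, regardless of whether a GPU is present; ORT copies it
# to the CUDA device only if the consuming nodes actually run there.
io_binding.bind_cpu_input("input", preproc_crops)
# Bind every output so ORT allocates them and they can be copied back.
for output in sess.get_outputs():
    io_binding.bind_output(output.name)

sess.run_with_iobinding(io_binding)
ort_outputs = io_binding.copy_outputs_to_cpu()
```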