microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

1 : Fail : Non-zero status code returned while running FusedConv node. #10894

Open L-Reichardt opened 2 years ago

L-Reichardt commented 2 years ago

**Describe the bug**
onnxruntime-gpu raises a CUDA error and inference breaks when running an ONNX model that was converted from a TensorFlow .pb file. The model works fine on CPU, and the TensorFlow model itself works fine on GPU. I found no related issue or existing solution for this error. Models and logs are available here: Google Drive (file format blocked by GitHub).

**Urgency**
Urgent: needed by Monday, 21.03.2022.

**System information**

**To Reproduce**

```python
....
def(....):
    np_images = image.astype(np.float32, copy=False)
    np_images = np_images[np.newaxis, ...]  # Adds batch dimension
    num_boxes, scores, boxes = self.session.run(None, {self.input_name: np_images})
```
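For context, a minimal self-contained sketch of the same call path, assuming a session created with the CUDA execution provider; the model path, image shape, and three-output signature are placeholders, not details from the original report:

```python
import numpy as np
import onnxruntime as ort

# Placeholder model path; the real model is in the linked Google Drive.
sess = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = sess.get_inputs()[0].name

image = np.zeros((600, 800, 3), dtype=np.uint8)   # placeholder image
np_images = image.astype(np.float32, copy=False)
np_images = np_images[np.newaxis, ...]            # add batch dimension
num_boxes, scores, boxes = sess.run(None, {input_name: np_images})
```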


**Expected behavior**
The inference script fails to run. Error:

```
[E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=MEM-WAS-04 ; expr=cudnnFindConvolutionForwardAlgorithmEx( s_.handle, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), max_ws_size);
2022-03-16 17:42:29.548068687 [E:onnxruntime:, sequential_executor.cc:346 Execute] Non-zero status code returned while running FusedConv node. Name:'FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/conv1/Conv2D_FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/conv1/Relu' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( s_.handle, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), max_ws_size)
Traceback (most recent call last):
  File "anonymize.py", line 186, in <module>
    main(image_extensions=args.image_extensions,
  File "anonymize.py", line 179, in main
    anonymizer.anonymize_images(input_path=input_folder, output_path=output_folder,
  File "anonymize.py", line 144, in anonymize_images
    anonymized_image, detections = self.anonymize_image(image=image, detection_thresholds=detection_thresholds)
  File "anonymize.py", line 121, in anonymize_image
    new_boxes = detector.detect(image, detection_threshold=detection_thresholds[kind])
  File "/home/labor/Documents/Project/face_lisence_plate_anonymization/anonymizer/detection/detector.py", line 78, in detect
    num_boxes, scores, boxes = self.session.run(None, {self.input_name: np_images})
  File "/home/labor/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 192, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedConv node. Name:'FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/conv1/Conv2D_FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/conv1/Relu' Status Message: CUDNN error executing cudnnFindConvolutionForwardAlgorithmEx( s_.handle, s_.x_tensor, s_.x_data, s_.w_desc,
```



**Screenshots**
Attached.
![Issue_Conv](https://user-images.githubusercontent.com/72140033/158646710-90c84449-4c35-41c3-8c10-403fef46a8fe.png)

**Additional context**
None

ephr4321 commented 2 years ago

I had a similar problem when running about 12 different models. I got the error only at the 8th model, and then saw that I was actually out of GPU memory... I see you have 12 GB of memory, but maybe...
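If GPU memory is the culprit, one way to test that hypothesis is to cap the CUDA provider's memory arena and switch the cuDNN convolution algorithm search away from the exhaustive mode, which avoids the large workspace that cudnnFindConvolutionForwardAlgorithmEx can try to allocate. A sketch, assuming onnxruntime-gpu 1.8+ (where per-provider option dicts are accepted); the 6 GB limit is only illustrative:

```python
import onnxruntime as ort

cuda_options = {
    "device_id": 0,
    # Cap the memory arena so a single session cannot exhaust the GPU (illustrative value).
    "gpu_mem_limit": 6 * 1024 * 1024 * 1024,
    "arena_extend_strategy": "kSameAsRequested",
    # "DEFAULT" skips the exhaustive cudnnFind* search and its workspace allocation.
    "cudnn_conv_algo_search": "DEFAULT",
}

sess = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=[("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"],
)
```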

tall-josh commented 2 years ago

I got a similar error:

```
RuntimeError: Error in execution: Non-zero status code returned while running FusedConv node. Name:'Conv_0' Status Message: /onnxruntime_src/include/onnxruntime/core/framework/op_kernel_context.h:40 const T* onnxruntime::OpKernelContext::Input(int) const [with T = onnxruntime::Tensor] Missing Input: input
```

Turns out I was checking for CUDA availability before setting the io_binding for my input. I only set the binding if CUDA was available, but used sess.run_with_iobinding(self.io_binding) in both the CUDA and CPU cases. The fix was to remove the conditional around bind_cpu_input, i.e.:

```python
#### BAD ####
if ort.get_device() == "GPU":
    self.io_binding.bind_cpu_input("input", preproc_crops)
self.sess.run_with_iobinding(self.io_binding)
ort_outputs = self.io_binding.copy_outputs_to_cpu()
#############
```

```python
#### GOOD ####
self.io_binding.bind_cpu_input("input", preproc_crops)
self.sess.run_with_iobinding(self.io_binding)
ort_outputs = self.io_binding.copy_outputs_to_cpu()
#############
```

In the docs (which I only read as a last resort :-p) it says io_binding.bind_cpu_input('input', X) will "copy the data over to the CUDA device if 'input' is consumed by nodes on the CUDA device". So if 'input' is not consumed by nodes on the CUDA device, it will not do the copy, but 'input' will still be passed to the CPU.

The error in my original thinking was that the binding was only required if a GPU was available, but that meant 'input' was not bound to any device at all in the case where no GPU was available.
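For anyone hitting the same "Missing Input" error, here is a self-contained sketch of the corrected flow; the model path, input name, and input shape are placeholders rather than details from the snippet above, and the explicit output binding is an assumption so that copy_outputs_to_cpu() has something to return:

```python
import numpy as np
import onnxruntime as ort

# Use whatever providers this build/machine actually offers (CUDA first when present).
sess = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=ort.get_available_providers(),
)

preproc_crops = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input

io_binding = sess.io_binding()
# Always bind the input, regardless of whether a GPU is present; ORT copies it
# to the CUDA device only if the consuming nodes actually run there.
io_binding.bind_cpu_input("input", preproc_crops)
# Bind every output so ORT allocates them and they can be copied back.
for output in sess.get_outputs():
    io_binding.bind_output(output.name)

sess.run_with_iobinding(io_binding)
ort_outputs = io_binding.copy_outputs_to_cpu()
```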