notAI-tech / NudeNet

Lightweight nudity detection
https://nudenet.notai.tech/
GNU Affero General Public License v3.0

CUDA error when running inference with onnxruntime-gpu #84

Open yanyabo111 opened 3 years ago

yanyabo111 commented 3 years ago

When I tried to run inference on the model with onnxruntime-gpu, a CUDA error occurred.

import onnxruntime

def __init__(self, model_name="default"):
    checkpoint_path = '/root/tensor/nudenet/checkpoint/detector_v2_default_checkpoint.onnx'
    classes_path = '/root/tensor/nudenet/checkpoint/detector_v2_default_classes'

    # switch between CPUExecutionProvider and CUDAExecutionProvider here
    self.detection_model = onnxruntime.InferenceSession(checkpoint_path, providers=["CUDAExecutionProvider"])
    self.classes = [c.strip() for c in open(classes_path).readlines() if c.strip()]

The error is:

2021-03-07 06:53:12.871020963 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running GatherND node. Name:'filtered_detections/map/while/GatherNd_28' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument
2021-03-07 06:53:12.871079083 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running Loop node. Name:'generic_loop_Loop__492' Status Message: Non-zero status code returned while running GatherND node. Name:'filtered_detections/map/while/GatherNd_28' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument
Traceback (most recent call last):
  File "detector.py", line 115, in <module>
    print(m.detect("/root/tensor/image-quality-assessment/t1.jpg"))
  File "detector.py", line 90, in detect
    outputs = self.detection_model.run(
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 188, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Loop node. Name:'generic_loop_Loop__492' Status Message: Non-zero status code returned while running GatherND node. Name:'filtered_detections/map/while/GatherNd_28' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument

Specify versions of the following libraries

  1. nudenet
  2. onnxruntime-gpu: 1.7
  3. CUDA 11.0.3 and cuDNN 8.0.2.4
  4. RTX 3090

When I run the model with onnxruntime on the CPU, everything is fine. I also converted the ONNX model to a .pb, and it runs on the tfserving_gpu Docker image.

Does the node config need to be changed, or is this a problem with onnxruntime-gpu?
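
As a stopgap, a minimal sketch of pinning the session to the CPU EP (reusing the checkpoint path from the snippet above); session.get_providers() confirms which providers were actually applied:

import onnxruntime

# Stopgap: pin the session to the CPU execution provider, which runs this model cleanly.
checkpoint_path = '/root/tensor/nudenet/checkpoint/detector_v2_default_checkpoint.onnx'
session = onnxruntime.InferenceSession(checkpoint_path, providers=["CPUExecutionProvider"])

# Sanity check: list the execution providers ONNX Runtime actually applied.
print(session.get_providers())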

bedapudi6788 commented 3 years ago

@yanyabo111 I am able to reproduce the issue. I will try to figure it out when I get some free time. Meanwhile, you can fall back to a previous version of nudenet and use the TensorFlow models (those work with GPU).

yanyabo111 commented 3 years ago

@bedapudi6788 Really appreciate your hard work. Is there anything I can help with?

SiavashCS commented 3 years ago

Having the same issue here. Also tried other versions of onnxruntime-gpu (1.4.0 to 1.7.0)

mrjarhead commented 3 years ago

FYI - I hit the same bug with the ONNX model in the releases, and was able to resolve it by re-converting the TensorFlow model (detector_v2_default_checkpoint_tf) to ONNX at opset 11 using tf2onnx; after that, no more exceptions.

The tf2onnx command I used after I downloaded the TF model was:

python -m tf2onnx.convert --saved-model c:\saved_model_dir --opset 11 --output saved_model.onnx

Hope that helps!
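
To sanity-check the re-exported model on the GPU, a rough sketch (the input name and shape are read from the session rather than hard-coded; the float32 dtype and the 640 placeholder for dynamic dimensions are assumptions, so adjust them to this detector's actual preprocessing if the run complains):

import numpy as np
import onnxruntime

# Load the re-exported opset-11 model with the CUDA EP, keeping the CPU EP as a fallback.
sess = onnxruntime.InferenceSession(
    "saved_model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Discover the input name and shape instead of hard-coding them.
inp = sess.get_inputs()[0]
# Replace dynamic dimensions with a placeholder size (assumption: 640).
shape = [d if isinstance(d, int) else 640 for d in inp.shape]
dummy = np.random.rand(*shape).astype(np.float32)

outputs = sess.run(None, {inp.name: dummy})
print(sess.get_providers(), [o.shape for o in outputs])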

SiavashCS commented 3 years ago


Worked :) thanks a lot.

Zalways commented 8 months ago

I met a similar problem when running inference on CUDA:

FAIL : Non-zero status code returned while running TopK node. Name:'/model/TopK' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument

Can you help me with my problem? Thank you! @mrjarhead
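
One way to narrow this down (a sketch, with "model.onnx" as a placeholder path) is to enable ONNX Runtime's verbose logging and inspect the node-to-provider assignment for the failing TopK node; the opset-11 re-export described above may also be worth trying here:

import onnxruntime

# Verbose logging prints node-to-provider assignment and kernel details,
# which helps identify the failing node/EP combination.
so = onnxruntime.SessionOptions()
so.log_severity_level = 0  # 0 = VERBOSE

sess = onnxruntime.InferenceSession(
    "model.onnx",  # placeholder: path to the failing model
    sess_options=so,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)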