microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

SSDLite 320: RuntimeException on CUDA. TopK index assert was false. #13876

Open SaverioFrancesco opened 1 year ago

SaverioFrancesco commented 1 year ago

Describe the issue

Description

I get the following error

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running TopK node. Name:'TopK_6271' Status Message: /onnxruntimesrc/onnxruntime/core/providers/cuda/math/topk.cc:64 onnxruntime::common::Status onnxruntime::cuda::TopK::ComputeInternal(onnxruntime::OpKernelContext*) const [with bool inputk = true] K >= 0 && K_ <= tensor_X->Shape().GetDims()[axis] was false.

while running self.ort_session.run( None, {"input1" : images} )

on CUDAExecutionProvider

on the attached ONNX model, which is based on

torchvision.models.detection.ssdlite320_mobilenet_v3_large

The files

ssdlite.zip

Other information

By running
python3.9 -m onnxruntime.tools.check_onnx_model_mobile_usability --log_level debug modules/hand_gestures_recognition/ssdlite.onnx

I get the following:

INFO: Checking modules/hand_gestures_recognition/ssdlite.onnx for usability with ORT Mobile.
INFO: Checking NNAPI
INFO: 32 partitions with a total of 755/2514 nodes can be handled by the NNAPI EP.
INFO: 136 nodes are in subgraphs, which are currently not handled.
INFO: Partition sizes: [32, 26, 24, 10, 11, 4, 2, 3, 2, 4, 2, 4, 2, 4, 2, 5, 3, 2, 5, 4, 14, 5, 3, 2, 5, 4, 2, 5, 4, 88, 264, 208]
INFO: Unsupported nodes due to operator=1135
INFO: Unsupported nodes due to input having a dynamic shape=488
INFO: Unsupported ops: ai.onnx:ConstantOfShape,ai.onnx:Equal,ai.onnx:GatherND,ai.onnx:Greater,ai.onnx:HardSigmoid,ai.onnx:If,ai.onnx:NonZero,ai.onnx:ReduceProd,ai.onnx:Shape,ai.onnx:Split,ai.onnx:TopK
DEBUG: Caveats that have not been checked and may result in a node not being supported:
  ai.onnx:Conv: Only 2D Conv is supported. Weights and bias should be constant.
  ai.onnx:Gather: Input indices should be constant if not int32 type.
  ai.onnx:GlobalAveragePool: Only 2D Pool is supported.
  ai.onnx:Pad: Only constant mode Pad is supported. Input pads and constant_value should be constant. Input pads values should be non-negative.
  ai.onnx:Resize: Only 2D Resize is supported.
  ai.onnx:Squeeze: Input axes should be constant.
  ai.onnx:Unsqueeze: Input axes should be constant.
INFO: NNAPI is not recommended with this model as there are 32 partitions covering 30.0% of the nodes in the model. This will most likely result in worse performance than just using the CPU EP.

To export the ONNX model from scratch (optional)

import torch
import torchvision
import onnx

torchvision_model = torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained=False)
ONNX_FILE_PATH = 'ssdlite.onnx'
dummy_input = torch.randn(2, 3, 320, 320)  # assumed shape; the original dummy_input is not shown
torch.onnx.export(torchvision_model, dummy_input, ONNX_FILE_PATH, input_names=['input1'],
                  output_names=['output1boxes', 'output1scores', 'output1lables',
                                'output2boxes', 'output2scores', 'output2lables'],
                  verbose=True, do_constant_folding=False, export_params=True, opset_version=13)
onnx_model = onnx.load(ONNX_FILE_PATH)
onnx.checker.check_model(onnx_model)

To reproduce

assert 'CUDAExecutionProvider' in onnxruntime.get_available_providers()
sess_options = onnxruntime.SessionOptions()
session = onnxruntime.InferenceSession(export_model_path, providers=['CUDAExecutionProvider'])

random_dummy_input = torch.randn(8, 3, 320, 320).numpy()  # this does not raise any error

real_input = images  # TODO: the numpy array where it fails. Load this from the attached file.

results = session.run( None, {"input1" : real_input} )

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04.1 LTS Release: 22.04 Codename: jammy

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.13.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA Version: 11.7

Update [11:46 Wednesday, December 7, 2022 (GMT+1)]:

Running on CPUExecutionProvider I obtain a similar error, which may be clearer:

[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running TopK node. Name:'TopK_6271' Status Message: k argument [4] should not be greater than specified axis dim value [3]

Apparently, on that input the TopK node receives K=4, but the input tensor's dimension along the specified axis is only 3.
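In other words, taking the top 4 of a length-3 axis is ill-defined. A small numpy sketch of the same situation (purely illustrative; this is not ORT's kernel, and the clamp shown is just one possible workaround):

```python
import numpy as np

x = np.array([0.9, 0.1, 0.5])
k = 4  # larger than x.shape[0] == 3

# numpy's partial-sort helper rejects an out-of-range k the same way
try:
    np.argpartition(x, -k)
except ValueError as e:
    print("ValueError:", e)

# one possible workaround: clamp k to the axis length before selecting
k_safe = min(k, x.shape[0])
print(np.sort(x)[::-1][:k_safe])  # [0.9 0.5 0.1]
```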

edgchen1 commented 1 year ago

For this TopK, it looks like the issue is that there are fewer than K elements to choose from. An error seems reasonable. The ONNX spec doesn't specify what should happen in this case: https://github.com/onnx/onnx/blob/45f508bb7c3206cf504b923d54076e42d1ed591c/docs/Operators.md#topk
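Since the spec leaves this case open, one defensive pattern is to clamp k to the axis length in the model code before export, so the traced graph never asks TopK for more elements than exist. This is a hypothetical helper sketching that pattern, not necessarily what torchvision does internally:

```python
import torch

def safe_topk(x: torch.Tensor, k: int, dim: int = -1):
    # Clamp k so TopK never sees k greater than the axis length;
    # torch.topk itself raises when k > x.size(dim).
    k = min(k, x.size(dim))
    return torch.topk(x, k, dim=dim)

vals, idx = safe_topk(torch.tensor([3.0, 1.0, 2.0]), k=4)
print(vals)  # tensor([3., 2., 1.])
```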