piraka9011 opened 2 years ago
Are you able to run shape inference and print the Where inputs' and output's shapes? Instructions for shape inference are here. My initial guess is that an upstream operator doesn't generate the expected shape.
I'm not sure I understand exactly what I need to do, but this is what I did:
import onnx
from onnx import shape_inference
model_path = "/path/to/model.onnx"
onnx_model = onnx.load(model_path)
inferred_model = shape_inference.infer_shapes(onnx_model)
print(inferred_model.graph.value_info)
The output is attached. I could not find a Where_35 specifically, but there were other Where operators with nothing suspicious about the dims that I could see. I only found one operation with a dimension of 12288, and that's a Slice_933.
Thanks. That's what I am looking for.
Per Where's spec, its two inputs X and Y should have compatible shapes. For non-equal dimensions, one of them must be 1. In your case, we see the dimension pair (12288, 12376), which is illegal. Legal cases would be (1, 12376), (12288, 1), (12288, 12288), and (12376, 12376). To narrow down the error, we need more information, because either the exporter or onnxruntime could be incorrect. First, I'd like to find the first incorrect operation in the graph. Can you run shape inference again with a shorter audio sequence and check whether Slice_933 produces the expected shape?
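The compatibility rule described above is ordinary NumPy-style broadcasting, so the legal/illegal pairs can be checked outside the graph:

```python
import numpy as np

# Where broadcasts its condition, X, and Y inputs: each paired
# dimension must either match or be 1.
cond = np.array([[True], [False]])   # shape (2, 1)
x = np.zeros((2, 4))                 # shape (2, 4)
y = np.ones((1, 4))                  # shape (1, 4)
print(np.where(cond, x, y).shape)    # -> (2, 4)

# The pair from the issue: 12288 vs 12376 -- neither equal nor 1.
try:
    np.broadcast_shapes((12288,), (12376,))
except ValueError:
    print("incompatible: (12288,) vs (12376,)")
```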
Can you run shape inference again with shorter audio sequence

I'm unsure what you mean there. Do you mean running a single pass of a short audio file through the ONNX model, then performing shape inference?
import numpy as np
sample_filepath = "/path/to/sample.npz"
sample = np.load(sample_filepath)
outputs = self.ort_session.run(None, {"audio_signal": sample["audio_signal"], "length": sample["length"]})
...
# Run shape inference...
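One reading of the request is to clip the stored sample to well under the two-minute mark before running the session. A minimal truncation sketch, assuming a [batch, time] layout with time as the last axis and the 16 kHz sample rate mentioned in the issue:

```python
import numpy as np

SAMPLE_RATE = 16000  # from the issue description: 16 kHz audio

def truncate_audio(audio_signal: np.ndarray, max_seconds: int = 60) -> np.ndarray:
    """Clip an audio tensor to at most max_seconds along its last (time) axis."""
    return audio_signal[..., : SAMPLE_RATE * max_seconds]

sig = np.zeros((1, 1_920_000), dtype=np.float32)  # exactly two minutes
short = truncate_audio(sig)
print(short.shape)  # -> (1, 960000)
```

The truncated arrays could then be fed to ort_session.run exactly as in the snippet above, followed by shape inference on the same model.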
@wschin bump here, just need a little bit of guidance/clarity on your request to help you out :)
Hello @piraka9011, were you able to fix the error? I ran into the same issue.
Describe the bug
I've exported a CitriNet model from NVIDIA/NeMo, which is written in PyTorch, to ONNX. I am able to successfully perform inference on audio files (converted to spectrograms, which is the input to the model) using onnxruntime on CPU. However, if the length of the audio file is greater than exactly two minutes, I get the following error:

I determined two minutes based on the tensor length of sample_rate x seconds, so if the tensor length is greater than 16000 * 120 = 1920000, I get the above error; otherwise, inference works fine.

Urgency
Ideally we have a resolution or workaround within the week.
System information
I'm using NVIDIA's NeMo container v22.04: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
To Reproduce
I can provide a sample npz file to run with an ONNX model. I cannot publish this publicly, though.

Expected behavior
Inference runs as expected.