piraka9011 opened 2 years ago
Are you able to run shape inference and print the Where inputs' and output's shapes? Instructions for shape inference are here. My initial guess is that an upstream operator doesn't generate the expected shape.
I'm not sure I understand exactly what I need to do, but this is what I did:
import onnx
from onnx import shape_inference
model_path = "/path/to/model.onnx"
onnx_model = onnx.load(model_path)
inferred_model = shape_inference.infer_shapes(onnx_model)
print(inferred_model.graph.value_info)
The output is attached. I could not find a Where_35 specifically, but there were other Where operators with nothing suspicious about the dims that I could see. I only found one operation with a dimension of 12288, and that's a Slice_933.
Thanks. That's what I am looking for.
Per Where's spec, its two inputs X and Y should have compatible shapes. For non-equal dimensions, one of them must be 1. In your case, we see the dimension pair (12288, 12376), which is illegal. Legal cases would be (1, 12376), (12288, 1), (12288, 12288), and (12376, 12376). To narrow down the error, we need more information, because either the exporter or onnxruntime could be incorrect. First, I'd like to find the first incorrect operation in the graph. Can you run shape inference again with a shorter audio sequence and check whether Slice_933 produces the expected shape?
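The compatibility rule described above is ordinary NumPy-style broadcasting, so the legal/illegal pairs can be checked outside the graph:

```python
import numpy as np

# Where broadcasts its condition, X, and Y inputs: each paired
# dimension must either match or be 1.
cond = np.array([[True], [False]])   # shape (2, 1)
x = np.zeros((2, 4))                 # shape (2, 4)
y = np.ones((1, 4))                  # shape (1, 4)
print(np.where(cond, x, y).shape)    # -> (2, 4)

# The pair from the issue: 12288 vs 12376 -- neither equal nor 1.
try:
    np.broadcast_shapes((12288,), (12376,))
except ValueError:
    print("incompatible: (12288,) vs (12376,)")
```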
Can you run shape inference again with shorter audio sequence

I'm unsure what you mean there. Do you mean running a single pass of a short audio file through the ONNX model, then performing shape inference?
import numpy as np
sample_filepath = "/path/to/sample.npz"
sample = np.load(sample_filepath)
outputs = self.ort_session.run(None, {"audio_signal": sample["audio_signal"], "length": sample["length"]})
...
# Run shape inference...
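One reading of the request is to clip the stored sample to well under the two-minute mark before running the session. A minimal truncation sketch, assuming a [batch, time] layout with time as the last axis and the 16 kHz sample rate mentioned in the issue:

```python
import numpy as np

SAMPLE_RATE = 16000  # from the issue description: 16 kHz audio

def truncate_audio(audio_signal: np.ndarray, max_seconds: int = 60) -> np.ndarray:
    """Clip an audio tensor to at most max_seconds along its last (time) axis."""
    return audio_signal[..., : SAMPLE_RATE * max_seconds]

sig = np.zeros((1, 1_920_000), dtype=np.float32)  # exactly two minutes
short = truncate_audio(sig)
print(short.shape)  # -> (1, 960000)
```

The truncated arrays could then be fed to ort_session.run exactly as in the snippet above, followed by shape inference on the same model.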
@wschin bump here, just need a little bit of guidance/clarity on your request to help you out :)
Hello @piraka9011, were you able to fix the error? I ran into the same issue.
Describe the bug
I've exported a CitriNet model from NVIDIA/NeMo, which is written in PyTorch, to ONNX. I am able to successfully perform inference on audio files (converted to spectrograms, which is the input to the model) using onnxruntime on CPU. However, if the length of the audio file is greater than exactly two minutes, I get the following error:

I determined two minutes based on the tensor length of sample_rate x seconds, so if the tensor length is greater than 16000 * 120 = 1920000, I get the above error; otherwise, inference works fine.

Urgency
Ideally we have a resolution or workaround within the week.
System information
I'm using NVIDIA's NeMo container v22.04: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
To Reproduce
I can provide a sample npz file to run with an ONNX model. I cannot publish this publicly, though.

Expected behavior
Inference runs as expected.