microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License
14.59k stars 2.92k forks source link

STFT op has the wrong expected shape #14316

Open milesial opened 1 year ago

milesial commented 1 year ago

Describe the issue

The STFT op has the wrong "expected" shape that doesn't match it's output (correct) shape because of some off-by-one error.

2023-01-17 00:16:53.852917009 [W:onnxruntime:, execution_frame.cc:828 VerifyOutputSizes] Expected shape from model of {1,3002,201,2} does not match actual shape of {1,3001,201,2} for output Y onnx output shape (1, 3001, 201, 2)

Originally from https://github.com/pytorch/pytorch/pull/92087#issuecomment-1383235742

This prevents adding new nodes after the STFT node since there will be expected shape mismatch.

To reproduce

Here is a standalone repro that creates a graph with only a STFT op, and we see the runtime throw the issue. Also happens with optimization enabled.

from onnx import TensorProto
from onnx import numpy_helper, TensorProto
from onnx.helper import make_model, make_node, make_graph, make_tensor_value_info, make_operatorsetid
from onnx.checker import check_model
import numpy as np
import onnxruntime as rt

X = make_tensor_value_info('X', TensorProto.FLOAT, [1, 480400])
frame_step = numpy_helper.from_array(np.array(160, dtype=np.int64), name='frame_step')
window = numpy_helper.from_array(np.ones(400, dtype=np.float32), name='window')
frame_length = numpy_helper.from_array(np.array(400, dtype=np.int64), name='frame_length')

Y = make_tensor_value_info('Y', TensorProto.FLOAT, [None, None, None, None])
node = make_node('STFT', ['X', 'frame_step', 'window', 'frame_length'], ['Y'])

graph = make_graph([node], 'test2', [X], [Y], [frame_step, window, frame_length])
onnx_model = make_model(graph, opset_imports=[make_operatorsetid('', 17)])
check_model(onnx_model)

with open("test.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

sess_options = rt.SessionOptions()
sess_options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_DISABLE_ALL
session = rt.InferenceSession('test.onnx', sess_options)
print('onnx output shape', session.run(None, {'X': np.random.randn(1, 480400).astype(np.float32)})[0].shape)

2023-01-17 00:16:53.852917009 [W:onnxruntime:, execution_frame.cc:828 VerifyOutputSizes] Expected shape from model of {1,3002,201,2} does not match actual shape of {1,3001,201,2} for output Y

onnx output shape (1, 3001, 201, 2)

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.13.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

jstoecker commented 1 year ago

The shape inference logic in the ONNX definition looks a bit strange to me:

bool is_onesided = static_cast<bool>(getAttribute(ctx, "onesided", 0));
if (is_onesided) {
  dft_size = is_onesided ? ((dft_size >> 1) + 1) : dft_size;
}

auto n_dfts = static_cast<int64_t>((signal_size - dft_size) / static_cast<float>(frame_step_value)) + 1;

// The output has the following shape: [batch_size][frames][dft_unique_bins][2]
ONNX_NAMESPACE::TensorShapeProto result_shape_proto;
result_shape_proto.add_dim()->set_dim_value(input_shape.dim(0).dim_value()); // batch size
result_shape_proto.add_dim()->set_dim_value(n_dfts);
result_shape_proto.add_dim()->set_dim_value(dft_size);
result_shape_proto.add_dim()->set_dim_value(2);

It looks like the author intended dft_size to represent the window/frame length, but then repurposed it to represent the dft_unique_bins mentioned in the description. That makes the math for computing n_dfts wrong. I would instead expect something like this:

bool is_onesided = static_cast<bool>(getAttribute(ctx, "onesided", 0));
-if (is_onesided) {
-  dft_size = is_onesided ? ((dft_size >> 1) + 1) : dft_size;
-}
+ int64_t dft_unique_bins = is_onesided ? ((dft_size >> 1) + 1) : dft_size;

auto n_dfts = static_cast<int64_t>((signal_size - dft_size) / static_cast<float>(frame_step_value)) + 1;

// The output has the following shape: [batch_size][frames][dft_unique_bins][2]
ONNX_NAMESPACE::TensorShapeProto result_shape_proto;
result_shape_proto.add_dim()->set_dim_value(input_shape.dim(0).dim_value()); // batch size
result_shape_proto.add_dim()->set_dim_value(n_dfts);
-result_shape_proto.add_dim()->set_dim_value(dft_size);
+result_shape_proto.add_dim()->set_dim_value(dft_unique_bins);
result_shape_proto.add_dim()->set_dim_value(2);