microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] Python infer and C++ are different for audio process #20227

Open lzlwakeup opened 7 months ago

lzlwakeup commented 7 months ago

Describe the issue

Hi, I want to run an ONNX model in a C++ environment (code attached). The results look like this: Cpp_rlt python_rlt. The C++ result has more background noise, while the Python result is fine. I use the same onnxruntime version (1.14) for both, but the results differ. Given exactly the same version, why is there a difference between C++ and Python? Is there a problem with my code setup? c_infer.zip
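One way to narrow this down is to dump the raw model output from both the C++ and Python runs to binary files and compare them numerically. A minimal sketch (the file names and the raw little-endian float32 layout are assumptions, not from the issue):

```python
import struct

def load_f32(path):
    """Read a raw little-endian float32 dump into a list of floats."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack("<%df" % (len(data) // 4), data))

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two output buffers."""
    assert len(a) == len(b), "output sizes differ"
    return max(abs(x - y) for x, y in zip(a, b))
```

Usage would look like `max_abs_diff(load_f32("cpp_out.bin"), load_f32("py_out.bin"))`; a value far above roughly 1e-5 points to a real logic difference rather than floating-point noise.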

To reproduce

See attach.

Urgency

No response

Platform

Windows

OS Version

windows 10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

onnxruntime 1.14

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

xadupre commented 7 months ago

In C++, you run fftwf_execute at every iteration; in Python, it is done only once at the end. I suspect session_options.SetIntraOpNumThreads(1); is the cause.

lzlwakeup commented 7 months ago

Speech signal processing requires overlap between frames: the current step combines a new frame with the previous frame. In Python, librosa.stft performs this whole process in one shot over the wav file, whereas real-time processing indeed requires an FFT in every cycle/iteration. As far as the input data is concerned, I compared the first frame, and the fftwf output is the same as librosa.stft. If the C++ onnxruntime and Python were logically identical and the input data were the same, the different results could not be explained.
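For reference, the overlapped framing described above can be checked frame by frame before the FFT. A minimal sketch (frame/hop sizes below are placeholders; this mirrors librosa.stft's framing only in the non-centered case, center=False):

```python
def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames, as a streaming STFT would see them."""
    out = []
    start = 0
    while start + frame_len <= len(signal):
        out.append(signal[start:start + frame_len])
        start += hop
    return out
```

Comparing, say, frames(x, 512, 256) against the columns librosa.stft consumes is a quick way to confirm that the C++ buffering (new frame + previous frame) lines up sample-for-sample. Note that librosa.stft pads the signal by default (center=True), which shifts every frame relative to a straightforward real-time implementation.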

lzlwakeup commented 7 months ago

A correction: my earlier statement that the ORT versions are consistent was a mistake and is not accurate. I should add that the Python side runs on Windows 11 with torch + CUDA, while the C++ environment is Windows 10 on CPU. A small difference would be acceptable in my opinion, but the difference in the results is too big, so I don't know what the reason is. The FFT data is consistent, confirmed by frame-by-frame printing.
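To rule out the CUDA-vs-CPU factor, one option is to pin the Python session to the CPU execution provider and re-compare against the C++ output. A minimal sketch (the model path is a placeholder; this requires onnxruntime to be installed, so it is environment configuration rather than a self-contained test):

```python
import onnxruntime as ort

# Force the CPU execution provider so the Python run matches the C++ setup.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
print(sess.get_providers())  # should list only CPUExecutionProvider
```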

lzlwakeup commented 6 months ago

Are there any new solutions or debugging methods?

xadupre commented 6 months ago

You may try profiling (https://onnxruntime.ai/docs/performance/tune-performance/profiling-tools.html) to see if some operator is behaving differently. When you write "the win11 + torchcuda, and the C++ environment is the win10 cpu", it is not clear to me that both are running on CPU.
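Enabling the built-in profiler from the Python API is sketched below (the C++ API exposes the same option via Ort::SessionOptions); the model path is a placeholder and onnxruntime must be installed:

```python
import onnxruntime as ort

so = ort.SessionOptions()
so.enable_profiling = True  # write a JSON trace of per-operator timings

sess = ort.InferenceSession("model.onnx", sess_options=so,
                            providers=["CPUExecutionProvider"])
# ... run inference here ...
trace_path = sess.end_profiling()  # path to the JSON trace file
print(trace_path)
```

The resulting JSON file can be opened in chrome://tracing to compare operator-level timings and spot an operator that behaves differently between the two environments.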

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.