microsoft / Olive

Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
https://microsoft.github.io/Olive/
MIT License

ONNX Runtime AudioDecoder Error on Olive with Whisper Model #1362

Open mridulrao opened 2 months ago

mridulrao commented 2 months ago

Describe the bug
I encountered an error while using Olive with the Whisper ONNX model for transcription. The error occurs during the AudioDecoder step in ONNX Runtime.

To Reproduce
Set up an environment with the Whisper ONNX model using Olive (exactly as described in README.md), then run:

python test_transcription.py --config whisper_cpu_int8.json --audio_path yt_audio.mp3

Expected behavior
Transcriptions of the audio. Instead, the following error is raised:

```
2024-09-18 08:49:27.862823112 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running AudioDecoder node. Name:'AudioDecoder_1' Status Message: [AudioDecoder]: Cannot detect audio stream format
Traceback (most recent call last):
  File "/teamspace/studios/this_studio/Olive/examples/whisper/test_transcription.py", line 129, in <module>
    output_text = main()
  File "/teamspace/studios/this_studio/Olive/examples/whisper/test_transcription.py", line 124, in main
    output = olive_model.run_session(session, input_data)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/olive/model/handler/onnx.py", line 146, in run_session
    return session.run(output_names, inputs, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running AudioDecoder node. Name:'AudioDecoder_1' Status Message: [AudioDecoder]: Cannot detect audio stream format
```

Other information

jambayk commented 2 months ago

The audio input for Whisper has some restrictions: the sample rate must be 16 kHz (https://github.com/openai/whisper/blob/279133e3107392276dc509148da1f41bfb532c7e/whisper/audio.py#L13) and the clip cannot be longer than 30 s.

Can you confirm your audio meets these requirements?
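A quick way to check both properties is a minimal sketch like the one below (this assumes librosa is installed; the file path is just the one from the repro command):

```python
# Minimal sketch: check the sample rate and duration of the input audio.
# Assumes librosa is installed; the file path comes from the repro command above.
import librosa

AUDIO_PATH = "yt_audio.mp3"

# sr=None keeps the file's native sample rate instead of resampling.
audio, sr = librosa.load(AUDIO_PATH, sr=None)
duration = librosa.get_duration(y=audio, sr=sr)

print(f"sample rate: {sr} Hz")           # expected: 16000 Hz
print(f"duration:    {duration:.1f} s")  # expected: <= 30 s
```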

jambayk commented 2 months ago

Can you also share the version of onnxruntime and onnxruntime-extensions you are using?

mridulrao commented 2 months ago

Oh, I didn't see the limit on audio length. The audio files I am trying to process vary between 7 and 12 minutes. The sample rate is 16 kHz.

Versions: onnxruntime==1.19.2, onnxruntime_extensions==0.12.0

Is it recommended to change the hard-coded lengths, or should I clip the audio into multiple 30-second chunks?

jambayk commented 3 weeks ago

Hi, sorry for the delayed response. The 30 s limit cannot be changed, so you would need to clip the audio into segments and run each one through the model individually.
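The clipping could look roughly like the sketch below. This is only an illustration, not the example's official flow: it assumes librosa and soundfile are installed, and transcribe_chunk is a hypothetical placeholder for however you build the model input and call the Olive-optimized session (the AudioDecoder error suggests the pipeline expects encoded audio bytes).

```python
# Sketch: split long audio into <=30 s chunks at 16 kHz and transcribe each one.
# Assumes librosa and soundfile are installed; transcribe_chunk() is a placeholder
# for however you feed a single clip through the Olive-optimized ONNX session.
import io

import librosa
import soundfile as sf

CHUNK_SECONDS = 30
TARGET_SR = 16000

def split_into_chunks(path):
    # Resample to 16 kHz on load so every chunk already matches Whisper's input.
    audio, _ = librosa.load(path, sr=TARGET_SR)
    step = CHUNK_SECONDS * TARGET_SR
    for start in range(0, len(audio), step):
        yield audio[start:start + step]

def chunk_to_wav_bytes(chunk):
    # Re-encode each chunk as WAV bytes, assuming the pipeline decodes encoded audio.
    buf = io.BytesIO()
    sf.write(buf, chunk, TARGET_SR, format="WAV")
    return buf.getvalue()

texts = []
for chunk in split_into_chunks("yt_audio.mp3"):
    wav_bytes = chunk_to_wav_bytes(chunk)
    texts.append(transcribe_chunk(wav_bytes))  # placeholder for the session call

print(" ".join(texts))
```

Note that naive fixed-length splitting can cut words at chunk boundaries, so some overlap or silence-based segmentation may give cleaner transcripts.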