Status: Open. mridulrao opened this issue 2 months ago.
The audio input for Whisper has some restrictions, such as the sample rate being 16 kHz (https://github.com/openai/whisper/blob/279133e3107392276dc509148da1f41bfb532c7e/whisper/audio.py#L13). It also cannot be longer than 30 s.
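As a rough way to check a file against these two limits, the sketch below reads a WAV header with Python's standard `wave` module and compares it with the constants from the linked `whisper/audio.py` (16,000 Hz and a 30 s window, i.e. 480,000 samples). It assumes the input is uncompressed PCM WAV, since `wave` cannot read MP3. The function name `check_whisper_wav` is hypothetical and is not part of Whisper, Olive, or onnxruntime-extensions.

```python
import io
import wave

# Values taken from whisper/audio.py: 16 kHz input, 30 s window.
SAMPLE_RATE = 16000
CHUNK_LENGTH = 30  # seconds
N_SAMPLES = CHUNK_LENGTH * SAMPLE_RATE  # 480,000 samples per window


def check_whisper_wav(wav_bytes: bytes) -> list[str]:
    """Return human-readable problems; an empty list means both limits are met.

    Hypothetical helper: only uncompressed PCM WAV is supported, because the
    standard-library wave module cannot decode MP3.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        rate = reader.getframerate()
        frames = reader.getnframes()

    problems = []
    if rate != SAMPLE_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {SAMPLE_RATE} Hz")
    duration = frames / rate
    if duration > CHUNK_LENGTH:
        problems.append(f"duration is {duration:.1f} s, limit is {CHUNK_LENGTH} s")
    return problems
```

Passing the bytes of a 7 minute, 16 kHz WAV would return a single duration problem, which is the situation described later in this thread.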
Can you confirm your audio meets these requirements?
Can you also share the versions of onnxruntime and onnxruntime-extensions you are using?
Oh, I didn't see the limit on audio length. The audio files I am trying to process vary between 7 and 12 minutes in length. The sample rate is 16 kHz.
Versions: onnxruntime==1.19.2, onnxruntime_extensions==0.12.0
Is it recommended to change the hard-coded lengths? Or should I split the audio into multiple 30 s chunks?
Hi, sorry for the delayed response. The 30 s limit cannot be changed, so you would need to split the audio into chunks of at most 30 s and run each one individually.
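A minimal sketch of that splitting step, assuming the audio has first been converted to 16 kHz mono PCM WAV (for example with `ffmpeg -i yt_audio.mp3 -ar 16000 -ac 1 yt_audio.wav`) and assuming the exported model takes the bytes of an encoded audio file, which the AudioDecoder node in the traceback below suggests. `split_wav` is a hypothetical helper that uses only the standard library to cut one WAV file into consecutive WAV files of at most 30 s each.

```python
import io
import wave

CHUNK_SECONDS = 30  # Whisper's fixed input window


def split_wav(wav_bytes: bytes, chunk_seconds: int = CHUNK_SECONDS) -> list[bytes]:
    """Split a PCM WAV file into consecutive WAV files of at most chunk_seconds."""
    chunks = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        nchannels = reader.getnchannels()
        sampwidth = reader.getsampwidth()
        rate = reader.getframerate()
        frames_per_chunk = rate * chunk_seconds
        while True:
            # readframes returns fewer frames for the last chunk, then b"".
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as writer:
                # Copy the original format so each chunk is a valid WAV file.
                writer.setnchannels(nchannels)
                writer.setsampwidth(sampwidth)
                writer.setframerate(rate)
                writer.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```

Each returned byte string could then be fed to the session in place of the original file's bytes, one run per chunk, with the partial transcripts joined in order. For the 7 to 12 minute files mentioned above, that is 14 to 24 runs. Cutting at fixed 30 s boundaries can split a word across two chunks; a small overlap or cutting at silences would reduce that, at the cost of extra bookkeeping when joining the text.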
Describe the bug
I encountered an error while using Olive with the Whisper ONNX model for transcription. The error occurs during the AudioDecoder step in ONNX Runtime.
To Reproduce
Set up an environment with the Whisper ONNX model using Olive (exactly as given in README.md), then run:
python test_transcription.py --config whisper_cpu_int8.json --audio_path yt_audio.mp3
Expected behavior
The audio is transcribed.
2024-09-18 08:49:27.862823112 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running AudioDecoder node. Name:'AudioDecoder_1' Status Message: [AudioDecoder]: Cannot detect audio stream format
Traceback (most recent call last):
  File "/teamspace/studios/this_studio/Olive/examples/whisper/test_transcription.py", line 129, in <module>
    output_text = main()
  File "/teamspace/studios/this_studio/Olive/examples/whisper/test_transcription.py", line 124, in main
    output = olive_model.run_session(session, input_data)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/olive/model/handler/onnx.py", line 146, in run_session
    return session.run(output_names, inputs, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running AudioDecoder node. Name:'AudioDecoder_1' Status Message: [AudioDecoder]: Cannot detect audio stream format
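The error message says the AudioDecoder could not identify the format of the input bytes. One cause worth ruling out (an assumption, not a confirmed diagnosis for this issue) is a file whose contents do not match its extension, such as audio downloaded as M4A or WebM and saved as `.mp3`. The hypothetical `sniff_audio_format` below guesses the container from well-known leading bytes. It is assumed here, and should be verified against the onnxruntime-extensions documentation, that AudioDecoder handles WAV, MP3 and FLAC, so any other result would point at the input file rather than the model.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the container/codec of an audio file from its first bytes."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"fLaC":
        return "flac"
    if data[:3] == b"ID3":
        return "mp3"  # MP3 preceded by an ID3v2 tag
    if len(data) >= 2 and data[0] == 0xFF:
        # MPEG audio frame sync (11 set bits); layer bits 01 mean Layer III (MP3).
        if (data[1] & 0xE0) == 0xE0 and ((data[1] >> 1) & 0x3) == 0x1:
            return "mp3"
        # ADTS AAC: 12 set bits followed by layer bits 00.
        if (data[1] & 0xF6) == 0xF0:
            return "aac (adts)"
    if data[4:8] == b"ftyp":
        return "mp4/m4a"  # ISO base media file (MP4, M4A)
    if data[:4] == b"\x1a\x45\xdf\xa3":
        return "webm/mkv"  # EBML header used by WebM and Matroska
    if data[:4] == b"OggS":
        return "ogg"
    return "unknown"
```

Calling it on the first 16 bytes of `yt_audio.mp3` (read in binary mode) would show whether the file really contains MP3 data.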
Other information