microsoft / Olive

Olive is an easy-to-use hardware-aware model optimization tool that composes industry-leading techniques across model compression, optimization, and compilation.
https://microsoft.github.io/Olive/
MIT License

Using Whisper for Chinese ASR in iOS may occasionally output illegal UTF-8 strings. #1197

Open hasayakey opened 3 weeks ago

hasayakey commented 3 weeks ago

Describe the bug

I followed the documentation at https://github.com/microsoft/Olive/tree/main/examples/whisper and generated the Whisper model with the following command:

python prepare_whisper_configs.py --model_name openai/whisper-tiny --no_audio_decoder --multilingual --enable_timestamps | olive run --config whisper_cpu_int8.json 2> /dev/null

Because using the CPUExecutionProvider on an iPhone causes the phone to overheat severely, I use the following strategy: every 2 seconds I run an ORTSession to get the transcribed text, and I use the timestamps in the returned text to decide whether to discard the audio samples that have already been transcribed correctly. Most of the time the text is output normally, but occasionally the output contains an illegal UTF-8 string, which causes onnxruntime-objc to crash.
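To make the buffering strategy concrete, here is a minimal sketch of the trimming step in Python (the app itself runs on iOS; Python is used only for illustration). It assumes the returned text renders Whisper timestamp tokens in the form `<|1.52|>` and that the audio is 16 kHz mono. The names `samples_to_discard` and `trim` are hypothetical helpers, not part of Olive or onnxruntime.

```python
import re

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono audio, as Whisper models expect

# Assumed output format: timestamps appear in the text as <|seconds|>,
# for example "<|0.00|>你好<|1.52|>".
TIMESTAMP = re.compile(r"<\|(\d+(?:\.\d+)?)\|>")


def samples_to_discard(text, buffered, sample_rate=SAMPLE_RATE):
    """Return how many leading samples the last timestamp in `text` covers."""
    stamps = [float(s) for s in TIMESTAMP.findall(text)]
    if not stamps:
        return 0  # no timestamps: keep all audio and retry with more context
    covered = round(stamps[-1] * sample_rate)
    return min(covered, buffered)  # never discard more than is buffered


def trim(buffer, text):
    """Drop the samples that the transcription already accounts for."""
    return buffer[samples_to_discard(text, len(buffer)):]
```

Clamping to the buffer length guards against a timestamp that points past the audio actually supplied to the session.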

Crash stack: https://github.com/microsoft/onnxruntime/issues/21026
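One possible explanation, which is an assumption and not confirmed anywhere in this thread: Whisper uses a byte-level BPE tokenizer, so a single token can carry only part of a multi-byte UTF-8 character. If decoding stops between such tokens, the resulting bytes are not valid UTF-8. The Python sketch below only illustrates that failure mode and a defensive decode; `decode_safely` is a hypothetical helper, not an onnxruntime API.

```python
def decode_safely(raw):
    """Decode bytes as UTF-8, reporting whether the input was fully valid."""
    try:
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        # Substitute U+FFFD for the bad bytes instead of failing.
        return raw.decode("utf-8", errors="replace"), False


full = "你好".encode("utf-8")  # 6 bytes: each character takes 3 bytes
cut = full[:4]                 # ends after the first byte of the second character

text, valid = decode_safely(cut)
print(repr(text), valid)
```

A similar validity check on the native side, before the bytes are converted to a string, would turn the crash into a recoverable condition, though it would not fix whatever produces the truncated sequence.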

jambayk commented 1 week ago

Hi,

Thanks for creating the issue. It looks like you have already opened a related issue in the onnxruntime repository, which is a good place to ask since the model is generated using onnxruntime contrib operators. If the issue cannot be resolved there, the devs at https://github.com/microsoft/onnxruntime-extensions might have more insight, since they created the post-processing parts of the model.

RageAgainstTheAssembly commented 1 week ago

Hello, I have encountered a similar issue while using the Olive Whisper model to transcribe the Tajik language. The model produced by Olive performs far worse than a basic ONNX model and suffers from severe hallucinations. It also occasionally produces illegal UTF-8 strings, as you mentioned. I have been unable to find an explanation or a fix for this.