modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Array index out of bounds causes segmentation fault when ASR result is empty #2227

Open keeofkoo opened 3 days ago

keeofkoo commented 3 days ago

🐛 Bug

In the ONNXRuntime implementation, an out-of-bounds array access during processing of the timestamp outputs crashes the program with a segmentation fault.

To Reproduce

Use iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch as the offline model (converted to ONNX format beforehand) and perform ASR on a silent fragment.

Additional context

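When the ASR result is empty, the timestamp output is empty too, so msg_stamp is a zero-length vector. Because msg_stamp.size() returns an unsigned type, msg_stamp.size()-1 on line 422 wraps around to a very large value instead of evaluating to -1. The loop condition therefore holds, and the index i is used to read past the end of the vector.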

https://github.com/modelscope/FunASR/blob/bb6018e753e95232781851fe25e0a558d206d16d/runtime/onnxruntime/src/funasrruntime.cpp#L419-L427
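The wraparound itself can be reproduced in isolation. The snippet below is only a minimal sketch: `loop_bound` and `bound_wrapped` are hypothetical helpers written for illustration and are not part of FunASR, and the element type `std::string` is an assumption.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper (not FunASR code): the value that a loop bound
// written as `size() - 1` evaluates to. size() is unsigned, so the
// subtraction is done modulo 2^N and cannot produce -1.
std::size_t loop_bound(const std::vector<std::string>& msg_stamp) {
    return msg_stamp.size() - 1;
}

// True when the bound has wrapped around to the largest size_t value,
// which is what happens for an empty vector.
bool bound_wrapped(const std::vector<std::string>& msg_stamp) {
    return loop_bound(msg_stamp) == std::numeric_limits<std::size_t>::max();
}
```

For an empty vector `bound_wrapped` returns true, so a condition such as `i < msg_stamp.size()-1` is satisfied for `i = 0` and the first element access is already out of bounds.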

An easy fix is to change the terminating condition from i<msg_stamp.size()-1 to i<msg_stamp.size().
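Another way to write the guard, shown here only as a sketch, is to move the arithmetic to the index side (`i + 1 < size()`), which never subtracts from an unsigned value and keeps the same iteration range as `i < size() - 1` for non-empty vectors. The helper `format_stamps`, the flat begin/end layout of `msg_stamp`, and the output format are assumptions made for illustration, not the actual FunASR code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not FunASR code): joins consecutive (begin, end)
// entries into "[[b0,e0],[b1,e1]]". The flat begin/end layout of the
// input vector is an assumption.
std::string format_stamps(const std::vector<std::string>& msg_stamp) {
    std::string out = "[";
    // `i + 1 < size()` involves no unsigned subtraction, so an empty
    // vector skips the loop instead of wrapping the bound around.
    for (std::size_t i = 0; i + 1 < msg_stamp.size(); i += 2) {
        if (i != 0) {
            out += ",";
        }
        out += "[" + msg_stamp[i] + "," + msg_stamp[i + 1] + "]";
    }
    out += "]";
    return out;
}
```

With this guard an empty input yields `[]`, and a trailing unpaired entry is skipped rather than read past the end.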

The same pattern also appears in two more places in the same source file.

Manamama commented 1 day ago

I can confirm that this also happens in my pipeline. As a workaround, I had to sanitize the segments by hand before passing them in.