What does the streaming model latency configuration `[5,10,5]` mean?

modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

https://www.funasr.com

What does the streaming model latency configuration `[5,10,5]` mean? #1026

Closed aixuedegege closed 1 year ago

aixuedegege commented 1 year ago

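I'd like to ask about this sentence in the documentation:

chunk_size: the latency configuration of the streaming model. `[5,10,5]` means the current audio decoding chunk is 600ms, with a look-back of 300ms and a look-ahead of 300ms.

What does this mean? How are 5, 10, 5 converted into 600, 300, 300?

I'm a beginner and would appreciate any guidance. Thanks a lot.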

hnluo commented 1 year ago

One frame of speech is 60ms, so 10 * 60ms = 600ms for the current chunk, and 5 * 60ms = 300ms for each of the look-back and look-ahead.
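The arithmetic above can be written out as a small sketch. It assumes the 60ms frame length stated in this reply and that the three entries of `chunk_size` are, in order, the look-back, the current chunk, and the look-ahead, as the quoted documentation describes. The function name `chunk_latency_ms` is hypothetical and is not part of FunASR.

```python
FRAME_MS = 60  # assumption: one frame of speech is 60 ms, as stated in the reply


def chunk_latency_ms(chunk_size, frame_ms=FRAME_MS):
    """Convert [look_back, chunk, look_ahead] frame counts to milliseconds.

    Hypothetical helper for illustration only; not a FunASR function.
    """
    look_back, chunk, look_ahead = chunk_size
    return {
        "look_back_ms": look_back * frame_ms,
        "chunk_ms": chunk * frame_ms,
        "look_ahead_ms": look_ahead * frame_ms,
    }


print(chunk_latency_ms([5, 10, 5]))
# {'look_back_ms': 300, 'chunk_ms': 600, 'look_ahead_ms': 300}
```

So `[5,10,5]` gives 300ms, 600ms, 300ms purely by multiplying each entry by the frame length.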

LauraGPT commented 1 year ago

aixuedegege commented 1 year ago

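Thanks for the answers, everyone. I have a new question: in "Paraformer语音识别-中文-通用-16k-实时-large" (Paraformer speech recognition, Chinese, general, 16k, real-time, large), what is the difference between 实时 (real-time) and 离线 (offline)? My understanding is that the real-time model is trained mostly on data made of short audio frames, while the offline model is trained mostly on long audio. Is that right?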

LauraGPT commented 1 year ago

Maybe you could test them by:

Chinese offline file transcription: https://101.37.77.25:1335/static/index.html

Chinese real-time speech dictation: https://101.37.77.25:1336/static/index.html

aixuedegege commented 1 year ago

OK, thanks!

aixuedegege commented 1 year ago

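Functionally, offline means that once the speaker has finished, the entire audio A1 is fed to model Model1 for recognition; real-time means that a segment A2 is cut from the speech stream and fed to model Model2 for recognition.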

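What I want to ask is: is there any difference between Model2 and Model1? Are they both Paraformer models, and do their inputs differ during training? Or can an offline model become real-time simply by deploying it behind a streaming service?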
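To make the A1 versus A2 distinction concrete, here is a minimal sketch of the input side only. It assumes 16 kHz audio (from the "16k" in the model name) and the 60ms frame length mentioned earlier in the thread, so a 10-frame chunk is 9600 samples. It does not call any model, and `iter_chunks` is a hypothetical helper, not a FunASR function; it says nothing about whether Model1 and Model2 differ internally.

```python
SAMPLE_RATE = 16000  # assumption: 16 kHz audio, per the "16k" in the model name
FRAME_MS = 60        # assumption: 60 ms per frame, as stated earlier in the thread


def iter_chunks(samples, chunk_frames=10):
    """Yield (chunk, is_final) pairs, each chunk covering chunk_frames * 60 ms.

    Hypothetical helper for illustration only; the last chunk may be shorter.
    """
    stride = chunk_frames * FRAME_MS * SAMPLE_RATE // 1000  # samples per chunk
    total = max(1, -(-len(samples) // stride))              # ceiling division
    for i in range(total):
        yield samples[i * stride:(i + 1) * stride], i == total - 1


audio = [0.0] * 24000  # 1.5 s of silence standing in for a recording

offline_input = audio                        # A1: the whole utterance at once
streaming_inputs = list(iter_chunks(audio))  # A2: successive 600 ms segments

print([len(chunk) for chunk, _ in streaming_inputs])  # [9600, 9600, 4800]
```

In the offline case the recognizer sees all 24000 samples in one call; in the real-time case it sees three successive segments and must produce output before the later ones arrive.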