modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

sentence_timestamp error!!! #2110

Closed: zhangakun closed this 2 weeks ago

zhangakun commented 2 weeks ago

Notice: To resolve issues more efficiently, please raise issues following the template and include the relevant details.

🐛 Bug

When I set `sentence_timestamp=True`, each sentence's time range is reported as the time range of the last word in that sentence. Here is part of the result:

```
...
{'text': '因为人家就是奔着这些优点,', 'start': 897260, 'end': 897400, 'timestamp': [[895640, 895780], [895780, 895940], [895940, 896060], [896060, 896300], [896340, 896480], [896480, 896600], [896600, 896720], [896720, 896840], [896840, 896960], [896960, 897100], [897100, 897260], [897260, 897400]]},
{'text': '仿制的越是高仿的东西,', 'start': 899710, 'end': 899950, 'timestamp': [[897400, 897560], [897560, 897740], [897740, 898005], [898830, 899010], [899010, 899130], [899130, 899290], [899290, 899470], [899470, 899610], [899610, 899710], [899710, 899950]]},
{'text': '其身上的优点就越多。', 'start': 901130, 'end': 901370, 'timestamp': [[899950, 900150], [900150, 900310], [900310, 900430], [900430, 900550], [900550, 900650], [900650, 900830], [900830, 901010], [901010, 901130], [901130, 901370]]},
{'text': '但你发现了再多的优点,', 'start': 902970, 'end': 903210, 'timestamp': [[901750, 901970], [901970, 902130], [902130, 902270], [902270, 902390], [902390, 902490], [902490, 902570], [902570, 902730], [902730, 902830], [902830, 902970], [902970, 903210]]},
{'text': '也不是见定的本质,', 'start': 904170, 'end': 904410, 'timestamp': [[903290, 903450], [903450, 903570], [903570, 903710], [903710, 903870], [903870, 903990], [903990, 904070], [904070, 904170], [904170, 904410]]},
{'text': '见定就是要让我们发现缺点。', 'start': 906610, 'end': 906875, 'timestamp': [[904830, 905050], [905050, 905290], [905390, 905530], [905530, 905650], [905650, 905770], [905770, 905890], [905890, 906010], [906010, 906170], [906170, 906310], [906310, 906450], [906450, 906610], [906610, 906875]]}
...
```
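To make the symptom concrete, here is a small sketch that compares each reported sentence span with the span implied by its own word timestamps (first word's start to last word's end). The helper names `expected_span` and `has_last_word_bug` are hypothetical. The two entries are copied verbatim from the output above. The sketch only inspects dicts of the shape shown here and does not call FunASR.

```python
# Two sentence entries copied from the output above (timestamps apparently in ms).
sentences = [
    {'text': '因为人家就是奔着这些优点,', 'start': 897260, 'end': 897400,
     'timestamp': [[895640, 895780], [895780, 895940], [895940, 896060], [896060, 896300],
                   [896340, 896480], [896480, 896600], [896600, 896720], [896720, 896840],
                   [896840, 896960], [896960, 897100], [897100, 897260], [897260, 897400]]},
    {'text': '其身上的优点就越多。', 'start': 901130, 'end': 901370,
     'timestamp': [[899950, 900150], [900150, 900310], [900310, 900430], [900430, 900550],
                   [900550, 900650], [900650, 900830], [900830, 901010], [901010, 901130],
                   [901130, 901370]]},
]


def expected_span(sentence):
    """Span implied by the word timestamps: first word's start to last word's end."""
    words = sentence['timestamp']
    return words[0][0], words[-1][1]


def has_last_word_bug(sentence):
    """True if the reported span equals the last word's span instead of the whole sentence."""
    last_start, last_end = sentence['timestamp'][-1]
    reported = (sentence['start'], sentence['end'])
    return reported == (last_start, last_end) and reported != expected_span(sentence)


for s in sentences:
    # Print text, reported span, expected span, and whether the bug pattern matches.
    print(s['text'], (s['start'], s['end']), expected_span(s), has_last_word_bug(s))
```

For the first entry the reported span is (897260, 897400), but the word timestamps imply (895640, 897400). That difference is the behavior described in this issue.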

Code sample

```python
from funasr import AutoModel

index = 0  # GPU index
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc", device="cuda:{}".format(index))
res = model.generate(input="xxx.mp3", batch_size_s=120, sentence_timestamp=True)  # "xxx.mp3": path to the audio file
```

Expected behavior

Environment

Additional context

zhangakun commented 2 weeks ago

https://github.com/modelscope/FunASR/pull/2024

This is fixed in the new version (see the PR linked above).
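If upgrading is not possible right away, one possible stopgap is to rebuild each sentence's `start`/`end` from its word timestamps. This is an assumption of ours, not taken from the linked PR, whose actual change is not shown here. The sketch uses a hypothetical helper `fix_sentence_spans`. It assumes the sentence dicts have the shape shown in the output above, with `timestamp` sorted in time order. The example entry is copied verbatim from that output.

```python
def fix_sentence_spans(sentences):
    """Return copies of the sentence dicts with 'start'/'end' rebuilt from word timestamps.

    Hypothetical workaround: assumes each dict has 'start', 'end' and a 'timestamp'
    list of [start, end] pairs in time order, as in the output shown in this issue.
    """
    fixed = []
    for s in sentences:
        words = s['timestamp']
        if not words:
            fixed.append(dict(s))  # nothing to rebuild from; keep the reported span
            continue
        # Sentence span = first word's start to last word's end.
        fixed.append({**s, 'start': words[0][0], 'end': words[-1][1]})
    return fixed


# One entry copied from the output in this issue.
broken = [{'text': '也不是见定的本质,', 'start': 904170, 'end': 904410,
           'timestamp': [[903290, 903450], [903450, 903570], [903570, 903710], [903710, 903870],
                         [903870, 903990], [903990, 904070], [904070, 904170], [904170, 904410]]}]
repaired = fix_sentence_spans(broken)
print(repaired[0]['start'], repaired[0]['end'])
```

The helper returns new dicts instead of mutating the input. This keeps the original output available for comparison with a fixed FunASR version.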