The same ASR model gives two different results for one WAV file

modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

https://www.funasr.com


The same ASR model gives two different results for one WAV file #887

Closed Wanqingling closed 1 year ago

Wanqingling commented 1 year ago

OS: x86_64

Python Version: 3.7

Package Version: Docker image registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-py37-torch1.11.0-tf1.15.5-1.6.1

Model: speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch

Command: python docker_test_asr.py

Details:

Result one: uploaded the file and tested it on the model page, https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary [image: issue1]

Result two: uploaded the file and tested it by running the Docker image [image: issue2]

Error log: No error, just different results for the same model and the same WAV file depending on which test API is used.

File: test_nc.wav, https://github.com/Wanqingling/ASR

lyblsgo commented 1 year ago

$ soxi test_nc.wav
soxi WARN wav: wave header missing extended part of fmt chunk

Input File     : 'test_nc.wav'
Channels       : 1
Sample Rate    : 48000
Precision      : 24-bit
Duration       : 00:00:24.73 = 1186903 samples ~ 1854.54 CDDA sectors
File Size      : 4.75M
Bit Rate       : 1.54M
Sample Encoding: 32-bit Floating Point PCM

We only support 16-bit audio, and your file is 32-bit floating-point PCM at 48 kHz. Use this command to convert the WAV file to 16-bit and resample it to 16 kHz:

sox test_nc.wav -b 16 -r 16000 output_16k.wav
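For anyone who wants to catch this before sending audio to the model, here is a minimal sketch that reads the WAV header directly and reports the same fields soxi prints. It uses only the Python standard library. The helper names `read_wav_format` and `is_model_ready` are hypothetical and not part of FunASR or ModelScope, and the 16 kHz default is an assumption taken from the `-r 16000` in the sox command above. The header is parsed by hand on the assumption that the file may be floating-point PCM, which the standard `wave` module may refuse to open.

```python
import struct


def read_wav_format(path):
    """Return (format_tag, channels, sample_rate, bits_per_sample) from a WAV header."""
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        # Walk the chunks until the 'fmt ' chunk is found.
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                fmt = f.read(16)
                if len(fmt) < 16:
                    raise ValueError("truncated 'fmt ' chunk")
                # Fields: format tag, channels, sample rate,
                # byte rate, block align, bits per sample.
                tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                    "<HHIIHH", fmt
                )
                return tag, channels, rate, bits
            # Skip this chunk; chunks are padded to an even number of bytes.
            f.seek(chunk_size + (chunk_size & 1), 1)


def is_model_ready(path, expected_rate=16000):
    """Hypothetical check: 16-bit integer PCM at the expected sample rate.

    Format tag 1 is integer PCM; tag 3 is IEEE floating point.
    The expected_rate default is an assumption, not a documented FunASR value.
    """
    tag, _channels, rate, bits = read_wav_format(path)
    return tag == 1 and bits == 16 and rate == expected_rate


if __name__ == "__main__":
    import sys

    for wav_path in sys.argv[1:]:
        print(wav_path, read_wav_format(wav_path), is_model_ready(wav_path))
```

Saved under any filename and run with test_nc.wav as the argument, this should flag the file as not ready. After the sox conversion, output_16k.wav should pass.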

Wanqingling commented 1 year ago

OK, thank you very much.