modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com
Other

The same ASR model gives two different results for one WAV file #887

Closed Wanqingling closed 1 year ago

Wanqingling commented 1 year ago

OS: x86_64

Python Version: 3.7

Package Version: Docker image registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-py37-torch1.11.0-tf1.15.5-1.6.1

Model: speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch

Command: python docker_test_asr.py

Details:

Result one: uploaded the file and tested it at https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary
(screenshot: issue1)

Result two: uploaded the same file and tested it by running the Docker image
(screenshot: issue2)

Error log: none. The same model and the same WAV file simply give different results through the two test interfaces.

File: test_nc.wav, available at https://github.com/Wanqingling/ASR

lyblsgo commented 1 year ago

$ soxi test_nc.wav
soxi WARN wav: wave header missing extended part of fmt chunk

Input File     : 'test_nc.wav'
Channels       : 1
Sample Rate    : 48000
Precision      : 24-bit
Duration       : 00:00:24.73 = 1186903 samples ~ 1854.54 CDDA sectors
File Size      : 4.75M
Bit Rate       : 1.54M
Sample Encoding: 32-bit Floating Point PCM

We only support 16-bit audio, and your file is 48 kHz 32-bit floating point. Use this command to convert the WAV file to 16-bit at 16 kHz:

sox test_nc.wav -b 16 -r 16000 output_16k.wav
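If it helps, here is a rough sketch of how you could check a file before sending it to the model, without needing soxi. It reads the `fmt ` chunk of the WAV header directly, so it also works for floating-point files like this one. The function names `read_wav_format` and `needs_conversion` are made up for illustration and are not part of FunASR; the 16-bit / 16 kHz target is taken from the sox command above.

```python
import struct


def read_wav_format(path):
    """Return (audio_format, channels, sample_rate, bits_per_sample) from a WAV header."""
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        # Walk the chunks until the 'fmt ' chunk is found.
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                fmt = f.read(chunk_size)
                if len(fmt) < 16:
                    raise ValueError("'fmt ' chunk is too short")
                # The first 16 bytes are the same for PCM and float files.
                audio_format, channels, sample_rate, _byte_rate, _block_align, bits = (
                    struct.unpack("<HHIIHH", fmt[:16])
                )
                return audio_format, channels, sample_rate, bits
            # Skip this chunk; chunks are padded to an even number of bytes.
            f.seek(chunk_size + (chunk_size & 1), 1)


def needs_conversion(path, want_rate=16000, want_bits=16):
    """True if the file is not plain integer PCM at the wanted rate and bit depth."""
    audio_format, _channels, rate, bits = read_wav_format(path)
    # Format tag 1 is integer PCM and 3 is IEEE float. Any other tag
    # (including the extensible tag 0xFFFE) is conservatively flagged.
    return audio_format != 1 or rate != want_rate or bits != want_bits
```

For a file like test_nc.wav (48000 Hz, 32-bit float) `needs_conversion` would return True, which is the cue to run the sox command first.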

Wanqingling commented 1 year ago

OK, thank you very much.