modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com
Other
4.46k stars 493 forks source link

Whisper-large-v3 模型,没有时间戳 #1814

Closed mp075496706 closed 1 week ago

mp075496706 commented 2 weeks ago

What is your question?

我使用Whisper-large-v3模型识别,能得到结果,但是没有文字的时间戳。

Code

from funasr import AutoModel

model = AutoModel(model="iic/Whisper-large-v3",
                  vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch"
                  )

DecodingOptions = {
    "task": "transcribe",
    "language": None,
    "beam_size": None,
    "fp16": True,
    "without_timestamps": False,
    "prompt": None,
}
res = model.generate(
    DecodingOptions=DecodingOptions,
    batch_size_s=0,
    input=r"https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
)

print(res)

以上是我参照whisper文件夹下的demo.py写的代码。

What's your environment?

mp075496706 commented 2 weeks ago

识别出来的结果是这样的:[{'key': 'asr_example_zh', 'text': '欢迎大家来体验达摩院推出的语音识别模型。'}] 但是我看模型列表的描述中,Whisper-large-v3是有带时间戳输出 这一描述的。 所以,是我哪里没有设置对吗?

LauraGPT commented 1 week ago

Wav file is too short.