labring / FastGPT

FastGPT is a knowledge-base platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without extensive setup or configuration.
https://tryfastgpt.ai
17k stars · 4.54k forks

Asking c121914yu: how to connect an Alibaba speech model to replace the local whisper-1 #2723

Open sdytzjp opened 4 days ago

sdytzjp commented 4 days ago

(screenshot)

I've been testing for a long time without success; could you give me some pointers?

@c121914yu

c121914yu commented 3 days ago

😂 What pointers can I give? It's just a matter of writing a single endpoint: get familiar with FastAPI, copy a form-data endpoint, and tweak it.

sdytzjp commented 2 days ago

Can FastGPT support real-time speech recognition?

LuckLittleBoy commented 1 day ago

You can take a quick look at https://github.com/LuckLittleBoy/SenseVoice-OneApi for reference.

sdytzjp commented 1 day ago

> You can take a quick look at https://github.com/LuckLittleBoy/SenseVoice-OneApi for reference.

Is this streaming or non-streaming?

LuckLittleBoy commented 1 day ago

> You can take a quick look at https://github.com/LuckLittleBoy/SenseVoice-OneApi for reference.
>
> Is this streaming or non-streaming?

Non-streaming. Non-streaming should be enough, right? Do you need to do real-time speech transcription?

eric0095 commented 1 day ago

What if I use SiliconFlow? The model name can't be changed from whisper-1; how do I solve that? @c121914yu

LuckLittleBoy commented 1 day ago

> What if I use SiliconFlow? The model name can't be changed from whisper-1; how do I solve that? @c121914yu

I haven't used SiliconFlow. The model name can be changed. But if you're on the SiliconFlow platform, shouldn't it already provide a unified API? I downloaded the model myself and wrote a web-service endpoint for it.

sdytzjp commented 16 hours ago

> You can take a quick look at https://github.com/LuckLittleBoy/SenseVoice-OneApi for reference.

This is really impressive. I've used local whisper (GPU mode) and Alibaba's online paraformer-realtime-v2, and surprisingly neither beats the speed of yours in CPU mode.

Thanks a lot.

sdytzjp commented 15 hours ago

> You can take a quick look at https://github.com/LuckLittleBoy/SenseVoice-OneApi for reference.
>
> Is this streaming or non-streaming?
>
> Non-streaming. Non-streaming should be enough, right? Do you need to do real-time speech transcription?

Which audio formats does it support?

LuckLittleBoy commented 15 hours ago

> You can take a quick look at https://github.com/LuckLittleBoy/SenseVoice-OneApi for reference.
>
> Is this streaming or non-streaming?
>
> Non-streaming. Non-streaming should be enough, right? Do you need to do real-time speech transcription?
>
> Which audio formats does it support?

All common formats are supported: mp3, wav, m4a.

sdytzjp commented 13 hours ago

> You can take a quick look at https://github.com/LuckLittleBoy/SenseVoice-OneApi for reference.
>
> Is this streaming or non-streaming?
>
> Non-streaming. Non-streaming should be enough, right? Do you need to do real-time speech transcription?
>
> Which audio formats does it support?
>
> All common formats are supported: mp3, wav, m4a.

Why do some audio files fail? (Other m4a files work, but this particular m4a errors out. Is there a format restriction? Alibaba's online paraformer-realtime-v2 handles the same file fine.)

(base) [root@ai ~]# curl --request POST 'http://172.16.1.219:8000/v1/audio/transcriptions' \
    --header 'Content-Type: multipart/form-data' \
    --form 'file=@/root/33.m4a'
{"detail":"choose a window size 400 that is [2, 0]"}

Server log:

INFO: 172.22.1.90:60350 - "POST /v1/audio/transcriptions HTTP/1.1" 200 OK
WARNING:root:oneapi audio transcriptions, file content type is application/octet-stream
  0%| | 0/1 [00:00<?, ?it/s]
ERROR:root:choose a window size 400 that is [2, 0]
Traceback (most recent call last):
  File "/app/main.py", line 69, in transcriptions
    res = model.generate(
  File "/usr/local/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 260, in generate
    return self.inference(input, input_len=input_len, **cfg)
  File "/usr/local/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 302, in inference
    res = model.inference(**batch, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/funasr/models/sense_voice/model.py", line 832, in inference
    speech, speech_lengths = extract_fbank(
  File "/usr/local/lib/python3.9/site-packages/funasr/utils/load_utils.py", line 173, in extract_fbank
    data, data_len = frontend(data, data_len, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/funasr/frontends/wav_frontend.py", line 134, in forward
    mat = kaldi.fbank(
  File "/usr/local/lib/python3.9/site-packages/torchaudio/compliance/kaldi.py", line 591, in fbank
    waveform, window_shift, window_size, padded_window_size = _get_waveform_and_window_properties(
  File "/usr/local/lib/python3.9/site-packages/torchaudio/compliance/kaldi.py", line 142, in _get_waveform_and_window_properties
    assert 2 <= window_size <= len(waveform), "choose a window size {} that is [2, {}]".format(
AssertionError: choose a window size 400 that is [2, 0]
INFO: 172.16.1.219:36424 - "POST /v1/audio/transcriptions HTTP/1.1" 500 Internal Server Error
  0%| | 0/1 [00:00<?, ?it/s]
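For context on the assertion: torchaudio's `kaldi.fbank` requires the decoded waveform to contain at least one analysis window of samples (400 samples, i.e. 25 ms at 16 kHz), and the `0` in `[2, 0]` is `len(waveform)`. So the decoder appears to have produced zero samples from this particular m4a, which would point at a decode problem rather than a hard format limit. A minimal stdlib sketch mirroring the same pre-check, assuming a 16 kHz frontend:

```python
def check_waveform(num_samples: int, sample_rate: int = 16000,
                   frame_length_ms: float = 25.0) -> None:
    """Raise the same error kaldi.fbank would, before feature extraction."""
    window_size = int(sample_rate * frame_length_ms / 1000)  # 400 at 16 kHz
    if not 2 <= window_size <= num_samples:
        raise AssertionError(
            f"choose a window size {window_size} that is [2, {num_samples}]")
```

If such a check fails with 0 samples, re-encoding the file before uploading (for example to 16 kHz mono WAV with ffmpeg) may be worth trying.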

LuckLittleBoy commented 12 hours ago

> choose a window size 400 that is [2, 0]

Are the two audio files different lengths? It might be that this one is too long; by default no audio splitting is done.

sdytzjp commented 12 hours ago

Yes, this audio is fairly long. Besides phone-call speech recognition, I'm hoping to use this model for meeting-audio summarization.

LuckLittleBoy commented 11 hours ago

> Yes, this audio is fairly long. Besides phone-call speech recognition, I'm hoping to use this model for meeting-audio summarization.

I haven't tried long audio. You could try specifying a VAD model and see whether that works.
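FunASR does document a VAD route for this (passing `vad_model="fsmn-vad"` to `AutoModel`), though I haven't verified it on hour-long recordings. The underlying idea can also be sketched with the stdlib alone: cut a long WAV into fixed-length chunks, transcribe each chunk, and join the texts. The 30-second chunk size is illustrative, and unlike real VAD this can cut mid-word:

```python
import io
import wave

def split_wav(data: bytes, chunk_seconds: int = 30) -> list:
    """Split a WAV byte string into standalone WAV chunks of chunk_seconds."""
    with wave.open(io.BytesIO(data)) as src:
        params = src.getparams()
        frames_per_chunk = params.framerate * chunk_seconds
        chunks = []
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                # wave patches the frame count in the header on close
                dst.setparams(params)
                dst.writeframes(frames)
            chunks.append(out.getvalue())
    return chunks
```

Each chunk is a complete WAV file, so it can be posted to the transcription endpoint as-is and the texts concatenated afterwards.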