labring / FastGPT

FastGPT is a knowledge-base platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without extensive setup or configuration.
https://tryfastgpt.ai
17k stars · 4.54k forks

Asking c121914yu: how to connect an Alibaba speech model to replace the local whisper-1 #2723

Open sdytzjp opened 4 days ago

sdytzjp commented 4 days ago

(screenshot)

I've been testing for a long time without success; could you give me some pointers?

@c121914yu

c121914yu commented 3 days ago

😂 What pointers can I give? It's just a matter of writing a single endpoint: get familiar with FastAPI, copy a form-data endpoint, and tweak it.

sdytzjp commented 2 days ago

Can FastGPT support real-time speech recognition?

LuckLittleBoy commented 1 day ago

You can take a quick look at https://github.com/LuckLittleBoy/SenseVoice-OneApi for reference.

sdytzjp commented 1 day ago

> You can take a quick look at https://github.com/LuckLittleBoy/SenseVoice-OneApi for reference.

Is this streaming or non-streaming?

LuckLittleBoy commented 1 day ago

> You can take a quick look at https://github.com/LuckLittleBoy/SenseVoice-OneApi for reference.
>
> Is this streaming or non-streaming?

Non-streaming. Non-streaming should be enough, right? Do you need to do real-time speech transcription?

eric0095 commented 1 day ago

What if I use SiliconFlow? The model name can't be changed from whisper-1; how do I solve that? @c121914yu

LuckLittleBoy commented 1 day ago

> What if I use SiliconFlow? The model name can't be changed from whisper-1; how do I solve that? @c121914yu

I haven't used SiliconFlow. The model name can be changed. But if you're on the SiliconFlow platform, shouldn't it already provide a unified API? I downloaded the model myself and wrote a web-service endpoint for it.

sdytzjp commented 16 hours ago

> You can take a quick look at https://github.com/LuckLittleBoy/SenseVoice-OneApi for reference.

This is really impressive. I've used local whisper (GPU mode) and Alibaba's online paraformer-realtime-v2, and surprisingly neither beats the speed of yours in CPU mode.

Thanks a lot.

sdytzjp commented 15 hours ago

> You can take a quick look at https://github.com/LuckLittleBoy/SenseVoice-OneApi for reference.
>
> Is this streaming or non-streaming?
>
> Non-streaming. Non-streaming should be enough, right? Do you need to do real-time speech transcription?

Which audio formats does it support?

LuckLittleBoy commented 15 hours ago

> You can take a quick look at https://github.com/LuckLittleBoy/SenseVoice-OneApi for reference.
>
> Is this streaming or non-streaming?
>
> Non-streaming. Non-streaming should be enough, right? Do you need to do real-time speech transcription?
>
> Which audio formats does it support?

All common formats are supported: mp3, wav, m4a.

sdytzjp commented 13 hours ago

> You can take a quick look at https://github.com/LuckLittleBoy/SenseVoice-OneApi for reference.
>
> Is this streaming or non-streaming?
>
> Non-streaming. Non-streaming should be enough, right? Do you need to do real-time speech transcription?
>
> Which audio formats does it support?
>
> All common formats are supported: mp3, wav, m4a.

Why do some audio files fail? (Other m4a files work, but this particular m4a errors out. Is there a format restriction? Alibaba's online paraformer-realtime-v2 handles the same file fine.)

(base) [root@ai ~]# curl --request POST 'http://172.16.1.219:8000/v1/audio/transcriptions' \
    --header 'Content-Type: multipart/form-data' \
    --form 'file=@/root/33.m4a'
{"detail":"choose a window size 400 that is [2, 0]"}

Server log:

INFO: 172.22.1.90:60350 - "POST /v1/audio/transcriptions HTTP/1.1" 200 OK
WARNING:root:oneapi audio transcriptions, file content type is application/octet-stream
  0%| | 0/1 [00:00<?, ?it/s]
ERROR:root:choose a window size 400 that is [2, 0]
Traceback (most recent call last):
  File "/app/main.py", line 69, in transcriptions
    res = model.generate(
  File "/usr/local/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 260, in generate
    return self.inference(input, input_len=input_len, **cfg)
  File "/usr/local/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 302, in inference
    res = model.inference(**batch, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/funasr/models/sense_voice/model.py", line 832, in inference
    speech, speech_lengths = extract_fbank(
  File "/usr/local/lib/python3.9/site-packages/funasr/utils/load_utils.py", line 173, in extract_fbank
    data, data_len = frontend(data, data_len, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/funasr/frontends/wav_frontend.py", line 134, in forward
    mat = kaldi.fbank(
  File "/usr/local/lib/python3.9/site-packages/torchaudio/compliance/kaldi.py", line 591, in fbank
    waveform, window_shift, window_size, padded_window_size = _get_waveform_and_window_properties(
  File "/usr/local/lib/python3.9/site-packages/torchaudio/compliance/kaldi.py", line 142, in _get_waveform_and_window_properties
    assert 2 <= window_size <= len(waveform), "choose a window size {} that is [2, {}]".format(
AssertionError: choose a window size 400 that is [2, 0]
INFO: 172.16.1.219:36424 - "POST /v1/audio/transcriptions HTTP/1.1" 500 Internal Server Error
  0%| | 0/1 [00:00<?, ?it/s]
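For context on the assertion: torchaudio's `kaldi.fbank` requires the decoded waveform to contain at least one analysis window of samples (400 samples, i.e. 25 ms at 16 kHz), and the `0` in `[2, 0]` is `len(waveform)`. So the decoder appears to have produced zero samples from this particular m4a, which would point at a decode problem rather than a hard format limit. A minimal stdlib sketch mirroring the same pre-check, assuming a 16 kHz frontend:

```python
def check_waveform(num_samples: int, sample_rate: int = 16000,
                   frame_length_ms: float = 25.0) -> None:
    """Raise the same error kaldi.fbank would, before feature extraction."""
    window_size = int(sample_rate * frame_length_ms / 1000)  # 400 at 16 kHz
    if not 2 <= window_size <= num_samples:
        raise AssertionError(
            f"choose a window size {window_size} that is [2, {num_samples}]")
```

If such a check fails with 0 samples, re-encoding the file before uploading (for example to 16 kHz mono WAV with ffmpeg) may be worth trying.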

LuckLittleBoy commented 12 hours ago

> choose a window size 400 that is [2, 0]

Are the two audio files different lengths? It might be that this one is too long; by default no audio splitting is done.

sdytzjp commented 12 hours ago

Yes, this audio is fairly long. Besides phone-call speech recognition, I'm hoping to use this model for meeting-audio summarization.

LuckLittleBoy commented 11 hours ago

> Yes, this audio is fairly long. Besides phone-call speech recognition, I'm hoping to use this model for meeting-audio summarization.

I haven't tried long audio. You could try specifying a VAD model and see whether that works.
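FunASR does document a VAD route for this (passing `vad_model="fsmn-vad"` to `AutoModel`), though I haven't verified it on hour-long recordings. The underlying idea can also be sketched with the stdlib alone: cut a long WAV into fixed-length chunks, transcribe each chunk, and join the texts. The 30-second chunk size is illustrative, and unlike real VAD this can cut mid-word:

```python
import io
import wave

def split_wav(data: bytes, chunk_seconds: int = 30) -> list:
    """Split a WAV byte string into standalone WAV chunks of chunk_seconds."""
    with wave.open(io.BytesIO(data)) as src:
        params = src.getparams()
        frames_per_chunk = params.framerate * chunk_seconds
        chunks = []
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                # wave patches the frame count in the header on close
                dst.setparams(params)
                dst.writeframes(frames)
            chunks.append(out.getvalue())
    return chunks
```

Each chunk is a complete WAV file, so it can be posted to the transcription endpoint as-is and the texts concatenated afterwards.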