lipku / metahuman-stream

Real-time interactive streaming digital human
https://livetalking-doc.readthedocs.io/
Apache License 2.0
3.54k stars, 499 forks

When the LLM streams its output and TTS also streams via GPT-SoVITS, only the last sentence is played. How can this be fixed? #256

Open dizhenx opened 2 weeks ago

dizhenx commented 2 weeks ago

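When the LLM streams its output and TTS also streams via GPT-SoVITS, only the last sentence is played. The LLM streams its output over websockets, and each output sentence is passed to the human interface. GPT-SoVITS is streaming as well, but only the last sentence gets played: the earlier sentences seem to be overwritten by newer streamed output before they have a chance to play. As a result, out of a long passage, only the final streamed sentence is synthesized by TTS. How can this be solved? Does GPT-SoVITS need some setting, or should the code behind the human interface buffer the LLM's streamed sentences in a queue and send them to TTS in order?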
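The queueing idea raised in the question could look roughly like the sketch below. It is not taken from the metahuman-stream code base: `split_sentences`, `producer`, `consumer`, `stream_to_tts`, and the `speak` callback are hypothetical names, and the punctuation set used to detect sentence boundaries is a simplifying assumption.

```python
import asyncio
import re

# Zero-width split point right after sentence-ending punctuation.
# Assumption: this small set is enough for illustration; "." is left out
# on purpose so that decimals such as "3.5" are not split.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_sentences(buffer: str) -> tuple[list[str], str]:
    """Return (complete sentences, unfinished remainder) for a text buffer."""
    parts = _SENTENCE_END.split(buffer)
    # The last element is whatever follows the final punctuation mark
    # (possibly an empty string), so it is not a complete sentence yet.
    return parts[:-1], parts[-1]


async def producer(chunks, queue: asyncio.Queue) -> None:
    """Accumulate streamed LLM chunks and enqueue each complete sentence."""
    buffer = ""
    async for chunk in chunks:
        buffer += chunk
        sentences, buffer = split_sentences(buffer)
        for sentence in sentences:
            await queue.put(sentence)
    if buffer.strip():
        await queue.put(buffer)  # flush trailing text without punctuation
    await queue.put(None)  # sentinel: no more sentences


async def consumer(queue: asyncio.Queue, speak) -> None:
    """Send queued sentences to TTS one at a time, in arrival order."""
    while True:
        sentence = await queue.get()
        if sentence is None:
            break
        # `speak` is a hypothetical coroutine that forwards one sentence
        # to the TTS side and returns once it has been accepted.
        await speak(sentence)


async def stream_to_tts(chunks, speak) -> None:
    """Run producer and consumer together over a shared FIFO queue."""
    queue: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(producer(chunks, queue), consumer(queue, speak))
```

Because the queue is first-in, first-out and the consumer awaits each `speak` call before taking the next item, earlier sentences cannot be overwritten by later ones in this sketch. Whether such a queue is needed at all depends on how the human interface handles interruption.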

lipku commented 2 weeks ago

Don't use interrupt mode. Modify the front-end page to set interrupt:false
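For callers that send text from their own code rather than from the front-end page, the same setting might be expressed as in the sketch below. Only the `interrupt` flag comes from this thread; the other field names (`text`, `type`, `sessionid`), the `"echo"` value, and the helper name `build_human_request` are assumptions for illustration and should be checked against the request the project's front-end page actually sends.

```python
import json


def build_human_request(text: str, session_id: int, interrupt: bool = False) -> str:
    """Build a JSON body for the human interface with interruption disabled.

    Assumption: apart from "interrupt", the field names here are
    illustrative and may not match the real request format.
    """
    payload = {
        "text": text,            # sentence to be spoken
        "type": "echo",          # assumed mode: speak the text as given
        "interrupt": interrupt,  # False: do not cut off speech in progress
        "sessionid": session_id, # assumed session identifier field
    }
    # ensure_ascii=False keeps Chinese text readable in the JSON body.
    return json.dumps(payload, ensure_ascii=False)
```

With `interrupt` left at `False`, each new sentence would be expected to play after the current one instead of replacing it, which matches the behavior the suggestion above is aiming for.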

jmanhype commented 4 days ago

My issue is that it only repeats the REF_FILE / REF_TEXT. My SoVITS server is getting the call from metahuman-stream, but it only generates the reference file audio over and over for each call.

ThornbirdZhang commented 2 days ago

@jmanhype Are you using gpt-sovits? Check its logs.

jmanhype commented 2 days ago

Apologies, I explored more and found that if you leave the reference text blank, it will then use the TTS text.