What if my business scenario looks like this...

gpt-omni / mini-omni

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

https://arxiv.org/abs/2408.16725

MIT License

3.06k stars, 273 forks

What if my business scenario looks like this... #53

Closed k2o333 closed 1 month ago

k2o333 commented 2 months ago

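1. There is no explicit input step or Enter-key interaction.
2. How often it replies, and when it replies, can be adjusted through the prompt.

For example, during a livestream, right after the host states the price of a product, it chimes in with "What a bargain!"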

mini-omni commented 1 month ago

You would need to do further development on top of it to fit your business requirements.

k2o333 commented 1 month ago

Do you know of any existing solution that keeps listening continuously, without having to press a send button?

wntg commented 1 month ago

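> Do you know of any existing solution that keeps listening continuously, without having to press a send button?

VAD (voice activity detection)?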

mini-omni commented 1 month ago

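> Do you know of any existing solution that keeps listening continuously, without having to press a send button?

The current Streamlit demo already works this way: it uses VAD to detect automatically when the speaker has finished, and once speech ends it sends the request to the server without any button press.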
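To illustrate the idea, here is a minimal sketch of end-of-speech detection. It is not the demo's actual code: it assumes a simple energy-threshold VAD over fixed-size frames of float samples, and the names `rms`, `segment_utterances`, `energy_threshold`, `max_silence_frames`, and `min_speech_frames` are all made up for this example. A real setup would more likely use a trained VAD model in place of the energy check.

```python
import math
from typing import Iterable, Iterator, List, Sequence

Frame = Sequence[float]  # one short chunk of audio samples in [-1.0, 1.0]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_utterances(
    frames: Iterable[Frame],
    energy_threshold: float = 0.02,  # assumed value; tune for the microphone
    max_silence_frames: int = 10,    # this much trailing silence ends an utterance
    min_speech_frames: int = 3,      # shorter bursts are treated as noise
) -> Iterator[List[Frame]]:
    """Yield one list of frames per detected utterance, with trailing silence trimmed."""
    buffer: List[Frame] = []
    speech_frames = 0
    silence_run = 0

    for frame in frames:
        if rms(frame) >= energy_threshold:
            # Speech frame: keep it and reset the silence counter.
            buffer.append(frame)
            speech_frames += 1
            silence_run = 0
        elif buffer:
            # Silence after speech: keep it for now in case speech resumes.
            buffer.append(frame)
            silence_run += 1
            if silence_run >= max_silence_frames:
                # The speaker has stopped; emit the utterance if it was long enough.
                if speech_frames >= min_speech_frames:
                    yield buffer[:-silence_run]
                buffer, speech_frames, silence_run = [], 0, 0

    # Flush whatever is left when the stream ends.
    if speech_frames >= min_speech_frames:
        yield buffer[: len(buffer) - silence_run]
```

In an always-listening application, each yielded utterance would be sent to the server as soon as it is produced, which is what replaces the send button. Deciding whether the model should actually reply to a given utterance (for example, only after a price is mentioned) would still be separate logic on top of this.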

mini-omni commented 1 month ago

I'll close it for now, please feel free to re-open.