vocodedev / vocode-core

🤖 Build voice-based LLM agents. Modular + open source.
https://vocode.dev
MIT License
2.86k stars 485 forks

Integration of Dual Stream TTS based on NoPause #368

Closed Snowdar closed 6 months ago

Snowdar commented 1 year ago

NoPause synthesiser: https://nopause.io/
Python SDK: https://github.com/NoPause-io/nopause-python
We have already submitted an MR: https://github.com/vocodedev/vocode-python/pull/361

In previous text-to-speech (TTS) modes, for example when feeding ChatGPT output to the synthesiser, you typically have to wait for a full sentence before synthesis can be accurate. In dual-stream mode, ChatGPT's token output can be fed into the TTS system immediately (both character-level and sentence-level inputs are supported), while streaming synthesis output is returned at the same time for real-time playback. This approach maintains quality and also reduces the delay caused by waiting for complete sentences.
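To make the dual-stream idea concrete, here is a minimal sketch using plain `asyncio`. It does not use the NoPause SDK or any vocode class: `fake_llm_tokens`, `fake_dual_stream_tts`, and `run_pipeline` are hypothetical stand-ins, and the "audio" is just the token encoded as bytes. It only illustrates that an output chunk can be produced for each token as it arrives, before later tokens exist.

```python
import asyncio
from typing import AsyncIterator, List, Tuple

Event = Tuple[str, str]


async def fake_llm_tokens(text: str, events: List[Event]) -> AsyncIterator[str]:
    """Hypothetical stand-in for an LLM that streams tokens one at a time."""
    for token in text.split(" "):
        await asyncio.sleep(0)  # yield control, as a network stream would
        events.append(("token", token))
        yield token


async def fake_dual_stream_tts(
    tokens: AsyncIterator[str], events: List[Event]
) -> AsyncIterator[bytes]:
    """Hypothetical stand-in synthesiser (assumption, not the NoPause API).

    It emits one placeholder "audio" chunk per incoming token instead of
    waiting for the end of the sentence.
    """
    async for token in tokens:
        events.append(("audio", token))
        yield token.encode("utf-8")  # placeholder for real audio bytes


async def run_pipeline(text: str) -> List[Event]:
    """Connect the token stream to the synthesiser and record the event order."""
    events: List[Event] = []
    async for _chunk in fake_dual_stream_tts(fake_llm_tokens(text, events), events):
        pass  # a real application would send the chunk to the audio output here
    return events


def demo(text: str) -> List[Event]:
    """Synchronous entry point for the sketch."""
    return asyncio.run(run_pipeline(text))


if __name__ == "__main__":
    for kind, token in demo("Hello there world"):
        print(kind, token)
```

In the recorded events, the audio chunk for the first token appears before the second token is even generated, which is the property that removes the sentence-level wait described above.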

In vocode, we have not introduced a new concept called DualStreamConversation. Instead, we implemented dual-stream on top of the existing StreamConversation architecture and provided a usage example in the quickstart. We have not yet added any non-dual-stream code, but it is straightforward to implement, and we plan to keep updating the integration, including turn-based code.

Because dual-stream differs from the previous modes, it may cause compatibility issues within the framework, so we hope the author of vocode can review our contribution and provide some feedback. Thanks.

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 6 months ago

This issue has been automatically closed due to inactivity. Thank you for your contributions.