gpt-omni / mini-omni2

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
https://arxiv.org/abs/2410.11190
MIT License
1.46k stars, 176 forks

Feature Request: Add Real-time Voice Chat Demo #15

Status: Open (opened by freddyaboulton 1 week ago)

freddyaboulton commented 1 week ago

Overview

It would be great to have a demo that enables natural voice conversations with mini-omni2 over WebRTC. This would let users talk to the model in real time, both locally and in cloud environments.

Existing Implementation Reference

I already have a working implementation for mini-omni on Hugging Face Spaces that demonstrates this capability:

https://github.com/user-attachments/assets/040ec236-b336-4a14-880a-f9cde82e63e1

Key Benefits

Technical Resources

Potential Implementation

Porting the existing implementation from mini-omni to mini-omni2 should be relatively straightforward. This would be a valuable addition to the project's interactive capabilities.

Would the maintainers be interested in adding this feature to the project?

superFilicos commented 1 week ago

Thank you!

mini-omni commented 1 week ago

@freddyaboulton Hi, your demo looks great! You are very welcome to contribute, and I would be happy to help test it. Thank you.