Integrate GPT 4o without TTS/STT

microsoft / call-center-ai

Send a phone call from AI agent, in an API call. Or, directly call the bot from the configured phone number!

Apache License 2.0


Integrate GPT 4o without TTS/STT #210

Open clemlesne opened 4 months ago

clemlesne commented 4 months ago

OpenAI's GPT-4o model accepts and produces text, image, and audio. Its understanding is finer than with the usual STT > model > TTS approach, because the model has direct access to the user's behavior, emotions, etc.

Is there a way to use Communication Services to receive the raw audio stream, bypassing the STT step?

### Tasks
- [ ] https://github.com/microsoft/call-center-ai/issues/316
- [ ] https://github.com/microsoft/call-center-ai/issues/317

Qwatro55 commented 4 months ago

I'm also interested in this question.

agentverket commented 4 months ago

What about response time? What about costs? Can you stream data?

clemlesne commented 4 months ago

I know, I know :) The OpenAI APIs are not yet available.

Plus, the Communication Services APIs do not yet support working with a raw audio stream.

If you have ideas, don't hesitate!

JunJD commented 2 months ago

clemlesne commented 1 week ago

Audio streaming is now available with Communication Services!

https://learn.microsoft.com/en-us/azure/communication-services/how-tos/call-automation/audio-streaming-quickstart?pivots=programming-language-python
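For anyone experimenting with this, here is a minimal sketch of decoding one streamed packet into raw audio bytes. It assumes each WebSocket message is a JSON document with a `kind` field and, for audio packets, a base64 payload under `audioData.data`; check the quickstart above for the exact schema. The helper name `extract_pcm` is hypothetical and not part of any SDK.

```python
import base64
import json
from typing import Optional


def extract_pcm(message: str) -> Optional[bytes]:
    """Return the raw audio bytes carried by one streaming message.

    Assumed message shape (verify against the quickstart):
        {"kind": "AudioData", "audioData": {"data": "<base64 audio>"}}
    Any other kind (for example a metadata packet) yields None.
    """
    packet = json.loads(message)
    if packet.get("kind") != "AudioData":
        return None  # not an audio packet, nothing to decode
    return base64.b64decode(packet["audioData"]["data"])
```

The function only does parsing, so it can be plugged into whatever WebSocket server receives the stream.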

ngoanpv commented 5 days ago

OpenAI's Realtime API now supports speech-to-speech: https://platform.openai.com/docs/guides/realtime/overview

I would like to explore more and add this feature to this project @clemlesne
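As a starting point, a rough sketch of how raw audio bytes could be wrapped for the Realtime API: audio is sent over the WebSocket as `input_audio_buffer.append` client events carrying base64-encoded audio. The helper name `to_realtime_append_event` is hypothetical, and the sketch assumes the bytes are already in a format and sample rate the session accepts (resampling is not shown).

```python
import base64
import json


def to_realtime_append_event(pcm: bytes) -> str:
    """Serialize raw audio bytes as an `input_audio_buffer.append` event.

    Assumption: `pcm` already matches the input audio format configured
    for the session; no conversion happens here.
    """
    return json.dumps(
        {
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(pcm).decode("ascii"),
        }
    )
```

Each returned string would be sent as one text frame on the Realtime WebSocket connection.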

clemlesne commented 4 days ago

We're working on it!
