microsoft / call-center-ai

Send a phone call from an AI agent, in an API call. Or, directly call the bot from the configured phone number!
Apache License 2.0

Implement GPT-4o-realtime API for inbound voice #317

Closed. clemlesne closed this issue 1 week ago

clemlesne commented 1 month ago

Depends on https://github.com/microsoft/call-center-ai/issues/316, because Function Apps cannot be consumed over the WebSocket protocol.

clemlesne commented 1 week ago

The gpt-4o-realtime model is not at all reliable at detecting languages, especially languages other than English. On top of that, its hallucination rate is very high.

So, I manually implemented the VAD (voice activity detection) algorithm on top of the Azure AI Speech real-time transcription capabilities. It works around the maturity issues of gpt-4o-realtime and allows the bot's voice to be cut off when the caller speaks, dramatically reducing latency. In addition, the estimated cost is 40% lower than with gpt-4o-realtime. The changes were merged at 23503dc2ffdcabd4fdfa663bf06dcbfc0e91dbd0 in v14.0.0.

Note that the VAD algorithm may require fine-tuning. PRs are appreciated.
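
For readers unfamiliar with the idea, here is a minimal sketch of a VAD loop with bot interruption ("barge-in"). It is not the code merged in the commit above: it swaps in a simple energy threshold where the project relies on Azure AI Speech real-time transcription. The names `SimpleVad`, `run_call`, `stop_bot_audio` and `send_to_llm`, as well as the threshold and frame counts, are hypothetical and only illustrate the kind of parameters that may need tuning.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square energy of one frame of 16-bit PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


@dataclass
class SimpleVad:
    # Hypothetical tuning knobs; the values are illustrative only.
    energy_threshold: float = 500.0  # RMS level above which a frame counts as voiced
    start_frames: int = 3            # consecutive voiced frames needed to start speech
    end_frames: int = 10             # consecutive silent frames needed to end speech

    def __post_init__(self) -> None:
        self.speaking = False
        self._voiced = 0
        self._silent = 0

    def push(self, frame: Sequence[int]) -> Optional[str]:
        """Feed one frame; return "speech_start", "speech_end" or None."""
        if rms(frame) >= self.energy_threshold:
            self._voiced += 1
            self._silent = 0
        else:
            self._silent += 1
            self._voiced = 0

        if not self.speaking and self._voiced >= self.start_frames:
            self.speaking = True
            return "speech_start"
        if self.speaking and self._silent >= self.end_frames:
            self.speaking = False
            return "speech_end"
        return None


def run_call(
    frames: Iterable[Sequence[int]], vad: SimpleVad, bot_speaking: bool = True
) -> List[str]:
    """Turn VAD events into actions (assumed action names, for illustration)."""
    actions: List[str] = []
    for frame in frames:
        event = vad.push(frame)
        if event == "speech_start" and bot_speaking:
            # The caller talks over the bot: cut the bot's audio right away.
            actions.append("stop_bot_audio")
            bot_speaking = False
        elif event == "speech_end":
            # The caller finished: hand the utterance over for a reply.
            actions.append("send_to_llm")
    return actions
```

Requiring several consecutive voiced frames before reacting avoids cutting the bot off on a short noise burst, at the cost of a few frames of extra delay. That trade-off is one reason such thresholds usually need tuning per line quality and language.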

In the meantime, I am closing this issue. I may reopen it as gpt-4o-realtime matures.