Implement GPT-4o-realtime API for inbound voice

microsoft / call-center-ai

Send a phone call from AI agent, in an API call. Or, directly call the bot from the configured phone number!

Apache License 2.0

281 stars, 87 forks

The gpt-4o-realtime model is not at all reliable at detecting languages, especially languages other than English. Its hallucination rate is also very high.

So I implemented the VAD (voice activity detection) algorithm manually, on top of the Azure AI Speech real-time transcription capabilities. This works around the maturity issues of gpt-4o-realtime and allows the bot's voice to be cut off mid-sentence, which dramatically reduces latency. The estimated cost is also 40% lower than with gpt-4o-realtime. The changes were merged in 23503dc2ffdcabd4fdfa663bf06dcbfc0e91dbd0 and are part of v14.0.0.

Note that the VAD algorithm may require fine-tuning. PRs are appreciated.
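
For anyone who wants to help with the tuning, here is a minimal sketch of the general idea behind a VAD with interruption support. It is not the code from the commit above: the class `SimpleVad`, its parameters, and the event names are all hypothetical, and the per-frame `is_speech` flag is a stand-in for whatever signal the real pipeline uses (for example, activity reported by the transcription service). The silence timeout is the kind of value that usually needs tuning: too short and the caller gets cut off mid-pause, too long and the bot feels slow.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimpleVad:
    """Toy voice activity detector (illustrative only, not the project's code)."""

    frame_ms: int = 20             # assumed duration of one audio frame
    silence_timeout_ms: int = 500  # assumed silence needed to end a turn
    speaking: bool = False
    silence_ms: int = 0

    def push(self, is_speech: bool) -> Optional[str]:
        """Feed one frame; return an event name when the state changes."""
        if is_speech:
            self.silence_ms = 0
            if not self.speaking:
                self.speaking = True
                # Caller started talking: this is where the bot voice would be cut.
                return "speech_start"
            return None
        if self.speaking:
            self.silence_ms += self.frame_ms
            if self.silence_ms >= self.silence_timeout_ms:
                self.speaking = False
                self.silence_ms = 0
                # Caller has been silent long enough: the turn is over.
                return "speech_end"
        return None


def run(flags: List[bool], vad: SimpleVad) -> List[str]:
    """Feed a sequence of per-frame speech flags and collect the events."""
    return [event for flag in flags if (event := vad.push(flag)) is not None]
```

With a 60 ms timeout and 20 ms frames, a short pause of two silent frames does not end the turn, while three silent frames in a row do.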

In the meantime, I am closing this issue. I may reopen it as gpt-4o-realtime matures.

Implement GPT-4o-realtime API for inbound voice #317