clemlesne closed this issue 1 week ago.
The gpt-4o-realtime model is not reliable at all at detecting languages, especially languages other than English. Its hallucination rate is also extremely high.
So, I manually implemented the VAD (voice activity detection) algorithm on top of the Azure AI Speech real-time transcription capabilities. This avoids the maturity issues of gpt-4o-realtime and makes it possible to cut off the bot's voice, which dramatically reduces latency. The estimated cost is also 40% lower than with gpt-4o-realtime. Changes merged in commit 23503dc2ffdcabd4fdfa663bf06dcbfc0e91dbd0, released in v14.0.0.
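To illustrate what a manual VAD with bot-voice cut-off (barge-in) can look like, here is a minimal sketch. It is hypothetical and is not the code merged in v14.0.0: instead of Azure AI Speech, it uses a simple energy (RMS) threshold on 16-bit PCM frames, and the names `EnergyVad` and `run`, the event strings, and every threshold value are assumptions chosen for illustration. These thresholds are exactly the kind of parameters that need fine-tuning per audio source.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class EnergyVad:
    """Hypothetical energy-based VAD; all parameter values are assumptions to tune."""

    threshold: float = 500.0      # RMS level at or above which a frame counts as speech
    min_speech_frames: int = 3    # consecutive loud frames required to start speech
    min_silence_frames: int = 10  # consecutive quiet frames required to end speech
    in_speech: bool = False
    _loud: int = 0                # current run of loud frames
    _quiet: int = 0               # current run of quiet frames

    @staticmethod
    def rms(frame: List[int]) -> float:
        # Root-mean-square amplitude of one frame of 16-bit PCM samples
        if not frame:
            return 0.0
        return math.sqrt(sum(s * s for s in frame) / len(frame))

    def process(self, frame: List[int]) -> Optional[str]:
        """Return "speech_start", "speech_end", or None for each incoming frame."""
        if self.rms(frame) >= self.threshold:
            self._loud += 1
            self._quiet = 0
        else:
            self._quiet += 1
            self._loud = 0
        # Require several loud frames in a row so short noises do not start speech
        if not self.in_speech and self._loud >= self.min_speech_frames:
            self.in_speech = True
            return "speech_start"
        # Require a run of silence so short pauses do not end the utterance
        if self.in_speech and self._quiet >= self.min_silence_frames:
            self.in_speech = False
            return "speech_end"
        return None


def run(frames: Iterable[List[int]], vad: EnergyVad, bot_speaking: bool = True) -> List[str]:
    """Feed frames to the VAD and cut the bot's voice when the caller starts talking."""
    events: List[str] = []
    for frame in frames:
        event = vad.process(frame)
        if event:
            events.append(event)
        if event == "speech_start" and bot_speaking:
            # Hypothetical hook: stop the bot's text-to-speech playback here
            bot_speaking = False
            events.append("barge_in")
    return events


if __name__ == "__main__":
    silence, loud = [0] * 160, [1000] * 160
    print(run([silence] * 2 + [loud] * 3 + [silence] * 10, EnergyVad()))
```

Running the script prints `['speech_start', 'barge_in', 'speech_end']`: the caller's speech interrupts the bot once, and the utterance closes after ten quiet frames. Counting consecutive frames on both edges is a common way to trade reaction time against false triggers.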
Note that the VAD algorithm may require fine-tuning. PRs are welcome.
In the meantime, I am closing this issue. I may reopen it as gpt-4o-realtime gains maturity.
Depends on https://github.com/microsoft/call-center-ai/issues/316, because Azure Function Apps cannot be consumed over the WebSocket protocol.
See: