twilio-samples / speech-assistant-openai-realtime-api-node

MIT License

vad version is unusable #19

Open kjjd84 opened 1 day ago

kjjd84 commented 1 day ago

the recent vad patch is nearly unusable

the ai is constantly interrupted by nothing, and the call quality is now extremely static-ridden

i am using google speech to text to get a transcript of what the person is saying, and it often picks up words that were never said, as if the static is making the app think someone is talking

i'm not sure what's going on, but i can no longer even consider this for a real application

pkamp3 commented 5 hours ago

OpenAI provides a few options for server_vad that you might try first (see turn_detection). Check the activation threshold (the threshold field) first, but silence_duration_ms will factor in as well.
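As a rough sketch of that suggestion, the snippet below builds a session.update event that tunes server_vad. The helper name buildSessionUpdate and the numeric defaults are assumptions for illustration, not values from this repo. The field names (threshold, prefix_padding_ms, silence_duration_ms) follow the Realtime API's turn_detection settings as I understand them; a higher threshold should require louder audio before the model treats it as speech.

```javascript
// Sketch: build a session.update event that tunes OpenAI's server-side VAD.
// buildSessionUpdate is a hypothetical helper; the default numbers are
// illustrative starting points for a noisy line, not recommendations.
function buildSessionUpdate({
  threshold = 0.7,          // 0.0 to 1.0; higher means less sensitive to noise
  silenceDurationMs = 700,  // silence required before the turn is considered over
  prefixPaddingMs = 300,    // audio kept from just before detected speech
} = {}) {
  if (threshold < 0 || threshold > 1) {
    throw new RangeError("threshold must be between 0.0 and 1.0");
  }
  return {
    type: "session.update",
    session: {
      turn_detection: {
        type: "server_vad",
        threshold,
        prefix_padding_ms: prefixPaddingMs,
        silence_duration_ms: silenceDurationMs,
      },
    },
  };
}

// Usage, assuming openAiWs is an already-open WebSocket to the Realtime API:
// openAiWs.send(JSON.stringify(buildSessionUpdate({ threshold: 0.8 })));
```

Raising threshold in small steps while listening to a test call is probably the quickest way to see whether static is what triggers the interruptions.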

Beyond that, VAD is complicated, and it's not something we could robustly address in this code. If you turn off the OpenAI version, you might try a third-party provider, or start by manually interrupting based on the Google speech-to-text transcript (use response.cancel and the new code that demonstrates conversation truncation).
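A minimal sketch of the manual-interruption idea, under several assumptions: shouldInterrupt and buildInterruptEvents are hypothetical helpers (not part of this repo), the two-word cutoff is an arbitrary guard against static being transcribed as a stray word, and you are tracking the assistant item id and how many milliseconds of its audio have played. The event shapes for response.cancel and conversation.item.truncate follow the Realtime API as I understand it.

```javascript
// Hypothetical heuristic: only treat the caller as speaking when the
// transcript contains at least minWords words, so a single spurious
// word produced by line static does not cut the assistant off.
function shouldInterrupt(transcript, minWords = 2) {
  const words = transcript.trim().split(/\s+/).filter(Boolean);
  return words.length >= minWords;
}

// Build the two events to send to OpenAI when interrupting manually:
// cancel the in-flight response, then truncate the assistant's audio
// item at the point the caller actually heard.
function buildInterruptEvents(assistantItemId, audioPlayedMs) {
  return [
    { type: "response.cancel" },
    {
      type: "conversation.item.truncate",
      item_id: assistantItemId,
      content_index: 0,
      audio_end_ms: Math.max(0, Math.floor(audioPlayedMs)),
    },
  ];
}

// Usage, assuming openAiWs is an open WebSocket to the Realtime API:
// if (shouldInterrupt(latestTranscript)) {
//   for (const evt of buildInterruptEvents(lastItemId, elapsedMs)) {
//     openAiWs.send(JSON.stringify(evt));
//   }
// }
```

On the Twilio side you would likely also want to flush any audio already buffered for playback, otherwise the caller keeps hearing the cancelled response for a moment.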