twilio-samples / speech-assistant-openai-realtime-api-node

MIT License

VAD version is unusable #19

Closed: kjjd84 closed this issue 1 week ago

kjjd84 commented 1 month ago

The recent VAD patch is nearly unusable.

The AI is constantly interrupted by nothing, and the call quality is now extremely static-ridden.

I am using Google speech-to-text to get a transcript of what the person is saying, and it often picks up words that were never said, as if the static is making the app think someone is talking.

I'm not sure what's going on, but I can no longer even consider this for a real application.

pkamp3 commented 1 month ago

OpenAI provides a few options for `server_vad` (see `turn_detection` in the session configuration), and you might try tuning these first. Start with the activation threshold (`threshold`), but `silence_duration_ms` will factor in as well.
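As a rough sketch of what tuning these options might look like, the TypeScript below builds a `session.update` event whose `turn_detection` block raises `threshold` above its 0.5 default and lengthens `silence_duration_ms`. The numbers are illustrative assumptions rather than tested recommendations, and the helper name `buildVadSessionUpdate` is hypothetical, not part of this repo.

```typescript
// Shape of the server_vad options carried in a session.update event.
interface ServerVadConfig {
  type: "server_vad";
  threshold: number; // 0.0 to 1.0; higher means louder audio is needed to count as speech
  prefix_padding_ms: number; // audio retained from just before detected speech
  silence_duration_ms: number; // silence required before the caller's turn is considered over
}

interface SessionUpdate {
  type: "session.update";
  session: { turn_detection: ServerVadConfig };
}

// Hypothetical helper: builds a session.update payload with less sensitive VAD.
// The default argument values are illustrative assumptions only.
function buildVadSessionUpdate(
  threshold = 0.7,
  silenceDurationMs = 800,
  prefixPaddingMs = 300,
): SessionUpdate {
  if (threshold < 0 || threshold > 1) {
    throw new RangeError("threshold must be between 0.0 and 1.0");
  }
  return {
    type: "session.update",
    session: {
      turn_detection: {
        type: "server_vad",
        threshold,
        prefix_padding_ms: prefixPaddingMs,
        silence_duration_ms: silenceDurationMs,
      },
    },
  };
}

// Serialize the event; in a real app this string would be sent over the
// already-open Realtime API WebSocket.
const payload = buildVadSessionUpdate(0.7, 800);
console.log(JSON.stringify(payload));
```

A higher `threshold` requires louder audio before speech is detected, which may help on noisy phone lines, while a longer `silence_duration_ms` makes the assistant wait longer before deciding the caller has finished. Both trade some responsiveness for fewer false interruptions.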

Beyond that, VAD is complicated, and it's not something we could robustly address in this sample code. If you turn off OpenAI's server VAD (by setting `turn_detection` to `null`), you might try a third-party provider, or start by manually interrupting the assistant based on the Google speech-to-text transcript (use `response.cancel` together with the new code, which demonstrates conversation truncation).
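A minimal sketch of the manual approach, assuming server VAD is disabled and an external speech-to-text service delivers transcripts: when a transcript that looks like real speech arrives while assistant audio is playing, send `response.cancel` and `conversation.item.truncate` to OpenAI, plus a `clear` message to Twilio to flush buffered audio. `PlaybackState`, `looksLikeRealSpeech`, and `handleExternalTranscript` are hypothetical names for illustration. The function returns the messages instead of sending them, so it does not depend on any particular WebSocket setup.

```typescript
// Minimal message shapes; only the fields this sketch uses.
type OutgoingMessage =
  | { type: "session.update"; session: { turn_detection: null } }
  | { type: "response.cancel" }
  | {
      type: "conversation.item.truncate";
      item_id: string;
      content_index: number;
      audio_end_ms: number;
    };

interface TwilioClear {
  event: "clear";
  streamSid: string;
}

// Hypothetical bookkeeping for the assistant audio currently playing.
interface PlaybackState {
  lastAssistantItemId: string | null; // id of the assistant item being spoken
  responseStartMs: number | null; // clock value when its audio started playing
}

// Sent once at session setup: disables OpenAI's server VAD so only this logic interrupts.
const disableServerVad: OutgoingMessage = {
  type: "session.update",
  session: { turn_detection: null },
};

// Hypothetical filter: ignore empty or very short transcripts that are likely static.
function looksLikeRealSpeech(transcript: string, minChars = 2): boolean {
  return transcript.trim().length >= minChars;
}

// Decide whether an external transcript should interrupt the assistant.
// nowMs must use the same clock as responseStartMs (for example, the latest
// Twilio media timestamp). Returns the messages to send, or null to do nothing.
function handleExternalTranscript(
  transcript: string,
  state: PlaybackState,
  streamSid: string,
  nowMs: number,
): { openAi: OutgoingMessage[]; twilio: TwilioClear[] } | null {
  if (!looksLikeRealSpeech(transcript)) return null;
  if (state.lastAssistantItemId === null || state.responseStartMs === null) return null;

  // Truncate the assistant item at the point the caller actually heard.
  const elapsed = Math.max(0, nowMs - state.responseStartMs);
  const openAi: OutgoingMessage[] = [
    { type: "response.cancel" },
    {
      type: "conversation.item.truncate",
      item_id: state.lastAssistantItemId,
      content_index: 0,
      audio_end_ms: elapsed,
    },
  ];
  const twilio: TwilioClear[] = [{ event: "clear", streamSid }];

  // Reset so the same response is not interrupted twice.
  state.lastAssistantItemId = null;
  state.responseStartMs = null;
  return { openAi, twilio };
}
```

The length filter is a deliberately crude stand-in for whatever confidence or word-count check suits your speech-to-text provider; static that produces phantom words would likely need a stricter rule. Note that `audio_end_ms` should not exceed the audio actually generated, so it is best computed from the same clock used to track playback.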

pkamp3 commented 1 week ago

Closing as this is out of scope.