[Open] vincentwi opened 1 month ago
The conversation.interrupted event is just an event -- by default it doesn't trigger any method; it's just something you can listen to. I would simply avoid using the cancelResponse method. In server_vad mode the voice detection is automatic (it happens server-side, not client-side) -- the client code is only there to truncate audio or manually interrupt. You can disable server_vad mode and you should be good. Let me know?
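For reference, a minimal sketch of disabling server_vad with the openai-realtime-api-beta client (based on the updateSession usage shown in that repo's README; treat the exact option shape as an assumption):

```ts
import { RealtimeClient } from '@openai/realtime-api-beta';

const client = new RealtimeClient({ apiKey: process.env.OPENAI_API_KEY });

// Disable server-side voice activity detection: with turn_detection set to
// null, the server no longer decides when a "turn" ends, so it will not
// auto-interrupt an in-progress response when new audio arrives.
client.updateSession({ turn_detection: null });

// To re-enable automatic turn detection later:
// client.updateSession({ turn_detection: { type: 'server_vad' } });

await client.connect();
```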
Hello, I have a case where I need an assistant to listen to a conversation in real time and extract some specific information. Because it's listening to a conversation, the assistant needs to have server_vad enabled. However, the assistant's response gets interrupted every time, so there is no useful output.
Ideally, I would need an assistant that listens constantly and asynchronously (simultaneously) extracts information from the discussion as it appears.
Any idea how this could be achieved?
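One pattern that might work here (a sketch, assuming the conversation.item.completed event from the openai-realtime-api-beta README; extractInfo is a hypothetical helper, not part of the library): keep the realtime session purely as a listener/transcriber, and run the extraction out of band so nothing depends on a realtime response surviving an interruption.

```ts
import { RealtimeClient } from '@openai/realtime-api-beta';

const client = new RealtimeClient({ apiKey: process.env.OPENAI_API_KEY });

// Hypothetical helper: run your extraction (regex, a separate chat
// completions call, etc.) outside the realtime session.
async function extractInfo(transcript: string): Promise<void> {
  /* ... */
}

client.on('conversation.item.completed', ({ item }) => {
  // Only look at finished user speech; ignore assistant items so an
  // interrupted assistant response can't lose data.
  const transcript = item.formatted?.transcript;
  if (item.role === 'user' && transcript) {
    // Fire-and-forget: extraction happens asynchronously while the
    // session keeps listening.
    void extractInfo(transcript);
  }
});
```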
We are not cancelling the response -- we continuously keep sending audio because the person is continuously speaking, and we just want the translated response as they talk. But the API STOPS processing the audio when it's interrupted, so we don't get the full audio translation. I am not following the statement "The conversation.interrupted event is just an event": we do see those events, but we are NOT calling cancelResponse anywhere. The model still stops sending an audio response for the audio streams we append in server_vad mode.
Yes, overdubbing live translation is another good example of a scenario where it is OK for the AI to produce speech simultaneously. On top of the ones identified in my first comment, dynamic computer control, comedy improv, NotebookLM-style podcasts, film/reels/media voiceover, Twitch commentary, etc. are more use cases.
@khorwood-openai disabling server_vad did not solve the problem, nor did omitting cancelResponse. I think @ryanmmmmm and I are dealing with the problem where we are continuously uploading ~1-5s input chunks via response.create, and the server won't process them simultaneously. Turn detection seems built in based on silence/pauses.
Hoping your team may have suggestions on how to get the audio processed asynchronously rather than putting clips on a queue.
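For anyone else stuck here, a sketch of the manual-turn workaround described above (serialized, not simultaneous), using the documented raw Realtime API client events; the chunking/queueing strategy and the sendChunk helper are our own assumptions, not part of the API:

```ts
import WebSocket from 'ws';

const ws = new WebSocket(
  'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview',
  {
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      'OpenAI-Beta': 'realtime=v1',
    },
  },
);

ws.on('open', () => {
  // With turn_detection disabled, the server never auto-interrupts;
  // the client decides when a "turn" ends.
  ws.send(JSON.stringify({
    type: 'session.update',
    session: { turn_detection: null },
  }));
});

// Hypothetical helper: send one ~1-5s chunk and explicitly request a
// response. Chunks end up processed one turn at a time (effectively a
// queue), since the API won't run responses for the same conversation
// simultaneously.
function sendChunk(base64Pcm16Audio: string): void {
  ws.send(JSON.stringify({ type: 'input_audio_buffer.append', audio: base64Pcm16Audio }));
  ws.send(JSON.stringify({ type: 'input_audio_buffer.commit' }));
  ws.send(JSON.stringify({ type: 'response.create' }));
}
```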
Hi, I would like to request adding an interruption bool in updateSession. The expected behavior would be to toggle (on/off) the interruption mechanism of the assistant when the user speaks over them. This is particularly useful in scenarios where it is OK for several agents to produce speech simultaneously (e.g. split-screen sports broadcasting, poetry/singing over music, game characters exclaiming, commentary on a debate, etc.).
It should be pretty simple to implement (see the sketch after this list):
1) add a flag that skips this line: https://github.com/openai/openai-realtime-api-beta/blob/d7bf27b842638f01c0d07d517d0d8a1b9ce4b63b/lib/client.js#L323
2) refrain from using cancelResponse in client code, e.g. in the realtime console: https://github.com/openai/openai-realtime-console/blob/6ea4dba795fee868c60ea9e8e7eba7469974b3e9/src/pages/ConsolePage.tsx#L238
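A sketch of what the requested flag could look like from the caller's side (hypothetical: `interruption` does not exist in updateSession today; `client` is the RealtimeClient instance from the first sketch above):

```ts
// Hypothetical proposed API -- this option is the feature request itself,
// not something the current library accepts.
client.updateSession({
  turn_detection: { type: 'server_vad' }, // keep server-side VAD on
  interruption: false,                    // but never cancel an in-progress response
});

// Internally, the client would then skip its automatic cancelResponse call
// in the 'conversation.interrupted' handler (lib/client.js#L323 above),
// and client code would likewise avoid calling cancelResponse itself.
```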