trzy / ios-openai-realtime

Example of OpenAI's realtime GPT-4 API for iOS with SwiftUI.
16 stars, 3 forks

Turn Detection #1

Closed: Partyschnitzel closed this issue 1 month ago

Partyschnitzel commented 1 month ago

Hello! Is there a way to use the turn detection of the Realtime API?

Unfortunately, I can't manage to let the user interrupt the assistant by voice without the assistant also picking up its own output. However, the Realtime API does offer turn detection. Any idea how to get this working?
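
For readers unfamiliar with the feature being asked about, here is a minimal sketch of how server-side turn detection is typically switched on: the client sends a `session.update` event over the WebSocket. The helper name `makeTurnDetectionUpdate` is hypothetical, and the field names and default values reflect the Realtime API as publicly documented around the time of this thread; treat them as assumptions and check the current API reference. Nothing here is taken from the repository.

```swift
import Foundation

/// Hypothetical helper (not part of the repository): builds the JSON text of a
/// `session.update` client event that asks the server to detect turns with
/// voice activity detection (VAD).
func makeTurnDetectionUpdate(threshold: Double = 0.5,
                             prefixPaddingMs: Int = 300,
                             silenceDurationMs: Int = 500) throws -> String {
    // Assumed field names; verify against the current Realtime API reference.
    let turnDetection: [String: Any] = [
        "type": "server_vad",
        "threshold": threshold,                   // speech activation level, 0.0 to 1.0
        "prefix_padding_ms": prefixPaddingMs,     // audio kept from before speech starts
        "silence_duration_ms": silenceDurationMs  // silence that ends the user's turn
    ]
    let event: [String: Any] = [
        "type": "session.update",
        "session": ["turn_detection": turnDetection] as [String: Any]
    ]
    // Sorted keys keep the output stable, which helps when logging or testing.
    let data = try JSONSerialization.data(withJSONObject: event, options: [.sortedKeys])
    return String(decoding: data, as: UTF8.self)
}
```

The returned string would be sent as a text frame, for example with `URLSessionWebSocketTask.send(.string(json))`. Turn detection on its own does not remove the problem described above: if the microphone picks up the assistant's playback, the server may treat that audio as user speech, which is why the replies below focus on echo cancellation.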

trzy commented 1 month ago

I’m not sure. I believe you have to filter out the response waveform that the microphone picks up. I thought there was a way to do this natively in iOS, but I guess not? (Sent by email in reply to Partyschnitzel's post of Nov 2, 2024.)

trzy commented 1 month ago

Ah, wait a minute, I think there is a way to enable this! Will try it later. (Sent by email on Nov 2, 2024, in reply to the previous comment.)

trzy commented 1 month ago

Nope, I can't figure it out. Allegedly, setting the .voiceChat or .videoChat mode is supposed to enable acoustic echo cancellation, but I've tried a variety of different things and it just doesn't work. If you can figure it out, please share. I know there's a way to do this, but I can't find any examples online that combine streaming mic input (which I think requires setting up an audio graph with a tap on the input node) with playback.
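
The combination described in this comment (streamed mic input through a tap, plus playback, with echo cancellation) can be sketched as follows. This is one commonly used arrangement, assuming iOS 13 or later, and not necessarily what the repository's update does: the thread does not say what the eventual fix was. The class name `EchoCancelledAudio` and its methods are hypothetical. The Apple calls used are `AVAudioSession.setCategory(_:mode:options:)`, `AVAudioInputNode.setVoiceProcessingEnabled(_:)`, `installTap(onBus:bufferSize:format:block:)`, and `AVAudioPlayerNode`.

```swift
import AVFoundation

/// Hypothetical helper (not from the repository): routes capture and playback
/// through one AVAudioEngine with Apple's voice processing enabled, so the
/// system can subtract the played audio from the microphone signal.
final class EchoCancelledAudio {
    private let engine = AVAudioEngine()
    private let player = AVAudioPlayerNode()

    /// Starts the engine. `playbackFormat` must match the buffers passed to `play(_:)`.
    func start(playbackFormat: AVAudioFormat,
               onMicBuffer: @escaping (AVAudioPCMBuffer) -> Void) throws {
        // Ask for simultaneous playback and recording in voice-chat mode.
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.playAndRecord, mode: .voiceChat, options: [.defaultToSpeaker])
        try session.setActive(true)

        // Voice processing can only be toggled while the engine is stopped.
        let input = engine.inputNode
        try input.setVoiceProcessingEnabled(true)

        // Playback goes through the same engine so it can serve as the echo reference.
        engine.attach(player)
        engine.connect(player, to: engine.mainMixerNode, format: playbackFormat)

        // Read the input format after enabling voice processing, since it may change.
        let micFormat = input.outputFormat(forBus: 0)
        input.installTap(onBus: 0, bufferSize: 4096, format: micFormat) { buffer, _ in
            onMicBuffer(buffer)  // e.g. convert and stream to the server
        }

        engine.prepare()
        try engine.start()
        player.play()
    }

    /// Queues a chunk of assistant audio for playback.
    func play(_ buffer: AVAudioPCMBuffer) {
        player.scheduleBuffer(buffer, completionHandler: nil)
    }

    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        player.stop()
        engine.stop()
    }
}
```

The tap delivers buffers in the hardware format, so an app streaming to a server would still need to convert them (for example with `AVAudioConverter`) to whatever sample rate and encoding the server expects. `AVAudioSession` is not available on macOS, so this sketch targets iOS only.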

trzy commented 1 month ago

Ok, I finally figured it out after all. Apple APIs are atrocious. Enjoy the update.

Partyschnitzel commented 1 month ago

Hey @trzy,

I had previously implemented a similar approach but ended up abandoning it because it didn't work in the simulator or on my Mac. Now, I've tried your update and you're a genius! However, just like my initial attempt, it only works on an actual device. Unfortunately, on the Mac, no sound is detected.

Great job figuring it out though!

trzy commented 1 month ago

Unfortunately, I don’t know how to make it work in the simulator or on the Mac, but surely there is a way. I do recall from Googling that lots of people were asking about voice processing on macOS, and there were some solutions. If you figure that part out and can contribute a PR, I would much appreciate it 🙏