openai / openai-realtime-api-beta

Node.js + JavaScript reference client for the Realtime API (beta)
MIT License

Assistant is very difficult to correct #33

Open mrkww opened 1 month ago

mrkww commented 1 month ago

I am "talking" to the realtime API via g711_alaw audio streaming (with server_vad).

I noticed that the assistant is very difficult to correct. It insists on its initial assumption and is hardly willing to deviate from it. I speak German and the assistant replies in German; I have translated the following examples into English.

Example: I introduce myself with "Hi, my name is Mark, how are you?"

A: "Hi Marco, I am fine."
Me: "No, I am Mark."
A: "Ok, hi Marco."
Me: "My name is Mark."
A: "Of course, Marco."
Me: "Please correct yourself: my name is Mark."
A: "Oh, of course. Marco."

I also noticed that once the assistant tries to end the conversation, it is no longer possible to steer it away from doing so:

A: "Feel free to call me again. Bye Bye"
Me: "No, I don't want to end the call."
A: "Sure. Bye Bye and have a great day."
Me: "No, please answer more questions."
A: "It was nice talking to you. Goodbye."

Is this a known behaviour? Can this behaviour be adjusted somehow?

jandieg commented 1 month ago

For your second issue, you probably need to make it clear in the prompt that the conversation is taking place over a call.
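A minimal sketch of what that prompt change could look like, assuming the session is configured through a raw `session.update` event with the same `g711_alaw` and `server_vad` settings mentioned above. The instruction wording is purely illustrative and has not been verified to fix either behaviour; the variable names are made up for this example.

```javascript
// Illustrative instructions only (an assumption, not a tested fix):
// tell the model it is on a live call and must not end it on its own.
const instructions = [
  'You are talking to a caller on a live phone call.',
  'Do not say goodbye or try to end the call yourself; only the caller decides when the call ends.',
  'If the caller says they want to continue, keep answering their questions.',
  'If the caller corrects you, accept the correction and use the corrected information from then on.',
].join(' ');

// `session.update` event carrying the instructions together with the
// audio format and turn detection described in the original post.
const sessionUpdateEvent = {
  type: 'session.update',
  session: {
    instructions,
    input_audio_format: 'g711_alaw',
    output_audio_format: 'g711_alaw',
    turn_detection: { type: 'server_vad' },
  },
};

// The event is sent as JSON over the already-open WebSocket (not shown here).
const payload = JSON.stringify(sessionUpdateEvent);
console.log(payload);
```

With the reference client from this repository, the same fields could presumably be passed to `client.updateSession({ instructions, ... })` instead of building the event by hand.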