openai / openai-realtime-api-beta

Node.js + JavaScript reference client for the Realtime API (beta)
MIT License

Instruction-following degrades after several minutes in a session #40

Status: Open. needsmorejpeg opened this issue 2 weeks ago.

needsmorejpeg commented 2 weeks ago

I've noticed that when I test my assistant, it does a great job following the instructions in my system prompt for the first 1-2 minutes of a call. Then, around minutes 3-5, it starts making mistakes that the prompt specifically addresses. Has anyone else noticed performance getting worse over time?

I'm wondering whether this has to do with how the API manages context and state internally. I assume the system prompt is always retained, while the oldest audio is dropped first (FIFO) as the stream grows. Is that right?

I see this in the API docs:

If a conversation goes on for a sufficiently long time, the input tokens the conversation represents may exceed the model’s input context limit (e.g. 128k tokens for GPT-4o). At this point, the Realtime API automatically truncates the conversation based on a heuristic-based algorithm that preserves the most important parts of the context (system instructions, most recent messages, and so on.) This allows the conversation to continue uninterrupted.

I just want to confirm that this "heuristic-based algorithm" always keeps the system instructions. Any extra detail you can provide would be helpful, too!

mrkww commented 1 week ago

I've noticed similar behavior after only a couple of questions. At the beginning the assistant is absolutely on point, but it degrades significantly after a few back-and-forth turns; it even says "Goodbye" out of nowhere.