samreid opened this issue 1 month ago
Hey there, can you provide your session ID for any sessions where you encounter this error? It should be in the session.created event.
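If it helps, a quick way to surface it is to log incoming server events and pick out session.created. A minimal sketch, assuming a raw WebSocket connection (`ws`) to the Realtime API rather than the console's client wrapper:

```ts
// Sketch: log the session ID from the session.created server event.
// Assumes `ws` is an already-open WebSocket to the Realtime API.
ws.addEventListener('message', (msg: MessageEvent) => {
  const event = JSON.parse(msg.data as string);
  if (event.type === 'session.created') {
    console.log('session id:', event.session.id); // e.g. "sess_..."
  }
});
```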
Yes, I just tested it again and saw similar behavior. Here is the beginning of that session.created event:
```json
{
  "type": "session.created",
  "event_id": "event_AFohAvtiudEvVIO2crDBu",
  "session": {
    "id": "sess_AFoh9g0xJK5jqcwxMWOHz",
    "object": "realtime.session",
    "model": "gpt-4o-realtime-preview-2024-10-01",
    "expires_at": 1728334055,
    ...
```
The rate limits came out like this:
```json
{
  "name": "tokens",
  "limit": 20000,
  "remaining": 14989,
  "reset_seconds": 15.033
}
```
This run does have some `addTool` calls and a paragraph of conversation instructions.
It's the `max_response_tokens`; it seems like it "reserves" those tokens up front.
@dnakov is correct - this is like a "reservation" rather than an immediate consumption. It's ~5000 because we're reserving 4096, the max model output size. I'm going to change this behavior to be more forgiving -- it should give you more headroom on the rate limits. Expect an improvement tomorrow.
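In the meantime, if the reservation tracks the maximum response size, one way to get some headroom back is to cap the per-response output budget. A minimal sketch, assuming the session-level `max_response_output_tokens` field is what backs the reservation and that `ws` is an open Realtime API WebSocket (the value 512 is just for illustration):

```ts
// Sketch: shrink the per-response output cap so less of the token rate limit
// is "reserved" up front. The field's effect on the reservation is an
// assumption here, not confirmed behavior.
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    max_response_output_tokens: 512, // illustrative value, well below 4096
  },
}));
```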
Did this ship, @bakks? I've noticed this double counting / reserving ends up costing far more than OpenAI's initial estimates from their model pages.
On startup, I consistently see nearly 5000 tokens used on "connect". I commented out both `addTool` calls and set the `instructions` to 'test', yet on "connect" I still see output where the remaining is 15482/20000. Is this to be expected?
Testing with commit 971323d7f81b42f14177741e4c8666bbe4591c1c on a MacBook Air M1 in Chrome Version 129.0.6668.90 (Official Build) (arm64).
Thanks!
UPDATE: I'm testing with the in-browser implementation, not the relay server.
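For reference, the kind of minimal setup I'm testing is roughly the following. This sketch uses Node and the `ws` package instead of the in-browser client (a browser WebSocket can't set the Authorization header), with the model taken from the session above; treat it as an approximation of the repro, not the exact code:

```ts
// Sketch of a minimal repro: connect with trivial instructions and no tools,
// then log the rate limits reported right after "connect".
import WebSocket from 'ws';

const url =
  'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01';
const ws = new WebSocket(url, {
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    'OpenAI-Beta': 'realtime=v1',
  },
});

ws.on('open', () => {
  // Same setup as the test above: instructions set to 'test', no tools added.
  ws.send(
    JSON.stringify({ type: 'session.update', session: { instructions: 'test' } })
  );
});

ws.on('message', (data) => {
  const event = JSON.parse(data.toString());
  if (event.type === 'rate_limits.updated') {
    // This is where I see remaining ≈ 15482 of a 20000 token limit.
    console.log(event.rate_limits);
  }
});
```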