openai / openai-realtime-console

React app for inspecting, building, and debugging with the Realtime API
MIT License

Unexpected tokens used in initial rate_limits.updated #42

Open samreid opened 1 month ago

samreid commented 1 month ago

On startup, I consistently see nearly 5000 tokens used on "connect". I commented out both addTool calls and set the instructions to 'test', yet I still see output like this on "connect":

00:01.33
server
rate_limits.updated
{
  "type": "rate_limits.updated",
  "event_id": "event_AFWylkC7LwdlyxIrYCHCt",
  "rate_limits": [
    {
      "name": "requests",
      "limit": 5000,
      "remaining": 4999,
      "reset_seconds": 0.012
    },
    {
      "name": "tokens",
      "limit": 20000,
      "remaining": 15482,
      "reset_seconds": 13.554
    }
  ]
}

Note that the tokens remaining is 15482/20000 immediately after connecting. Is this expected?

Testing with 971323d7f81b42f14177741e4c8666bbe4591c1c on a MacBook Air M1 in Chrome Version 129.0.6668.90 (Official Build) (arm64).

Thanks!

UPDATE: I'm testing with the in-browser implementation, not the relay server.

khorwood-openai commented 1 month ago

Hey there, can you provide your session ID for any sessions that you encounter this error with? Should be in the session.created event.

samreid commented 1 month ago

Yes, I just tested it again and saw similar behavior. Here is the beginning of that session.created event, including the session ID:

{
  "type": "session.created",
  "event_id": "event_AFohAvtiudEvVIO2crDBu",
  "session": {
    "id": "sess_AFoh9g0xJK5jqcwxMWOHz",
    "object": "realtime.session",
    "model": "gpt-4o-realtime-preview-2024-10-01",
    "expires_at": 1728334055,

The rate limits came out like:

    {
      "name": "tokens",
      "limit": 20000,
      "remaining": 14989,
      "reset_seconds": 15.033
    }

This run does have some addTool calls and a paragraph for the conversation instructions.

dnakov commented 1 month ago

It's the max_response_tokens; it seems like it "reserves" those tokens up front.

bakks commented 1 month ago

@dnakov is correct - this is like a "reservation" rather than an immediate consumption. It's ~5000 because we're reserving 4096, the max model output size. I'm going to change this behavior to be more forgiving -- it should give you more headroom on the rate limits. Expect an improvement tomorrow.

kyleboddy commented 1 month ago

Did this ship, @bakks? I've noticed that this double counting / reserving costs far more than OpenAI's initial estimates from their model pages.