ztjhz / BetterChatGPT

An amazing UI for OpenAI's ChatGPT (Website + Windows + MacOS + Linux)
http://bettergpt.chat/
Creative Commons Zero v1.0 Universal

`context_length_exceeded` when generating title #377

Open endolith opened 1 year ago

endolith commented 1 year ago
Error generating title!
```json
{
  "error": {
    "message": "This model's maximum context length is 4097 tokens. However, your messages resulted in 7403 tokens. Please reduce the length of the messages.",
    "type": "invalid_request_error",
    "param": "messages",
    "code": "context_length_exceeded"
  }
}
```

Either use the 16k model to generate the title, or just truncate the input (which should be good enough for generating a title).
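
For the truncation route, a minimal sketch (not the project's actual code; it uses a rough ~4-characters-per-token heuristic instead of a real tokenizer, and `TITLE_PROMPT_BUDGET_TOKENS` is a hypothetical budget):

```typescript
// Sketch: trim the conversation before asking the model for a title.
// Walks backwards so the most recent messages survive. Uses a rough
// ~4-characters-per-token heuristic; a real implementation would count
// tokens with a tokenizer such as tiktoken.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

const TITLE_PROMPT_BUDGET_TOKENS = 3000; // leave room for the reply in a 4k-context model

function truncateForTitle(messages: ChatMessage[]): ChatMessage[] {
  const budgetChars = TITLE_PROMPT_BUDGET_TOKENS * 4;
  const kept: ChatMessage[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const len = messages[i].content.length;
    if (used + len > budgetChars) break;
    kept.unshift(messages[i]);
    used += len;
  }
  return kept;
}
```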

XOKP commented 1 year ago

I am encountering the same error.

niccolofavari commented 1 year ago

I'm getting the same error but it looks like the max_tokens data isn't even sent to the API

image

Is this the expected behavior? I don't see any changes when I edit the max_tokens setting.

endolith commented 1 year ago

> I'm getting the same error but it looks like the max_tokens data isn't even sent to the API
>
> image
>
> Is this the expected behavior? I don't see any changes when I edit the max_tokens setting.

Max tokens is a property of each model, but isn't published through the API. I've asked them to add that: https://github.com/openai/openai-python/issues/448
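
Since the API doesn't report it, a client has to carry its own lookup table. A minimal sketch (not BetterChatGPT's actual code; the values are approximate for the models current at the time of this thread and should be checked against OpenAI's model index):

```typescript
// Sketch: a client-side table of context lengths, since the API does
// not expose them. Verify the numbers against the model index.
const MODEL_CONTEXT_LENGTH: Record<string, number> = {
  'gpt-3.5-turbo': 4096,
  'gpt-3.5-turbo-16k': 16384,
  'gpt-4': 8192,
  'gpt-4-32k': 32768,
};

function contextLengthFor(model: string): number {
  return MODEL_CONTEXT_LENGTH[model] ?? 4096; // conservative fallback for unknown models
}
```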

niccolofavari commented 1 year ago

Also, I think max_tokens should be the model's maximum context length (e.g. 16384 for gpt-3.5-turbo-16k) minus the token count of the previous messages, minus some additional "safety margin" tokens.

Example: 16384 (model maximum) - 8985 (previous content) = 7399 (remaining max_tokens)

Unfortunately, sending exactly that number can still result in an error, so it's usually better to set max_tokens 1% or 2% lower (I'd send 7300 for the example above).
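
A minimal sketch of that calculation (not the project's code; `countTokens` here is a crude character-based stand-in for a real tokenizer such as tiktoken, and the per-message overhead is an assumption):

```typescript
// Sketch: derive max_tokens from the model's context length minus the
// prompt size, with a small safety margin, as described above.
// countTokens is a crude character-based stand-in for a real tokenizer.
const countTokens = (text: string): number => Math.ceil(text.length / 4);

function remainingMaxTokens(
  contextLength: number,              // e.g. 16384 for gpt-3.5-turbo-16k
  messages: { content: string }[],
  marginFraction = 0.02               // leave ~2% headroom for counting drift
): number {
  const promptTokens = messages.reduce(
    (sum, m) => sum + countTokens(m.content) + 4, // ~4 tokens of assumed per-message overhead
    0
  );
  const remaining = contextLength - promptTokens;
  return Math.max(1, Math.floor(remaining * (1 - marginFraction)));
}
```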

niccolofavari commented 1 year ago

> > I'm getting the same error but it looks like the max_tokens data isn't even sent to the API
> >
> > image
> >
> > Is this the expected behavior? I don't see any changes when I edit the max_tokens setting.
>
> Max tokens is a property of each model, but isn't published through the API. I've asked them to add that openai/openai-python#448

I'm confused. It's already implemented in the api: https://platform.openai.com/docs/api-reference/chat/create#chat/create-max_tokens

I could copy the request from the browser's Network inspector tab (in curl format), set max_tokens, and run it in the terminal. It looks like it works. I must be missing something...

endolith commented 1 year ago

> I'm confused. It's already implemented in the api: platform.openai.com/docs/api-reference/chat/create#chat/create-max_tokens

Ah, that's the maximum number of tokens to generate, not the maximum supported by the model.

(Which I guess would actually be called context_length?)

> The token count of your prompt plus max_tokens cannot exceed the model's context length.

Context length VS Max token VS Maximum length

When BetterChatGPT tries to auto-generate a title, it feeds the model more tokens than its context length allows, which produces this error.

The maximum context lengths for each GPT and embeddings model can be found in the model index.

(Though it is confusingly called "Max tokens" in the model index table.)

niccolofavari commented 1 year ago

It is a bit confusing indeed, but the max_tokens parameter is never sent to begin with. It should be calculated and sent with each request, something like context_length - content_tokens = max_tokens (for lack of better terms).

As I said, it should probably be 1% or 2% less than that to avoid errors (I tried with the precise number and it still gave me errors).

So, in summary, this parameter varies from call to call (i.e. the maximum range of the slider should shrink each time we send a request and receive a response).
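
For illustration, a hedged sketch of sending a per-request max_tokens to the Chat Completions endpoint (the `createCompletion` wrapper and its parameters are hypothetical, not BetterChatGPT's actual code; the endpoint and payload fields follow the public API):

```typescript
// Sketch: include a per-request max_tokens, recomputed from the
// remaining context before each call.
async function createCompletion(
  apiKey: string,
  model: string,
  messages: { role: string; content: string }[],
  maxTokens: number
): Promise<unknown> {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages, max_tokens: maxTokens }),
  });
  if (!res.ok) throw new Error(`OpenAI API error: ${res.status}`);
  return res.json();
}
```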

endolith commented 1 year ago

image

I get this every time; it's frustrating.

jackschedel commented 1 year ago

This is fixed in my fork. Unfortunately, I fixed it after fixing a lot of other things related to model context and max tokens (and after detaching the fork from its parent), so I can't easily produce a diff, but feel free to steal my implementation.