endolith opened this issue 1 year ago
I am encountering the same error.
I'm getting the same error but it looks like the max_tokens data isn't even sent to the API
Is this the expected behavior? I don't see any changes when I edit the max_tokens setting.
> I'm getting the same error but it looks like the max_tokens data isn't even sent to the API
> Is this the expected behavior? I don't see any changes when I edit the max_tokens setting.
Max tokens is a property of each model, but isn't published through the API. I've asked them to add that https://github.com/openai/openai-python/issues/448
Also, I think max_tokens should be the model's maximum context length (e.g. 16384 for gpt-3.5-turbo-16k) minus the length of the previous messages, minus some extra "margin" tokens.
Example: 16384 (model maximum) - 8985 (previous content) = 7399 (remaining max_tokens)
Unfortunately, sending the exact remainder can still produce an error, so it's usually better to set max_tokens 1-2% lower (I'd send 7300 for the example above).
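A minimal sketch of that calculation in TypeScript (the function name and the 2% margin are just illustrations of the idea above, not BetterChatGPT's actual code):

```ts
// Sketch: derive a safe max_tokens for the next request from the model's
// context length and the token count of the messages already in the prompt.
function safeMaxTokens(
  contextLength: number,
  promptTokens: number,
  margin = 0.02 // the 1-2% safety buffer suggested above
): number {
  const remaining = contextLength - promptTokens; // 16384 - 8985 = 7399
  return Math.max(0, Math.floor(remaining * (1 - margin)));
}

safeMaxTokens(16384, 8985); // => 7251, close to the 7300 suggested above
```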
> I'm getting the same error but it looks like the max_tokens data isn't even sent to the API
> Is this the expected behavior? I don't see any changes when I edit the max_tokens setting.
> Max tokens is a property of each model, but isn't published through the API. I've asked them to add that openai/openai-python#448
I'm confused. It's already implemented in the API: https://platform.openai.com/docs/api-reference/chat/create#chat/create-max_tokens
I could copy the request from the browser's network inspector (in curl format), set max_tokens, and run it in the terminal. It looks like it's working, so I must be missing something...
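For reference, a minimal TypeScript equivalent of that curl request (assuming the API key is in an OPENAI_API_KEY environment variable; the message content is just an example):

```ts
// Sketch: max_tokens is a plain field in the Chat Completions request body.
const response = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Hello" }],
    max_tokens: 256, // caps how many tokens the model may generate in the reply
  }),
});
console.log(await response.json());
```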
> I'm confused. It's already implemented in the API: platform.openai.com/docs/api-reference/chat/create#chat/create-max_tokens
Ah, that's the maximum number of tokens to generate, not the maximum supported by the model. (Which I guess would actually be called context_length?)
> The token count of your prompt plus max_tokens cannot exceed the model's context length.

(See also: "Context length VS Max token VS Maximum length".)
When BetterChatGPT is trying to auto-generate a title, it's feeding more tokens to the model than the model supports, producing this error.
The maximum context lengths for each GPT and embeddings model can be found in the model index.
(Though it is confusingly called "Max tokens" in the model index table.)
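For the models discussed in this thread, those "Max tokens" values look like this (a hypothetical lookup table, with the context lengths as listed in the model index at the time):

```ts
// Context lengths ("Max tokens" in the model index) for the relevant models.
const CONTEXT_LENGTH: Record<string, number> = {
  "gpt-3.5-turbo": 4096,
  "gpt-3.5-turbo-16k": 16384,
  "gpt-4": 8192,
  "gpt-4-32k": 32768,
};
```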
It is a bit confusing indeed, but the max_tokens parameter is never sent to begin with. It should be calculated and sent with each request, roughly as context_length - content_tokens (for lack of better wording) = max_tokens.
As I said, it should probably be 1-2% less than that to avoid errors (I tried with the precise number and it still gave me errors).
So in summary, this parameter varies from call to call (i.e. the maximum range of the slider should get smaller and smaller each time we send a request and get a response).
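Tying that summary to the earlier sketch: the budget has to be recomputed from the full message history before every request. A rough sketch (the character-based token estimate is a crude stand-in for a real tokenizer such as tiktoken; safeMaxTokens and CONTEXT_LENGTH are the hypothetical helpers sketched above):

```ts
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Crude estimate (~4 characters per token); a real implementation should
// count tokens with a proper tokenizer instead.
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

// Sketch: as the history grows, the remaining budget (and therefore the
// usable range of the max_tokens slider) shrinks on every request.
function nextMaxTokens(messages: ChatMessage[], model: string): number {
  const promptTokens = messages.reduce(
    (sum, m) => sum + estimateTokens(m.content),
    0
  );
  return safeMaxTokens(CONTEXT_LENGTH[model] ?? 4096, promptTokens);
}
```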
I get this every time, it's frustrating
This is fixed in my fork. Unfortunately, I fixed it after fixing a lot more stuff to do with model context and max tokens (and after detaching the fork from its parent), so I can't easily make a diff, but feel free to steal my implementation.
Either use the 16k model to generate the title, or just truncate the input (which should be good enough for generating a title)
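A minimal sketch of the truncation option (a character-based cutoff as a rough stand-in for proper token counting; not the fork's actual implementation):

```ts
// Sketch: only feed the beginning of the conversation into the
// title-generation prompt so it always fits in the model's context.
function truncateForTitle(conversation: string, maxChars = 2000): string {
  // ~4 characters per token is a rough rule of thumb; counting real tokens
  // with a tokenizer would be more precise.
  return conversation.length <= maxChars
    ? conversation
    : conversation.slice(0, maxChars) + " …";
}

const titlePrompt =
  "Generate a short, descriptive title for this conversation:\n" +
  truncateForTitle("user: …\nassistant: …"); // the real conversation text goes here
```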