ubiquity / ubiquibot

Putting the 'A' in 'DAO'
https://github.com/marketplace/ubiquibot
MIT License

`/ask` Priority - Token Optimization #807

Open 0x4007 opened 1 year ago

0x4007 commented 1 year ago
! Error: This model's maximum context length is 16385 tokens. However, you requested 20407 tokens (4023 in the messages, 16384 in the completion). Please reduce the length of the messages or completion.

@Keyrxng time for compression/prioritization? Not a great first real world attempt lol.

Prioritization order:

  1. Current issue specification
  2. Linked issue specifications (in the order they are linked, with the first link taking priority over the next)
  3. Current issue conversation
  4. Linked issue conversations (same ordering system)

We should use a tokenization estimator to know how much we should exclude.

Originally posted by @pavlovcik in https://github.com/ubiquity/ubiquibot/issues/787#issuecomment-1732315081


It should also include a warning that it had to cut out some content, and perhaps even report the exact token counts (similar to the information presented in the error message above) so the user can approximate how much was cut off.
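
Something along these lines could work (a rough sketch only; `ContextChunk`, `buildContext`, and `countTokens` are placeholder names, and the priorities follow the list above):

```ts
interface ContextChunk {
  label: string;    // e.g. "current issue spec" or "linked issue conversation"
  priority: number; // 1 = highest (current issue spec); larger numbers get dropped first
  text: string;
}

function buildContext(
  chunks: ContextChunk[],
  budget: number,
  countTokens: (s: string) => number
) {
  const kept: ContextChunk[] = [];
  const dropped: ContextChunk[] = [];
  let used = 0;

  // Walk the chunks from highest to lowest priority and stop adding once the budget is spent.
  for (const chunk of [...chunks].sort((a, b) => a.priority - b.priority)) {
    const cost = countTokens(chunk.text);
    if (used + cost <= budget) {
      kept.push(chunk);
      used += cost;
    } else {
      dropped.push(chunk);
    }
  }

  // Tell the user what was cut and how many prompt tokens were actually used.
  const warning = dropped.length
    ? `Warning: context truncated, omitted ${dropped.map((d) => d.label).join(", ")} (${used}/${budget} prompt tokens used).`
    : null;

  return { prompt: kept.map((c) => c.text).join("\n\n"), used, warning };
}
```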

Keyrxng commented 1 year ago

you requested 20407 tokens (4023 in the messages, 16384 in the completion)

The token limit is equivalent to the GPT output: whatever we set as the token limit is the maximum GPT will respond with, but the context window also has to include the input, and since we determine the input we can't really fix that number in advance.

The Python package tiktoken is the best tokenization package, and there is a TS wrapper for it. Otherwise it'll be a case of using LangChain, creating our own text splitters, and basing our input token count on those, which would be a rough but close estimate.
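
For example, assuming the js-tiktoken wrapper (just a sketch, not wired into the bot):

```ts
import { encodingForModel } from "js-tiktoken";

const CONTEXT_WINDOW = 16385; // gpt-3.5-turbo-16k

export function countTokens(text: string): number {
  return encodingForModel("gpt-3.5-turbo").encode(text).length;
}

// Budget the completion as whatever is left after the prompt,
// instead of always requesting the full 16384 and overflowing the window.
export function maxCompletionTokens(prompt: string): number {
  return Math.max(0, CONTEXT_WINDOW - countTokens(prompt));
}
```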

Keyrxng commented 1 year ago

This issue is a non-starter really, my friend, as it was user error this time around, but I'll still take the bounty lmao ;))

0x4007 commented 1 year ago

I'll wait until we get some real-world use cases functional before we optimize.

Keyrxng commented 1 year ago

A crude workaround: if the response from GPT is an error message stating the token count and how much we are over by, we could make an educated guess as to how many characters to strip from the context in order to meet the token limit we set.
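
Something like this (rough sketch, assuming the error text keeps the wording shown above):

```ts
// Pull the numbers out of the OpenAI error message and estimate how much to cut.
const TOKEN_ERROR = /maximum context length is (\d+) tokens.*?requested (\d+) tokens/;

function tokensOverLimit(errorMessage: string): number | null {
  const match = errorMessage.match(TOKEN_ERROR);
  if (!match) return null;
  const limit = Number(match[1]);
  const requested = Number(match[2]);
  return requested - limit; // how many tokens we need to shave off
}

// Very rough heuristic: ~4 characters per token for English, over-trim ~10% to be safe.
function charsToStrip(tokensOver: number): number {
  return Math.ceil(tokensOver * 4 * 1.1);
}
```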

Another option could be to use LangChain to interact with OpenAI, which allows -1 to be passed in for max_tokens:

```ts
// maxTokens: -1 lets LangChain size the completion to the remaining context window.
import { OpenAI } from "langchain/llms/openai";

this.llm = new OpenAI({
  openAIApiKey: this.apiKey,
  modelName: "gpt-3.5-turbo-16k",
  maxTokens: -1,
});
```