Closed: viswavi closed this 1 year ago
@viswavi @neubig would it have been helpful here to have the token counter expose a buffer param, so you could've just added a manual buffer there?
Ideally, tiktoken would map 1:1 to the API.
Hmm, it sounds interesting but I'm not exactly sure what you mean by this?
Nvm - looks like I had an incorrect understanding of the problem. I thought you were using our token counting helper function, but it looks like the count was being read from the response object.
Description
Issue #353 found that the LiteLLM agent now reports that our model's requests exceed the maximum allowable number of tokens.
This problem stems from a disparity between tiktoken's tokenizer counts and: 1) the number of tokens that OpenAI's API perceives, and 2) the number of tokens that OpenAI's tokenizer playground perceives.

For the full prompt parser prompt, tiktoken says there are 2569 tokens, so we set max_tokens for LiteLLM to 4097 - 2569 = 1528. However, OpenAI's API perceives there to be 2576 prompt tokens, so the full request (2576 + 1528 = 4104 tokens) exceeds the 4097 limit, while OpenAI's tokenizer playground thinks there are 2862 tokens.
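For context, here is a minimal sketch of how the local tiktoken estimate can be compared against the count the API itself reports (read from the response object, as mentioned in the comments above). The model name and the use of litellm.completion here are assumptions for illustration, not the exact code in this repo, and the call requires a valid OPENAI_API_KEY.

```python
import litellm
import tiktoken

prompt = "..."  # the full prompt parser prompt (elided here)

# Local estimate: count tokens in the raw prompt string with tiktoken.
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
local_count = len(encoding.encode(prompt))

# API-side count: make a (tiny) completion request and read the usage
# field from litellm's OpenAI-compatible response object.
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=1,
)
api_count = response.usage.prompt_tokens

# For the prompt parser prompt these disagree: roughly 2569 vs 2576.
print(f"tiktoken: {local_count}, API: {api_count}")
```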
A naive solution here is to add a buffer; e.g., generate 300 fewer tokens than the maximum limit (so we would set max_tokens to 1228 instead of 1528). This PR implements that solution.
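A minimal sketch of that buffer calculation follows; the function name and the constants are illustrative rather than the actual code added in this PR.

```python
import tiktoken

CONTEXT_WINDOW = 4097  # gpt-3.5-turbo limit on prompt + completion tokens
TOKEN_BUFFER = 300     # safety margin for tokenizer-count mismatches


def compute_max_tokens(prompt: str, model: str = "gpt-3.5-turbo") -> int:
    """Return a conservative max_tokens value for a completion request.

    Subtracts the locally estimated prompt length and a fixed buffer from
    the context window, so small disagreements between tiktoken's count
    and the API's own count no longer push the request over the limit.
    """
    encoding = tiktoken.encoding_for_model(model)
    prompt_tokens = len(encoding.encode(prompt))
    return CONTEXT_WINDOW - prompt_tokens - TOKEN_BUFFER


# For the prompt parser prompt: 4097 - 2569 - 300 = 1228
```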
References
N/A
Blocked by
N/A