Closed: viswavi closed this 1 year ago
@viswavi @neubig would it have been helpful here to have the token counter expose a buffer param, so you could've just added a manual buffer there?
Ideally, tiktoken would map 1:1 to the API.
Hmm, it sounds interesting but I'm not exactly sure what you mean by this?
Nvm - looks like I had an incorrect understanding of the problem. I thought you were using our token counting helper function, but it looks like the count was being read from the response object.
Description
Issue #353 found that the LiteLLM agent now reports that our model's requests exceed the maximum allowable number of tokens.
This problem stems from a disparity between tiktoken's tokenizer counts and: 1) the number of tokens that OpenAI's API perceives, and 2) the number of tokens that OpenAI's tokenizer playground perceives.

For the full prompt parser prompt, tiktoken says there are 2569 tokens, so we set max_tokens for LiteLLM to 4097 - 2569 = 1528. However, OpenAI's API perceives there to be 2576 prompt tokens, so the full request (2576 + 1528 = 4104 tokens) exceeds the 4097 limit, while OpenAI's tokenizer playground thinks there are 2862 tokens.
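For context, here is a minimal sketch of how the local tiktoken estimate can be compared against the count the API itself reports (read from the response object, as mentioned in the comments above). The model name and the use of litellm.completion here are assumptions for illustration, not the exact code in this repo, and the call requires a valid OPENAI_API_KEY.

```python
import litellm
import tiktoken

prompt = "..."  # the full prompt parser prompt (elided here)

# Local estimate: count tokens in the raw prompt string with tiktoken.
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
local_count = len(encoding.encode(prompt))

# API-side count: make a (tiny) completion request and read the usage
# field from litellm's OpenAI-compatible response object.
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=1,
)
api_count = response.usage.prompt_tokens

# For the prompt parser prompt these disagree: roughly 2569 vs 2576.
print(f"tiktoken: {local_count}, API: {api_count}")
```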
A naive solution here is to add a buffer; e.g., generate 300 fewer tokens than the maximum limit (so we would set max_tokens to 1228 instead of 1528). This PR implements that solution.
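A minimal sketch of that buffer calculation follows; the function name and the constants are illustrative rather than the actual code added in this PR.

```python
import tiktoken

CONTEXT_WINDOW = 4097  # gpt-3.5-turbo limit on prompt + completion tokens
TOKEN_BUFFER = 300     # safety margin for tokenizer-count mismatches


def compute_max_tokens(prompt: str, model: str = "gpt-3.5-turbo") -> int:
    """Return a conservative max_tokens value for a completion request.

    Subtracts the locally estimated prompt length and a fixed buffer from
    the context window, so small disagreements between tiktoken's count
    and the API's own count no longer push the request over the limit.
    """
    encoding = tiktoken.encoding_for_model(model)
    prompt_tokens = len(encoding.encode(prompt))
    return CONTEXT_WINDOW - prompt_tokens - TOKEN_BUFFER


# For the prompt parser prompt: 4097 - 2569 - 300 = 1228
```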
References
N/A
Blocked by
N/A