openai / openai-cookbook

Examples and guides for using the OpenAI API
https://cookbook.openai.com
MIT License

tiktoken example notebook returns incorrect token counts for chat APIs #488

Closed · RossBencina closed this 1 year ago

RossBencina commented 1 year ago

Identify the file to be fixed

The name of the file containing the problem: How_to_count_tokens_with_tiktoken.ipynb

Describe the problem

The example code supplied for computing token counts for chat messages appears to be off by one (one too low) for each message: the numbers returned by num_tokens_from_messages() did not match those returned by the API endpoint. The discrepancy is the same with both the gpt-3.5-turbo and gpt-4 endpoints, even though these are separate code paths.
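Roughly, the comparison that shows the mismatch looks like this (a sketch: it assumes the notebook's num_tokens_from_messages() is in scope, openai.api_key is set, and uses the openai-python ChatCompletion interface current at the time):

    import openai

    messages = [{"role": "user", "content": "Hello"}]

    predicted = num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301")
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0301",
        messages=messages,
        max_tokens=1,  # only the prompt token count matters here
    )
    actual = response["usage"]["prompt_tokens"]  # the API's authoritative count

    print(f"predicted={predicted}, actual={actual}")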

Describe a solution

By trial and error I made the following changes on the two lines with the WAS comments:

    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 5 # WAS 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif model == "gpt-4-0314":
        tokens_per_message = 4 # WAS 3
        tokens_per_name = 1

With the above changes I get the correct token counts for both chat endpoints.
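For context, the surrounding notebook function looks roughly like this (condensed; the per-model constants come from the elif chain quoted above, shown here with the notebook's original values for gpt-3.5-turbo-0301):

    import tiktoken

    def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301"):
        """Return the number of tokens used by a list of chat messages."""
        encoding = tiktoken.encoding_for_model(model)
        tokens_per_message = 4  # notebook's original value for gpt-3.5-turbo-0301
        tokens_per_name = -1    # if there's a name, the role is omitted
        num_tokens = 0
        for message in messages:
            num_tokens += tokens_per_message
            for key, value in message.items():
                num_tokens += len(encoding.encode(value))
                if key == "name":
                    num_tokens += tokens_per_name
        num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
        return num_tokens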

May I suggest that the tiktoken library itself handle the details of knowing the chat wrapper encoding?

Additional context

I tried to get tiktoken to encode the message wrappers, to compute the actual token overhead, using:

encoding.encode("<|im_start|>system\n<|im_end|>\n", allowed_special="all")
and
encoding.encode("<|start|>system\n<|end|>\n", allowed_special="all")

But tiktoken does not recognize those start/end markers as special tokens: they are not registered in the cl100k_base encoding, so they encode as ordinary text rather than as single special tokens.
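One workaround is to extend the encoding with the extra special tokens, following the "Extending tiktoken" example in tiktoken's README (the <|im_start|>/<|im_end|> token IDs below are the values used in that README example):

    import tiktoken

    cl100k_base = tiktoken.get_encoding("cl100k_base")

    # Build a new Encoding that inherits cl100k_base and registers the chat markers
    enc = tiktoken.Encoding(
        name="cl100k_im",
        pat_str=cl100k_base._pat_str,
        mergeable_ranks=cl100k_base._mergeable_ranks,
        special_tokens={
            **cl100k_base._special_tokens,
            "<|im_start|>": 100264,
            "<|im_end|>": 100265,
        },
    )

    enc.encode("<|im_start|>system\n<|im_end|>\n", allowed_special="all")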

ted-at-openai commented 1 year ago

I see a match when I run the notebook code:

[screenshot: notebook output showing the predicted and API token counts matching]

What do you think explains the difference between your results and mine?

quzard commented 1 year ago


I would like to ask: is the tiktoken calculation method for the 0613 models the same as for the 0301 models?

averykhoo commented 1 year ago

0613 seems to be different from 0301; created MR #511.
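Concretely, the notebook now distinguishes the two roughly like this (condensed from the updated notebook; check the notebook itself for the current model list):

    elif model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
    }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted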

RossBencina commented 1 year ago

@ted-at-openai wrote:

What do you think explains the difference between your results and mine?

I now think it was a bug in my code.

I have retested with de3bd58 and the code in the notebook appears to be working correctly for me when specifying any of the following models:

model = "gpt-3.5-turbo-0301"
model = "gpt-3.5-turbo-0613"
model = "gpt-4"

However, if I specify model = "gpt-3.5-turbo" there is a discrepancy, which can be explained as follows: the token count defaults to assuming 0613:

Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.

but the API is currently using 0301. I get a response from the API with "model": "gpt-3.5-turbo-0301" even though I passed in gpt-3.5-turbo.
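A quick way to confirm which snapshot served the request is to inspect the model field of the response (a sketch using the openai-python interface current at the time; assumes openai.api_key is set):

    import openai

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response["model"])                   # e.g. "gpt-3.5-turbo-0301" as of this writing
    print(response["usage"]["prompt_tokens"])  # the API's authoritative prompt token count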

Given that the console message flags the issue, I guess the notebook code is correct enough, and I will get 0613 as the default at some point.

It is unfortunate that there is not a rock-solid way to get the token count in advance. I am using it to adjust max_tokens so that I don't get errors from requesting more tokens than remain in the window. The fact that there is no code for computing token counts for the functions API is going to make things worse. Perhaps I'm looking at this the wrong way and there's some other way to saturate the context window?
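What I'm doing is roughly this (a sketch: it assumes the notebook's num_tokens_from_messages() is in scope and a 4096-token context window for gpt-3.5-turbo; the constant and helper name here are illustrative):

    CONTEXT_WINDOW = 4096  # gpt-3.5-turbo-0301/0613 context size

    def max_completion_tokens(messages, model="gpt-3.5-turbo-0613"):
        """Largest max_tokens value that still fits in the context window."""
        prompt_tokens = num_tokens_from_messages(messages, model=model)
        return max(0, CONTEXT_WINDOW - prompt_tokens)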

ted-at-openai commented 1 year ago

One piece of good news: max_tokens is optional for ChatCompletion requests. https://platform.openai.com/docs/api-reference/chat/create
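For example (a sketch with the openai-python interface current at the time), you can simply omit the parameter:

    import openai

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello"}],
        # no max_tokens: the completion can use the rest of the context window
    )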