Token counts are incorrect when request includes function messages

zurawiki / tiktoken-rs

Ready-made tokenizer library for working with GPT and tiktoken

MIT License

When using the relatively recent function-calling ("Functions") feature of the OpenAI Chat Completions API, tiktoken-rs appears to underestimate the number of prompt tokens in the request. Here's a minimal example request:

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a friendly chatbot.\n"
    },
    {
      "role": "assistant",
      "content": "Hello, I am a friendly chatbot!\n"
    },
    {
      "role": "user",
      "content": "What is the weather in New York?"
    },
    {
      "content": "",
      "function_call": {
        "arguments": "{\n  \"city\": \"New York\"\n}",
        "name": "get_weather"
      },
      "role": "assistant"
    },
    {
      "role": "function",
      "name": "get_weather",
      "content": "{\"temperature\": 72, \"conditions\": \"partly_cloudy\"}"
    }
  ],
  "model": "gpt-4-0613",
  "temperature": 0,
  "stream": false
}
```
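One plausible contributor, offered only as a hypothesis, is the assistant message's `function_call` object: a counter that tokenizes only `role`, `content`, and `name` never sees `function_call.name` or `function_call.arguments`. The stdlib-only Rust sketch below shows a per-message counter that does include them. The `Message`/`FunctionCall` types, `count_prompt_tokens`, and the `tokens` closure are hypothetical (not tiktoken-rs's API); the overhead constants (3 per message, 1 per `name`, 3 to prime the reply) are assumed from OpenAI's cookbook recipe for `gpt-4-0613`; and whitespace splitting stands in for a real cl100k_base encoder, so the printed number is not comparable with 66 or 78. OpenAI has not documented how `function_call` is rendered into the prompt, so counting its two fields as plain text is also an assumption.

```rust
// Hypothetical message model for this sketch; NOT tiktoken-rs's own types.
struct FunctionCall {
    name: String,
    arguments: String,
}

struct Message {
    role: String,
    content: String,
    name: Option<String>,
    function_call: Option<FunctionCall>,
}

// Assumed overhead constants, taken from OpenAI's cookbook recipe for gpt-4-0613.
const TOKENS_PER_MESSAGE: usize = 3;
const TOKENS_PER_NAME: usize = 1;
const REPLY_PRIMING: usize = 3;

/// Adds per-message overhead plus the tokens of every text field, including
/// `function_call.name` and `function_call.arguments`.
fn count_prompt_tokens<F: Fn(&str) -> usize>(messages: &[Message], tokens: F) -> usize {
    let mut total = 0;
    for m in messages {
        total += TOKENS_PER_MESSAGE + tokens(m.role.as_str()) + tokens(m.content.as_str());
        if let Some(name) = &m.name {
            total += tokens(name.as_str()) + TOKENS_PER_NAME;
        }
        // The branch a role/content/name-only counter never reaches.
        if let Some(fc) = &m.function_call {
            total += tokens(fc.name.as_str()) + tokens(fc.arguments.as_str());
        }
    }
    total + REPLY_PRIMING
}

/// Builds a plain message with no `name` and no `function_call`.
fn msg(role: &str, content: &str) -> Message {
    Message { role: role.into(), content: content.into(), name: None, function_call: None }
}

/// The five messages from the request above.
fn example_messages() -> Vec<Message> {
    vec![
        msg("system", "You are a friendly chatbot.\n"),
        msg("assistant", "Hello, I am a friendly chatbot!\n"),
        msg("user", "What is the weather in New York?"),
        Message {
            function_call: Some(FunctionCall {
                name: "get_weather".into(),
                arguments: "{\n  \"city\": \"New York\"\n}".into(),
            }),
            ..msg("assistant", "")
        },
        Message {
            name: Some("get_weather".into()),
            ..msg("function", "{\"temperature\": 72, \"conditions\": \"partly_cloudy\"}")
        },
    ]
}

fn main() {
    // Whitespace splitting stands in for a real BPE tokenizer so this runs with std only.
    let ws = |s: &str| s.split_whitespace().count();
    println!("{}", count_prompt_tokens(&example_messages(), ws));
}
```

Swapping a real BPE encoder into the `tokens` closure and comparing the result with and without the `function_call` branch would indicate how much of the gap that field accounts for; any remainder would have to come from formatting overhead the sketch does not model.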

I get this response from OpenAI:

```json
{
  // ...
  "usage": {
    "prompt_tokens": 78,
    "completion_tokens": 19,
    "total_tokens": 97
  }
}
```

...indicating that the prompt consumed 78 tokens. However, `tiktoken_rs::num_tokens_from_messages` returns 66 tokens for the same messages.
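As a quick arithmetic check, the reported usage is internally consistent (78 + 19 = 97), and the local count falls 12 tokens, about 15%, short of the prompt figure. A minimal stdlib-only sketch of that comparison follows; `Usage` is a hypothetical struct mirroring the `usage` object above, and `undercount` is a hypothetical helper, neither of which comes from tiktoken-rs or an OpenAI SDK.

```rust
/// Mirrors the `usage` object in the response above (hypothetical type for this sketch).
struct Usage {
    prompt_tokens: usize,
    completion_tokens: usize,
    total_tokens: usize,
}

/// Prompt tokens a local estimate missed; 0 if the estimate was at or above the reported value.
fn undercount(usage: &Usage, local_estimate: usize) -> usize {
    usage.prompt_tokens.saturating_sub(local_estimate)
}

fn main() {
    let usage = Usage { prompt_tokens: 78, completion_tokens: 19, total_tokens: 97 };
    // Sanity check: the reported total is prompt + completion (78 + 19 = 97).
    assert_eq!(usage.total_tokens, usage.prompt_tokens + usage.completion_tokens);

    // 66 is the value num_tokens_from_messages returned for the same messages.
    let gap = undercount(&usage, 66);
    let pct = 100.0 * gap as f64 / usage.prompt_tokens as f64;
    println!("undercount: {gap} tokens ({pct:.1}% of the prompt)");
}
```

Running it prints `undercount: 12 tokens (15.4% of the prompt)`.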

Token counts are incorrect when request includes function messages #40