langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Tokenize before OpenAI call issues #6642

Closed paulthemagno closed 1 year ago

paulthemagno commented 1 year ago

Issue you'd like to raise.

I would like to know how many tokens the tokenizer would generate for a prompt before making the OpenAI call, but I'm having trouble reproducing the count from the real call. I'm trying two methods found in the documentation, get_num_tokens and get_num_tokens_from_messages (which, if I'm not wrong, use the tiktoken library internally).

Then I check the number of prompt tokens with the get_openai_callback callback to see whether the calculation was correct:

from langchain.llms import OpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage
from langchain.callbacks import get_openai_callback

models_name = ["text-davinci-003", "gpt-3.5-turbo-0301", "gpt-3.5-turbo-0613"]
for model_name in models_name:
    print(f"----{model_name}----")
    llm = OpenAI(model_name = model_name)
    print(llm)
    text = "Hello world"
    tokens = llm.get_num_tokens(text)
    print(f"1) get_num_tokens: {tokens}")

    human_message = HumanMessage(content=text)
    system_message = SystemMessage(content=text)
    ai_message = AIMessage(content=text)
    tokens = llm.get_num_tokens_from_messages([human_message]), llm.get_num_tokens_from_messages([system_message]), llm.get_num_tokens_from_messages([ai_message])
    print(f"2) get_num_tokens_from_messages: {tokens}")

    with get_openai_callback() as cb: 
        llm_response = llm(text)
        print(f"3) callback: {cb}")

The output is:

----text-davinci-003----
OpenAI
Params: {'model_name': 'text-davinci-003', 'temperature': 0.7, 'max_tokens': 256, 'top_p': 1, 'frequency_penalty': 0, 'presence_penalty': 0, 'n': 1, 'request_timeout': None, 'logit_bias': {}}
1) get_num_tokens: 2
2) get_num_tokens_from_messages: (4, 4, 4)
3) callback: Tokens Used: 23
    Prompt Tokens: 2
    Completion Tokens: 21
Successful Requests: 1
Total Cost (USD): $0.00045999999999999996
----gpt-3.5-turbo-0301----
OpenAIChat
Params: {'model_name': 'gpt-3.5-turbo-0301'}
1) get_num_tokens: 2
2) get_num_tokens_from_messages: (4, 4, 4)
3) callback: Tokens Used: 50
    Prompt Tokens: 10
    Completion Tokens: 40
Successful Requests: 1
Total Cost (USD): $0.0001
----gpt-3.5-turbo-0613----
OpenAIChat
Params: {'model_name': 'gpt-3.5-turbo-0613'}
1) get_num_tokens: 2
2) get_num_tokens_from_messages: (4, 4, 4)
3) callback: Tokens Used: 18
    Prompt Tokens: 9
    Completion Tokens: 9
Successful Requests: 1
Total Cost (USD): $0.0

I understand that each model counts tokens differently: for example, text-davinci-003 reports the same number from get_num_tokens and from the callback. The other two models, gpt-3.5-turbo-0301 and gpt-3.5-turbo-0613, seem to have respectively 6 and 5 tokens more in the callback compared to get_num_tokens_from_messages.

So how can I reproduce exactly the token count of the real call? Which function is officially used for it?
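
For reference, the count for the plain completions model can apparently be reproduced by encoding the prompt directly with tiktoken (a minimal sketch of that idea; whether this is exactly what LangChain does internally is an assumption on my part):

import tiktoken

# text-davinci-003 maps to a tiktoken encoding; "Hello world" encodes to
# 2 tokens, which matches both get_num_tokens and the callback above.
encoding = tiktoken.encoding_for_model("text-davinci-003")
print(len(encoding.encode("Hello world")))  # -> 2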

Suggestion:

No response

vowelparrot commented 1 year ago

One of the reasons this is wrong is that you're using the completions LLM (`from langchain.llms import OpenAI`) with a chat model. It should be `from langchain.chat_models import ChatOpenAI`.

That's why there's a user warning:

You are trying to use a chat model. This way of initializing it is no longer supported. Instead, please use: `from langchain.chat_models import ChatOpenAI`

It's still surprising that the different versions of turbo would have different prompt token lengths, given they're supposed to use the same encoder: https://github.com/openai/tiktoken/blob/main/tiktoken/model.py
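
For what it's worth, the 6- and 5-token gaps line up with the per-message overhead described in OpenAI's token-counting cookbook: each chat message is wrapped with a few extra tokens and every reply is primed with a few more, and the wrapper size differs between the -0301 and -0613 snapshots. A simplified sketch of that counting scheme (adapted from the cookbook example, ignoring the optional name field; this is not LangChain's code):

import tiktoken

def count_chat_prompt_tokens(messages, model):
    encoding = tiktoken.encoding_for_model(model)
    # Per-message overhead as documented in the cookbook example.
    tokens_per_message = 4 if model == "gpt-3.5-turbo-0301" else 3
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for value in message.values():
            num_tokens += len(encoding.encode(value))
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

# "Hello world" as a single user message:
#   gpt-3.5-turbo-0301: 4 + 1 ("user") + 2 ("Hello world") + 3 = 10
#   gpt-3.5-turbo-0613: 3 + 1 + 2 + 3 = 9
# which lines up with the prompt token counts the callback reported above.
print(count_chat_prompt_tokens([{"role": "user", "content": "Hello world"}],
                               "gpt-3.5-turbo-0301"))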

paulthemagno commented 1 year ago

Yes, you're right. I tried with `from langchain.chat_models import ChatOpenAI`, since I had read the deprecation warning. Indeed, get_num_tokens_from_messages now returns the correct number of tokens. Anyway, as you said, gpt-3.5-turbo-0613 seems to have 1 token less than gpt-3.5-turbo-0301, yet in this case the method returns the same count for the two models.

Moreover, it seems that the __call__ function works differently in ChatOpenAI, so I made the call with:

from langchain.chat_models import ChatOpenAI

# model_name and text as in the script above
llm = ChatOpenAI(model_name=model_name)
response = llm.generate([[HumanMessage(content=text)]]).generations[0][0].text

Looking at the docs and trying the script, this was the simplest solution for me.
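
For completeness, comparing the estimate against the real call with ChatOpenAI looks roughly like this (a minimal sketch along the lines of the script above; cb.prompt_tokens is the prompt count the callback records):

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
from langchain.callbacks import get_openai_callback

llm = ChatOpenAI(model_name="gpt-3.5-turbo-0613")
messages = [HumanMessage(content="Hello world")]

# Estimate before the call, then compare with what the API actually counted.
estimate = llm.get_num_tokens_from_messages(messages)
with get_openai_callback() as cb:
    llm.generate([messages])
print(estimate, cb.prompt_tokens)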