pkoukk / tiktoken-go

Go version of tiktoken
MIT License

Is there support for function call tokens? #39

Open KilianKae opened 10 months ago

KilianKae commented 10 months ago

Thanks for the project. Great so far.

Is there support for counting tokens when using function calls? https://platform.openai.com/docs/guides/gpt/function-calling

pkoukk commented 9 months ago

I'm sorry, but OpenAI does not explain how function call tokens are counted, so we currently don't know how to calculate them either.

phillebaba commented 9 months ago

The Java implementation of Tiktoken seems to have a util for calculating tokens for functions. https://github.com/forestwanglin/openai-java/blob/93596f43e684b30f8712f9ba62edcef92d3f8a9b/jtokkit/src/main/java/xyz/felh/openai/jtokkit/utils/TikTokenUtils.java#L213-L254

However, I have no idea where they got this solution from.

pkoukk commented 9 months ago

I tried a few simple examples using the calculation method in the Java code linked above, but the computed values did not match the prompt_tokens usage returned by the API.

The following examples can serve as evidence:

In this example, the API reports prompt_tokens usage of 43:

    {
      "model": "gpt-3.5-turbo-16k-0613",
      "messages": [
        { "role": "assistant", "content": "Hello!" }
      ],
      "functions": [
        {
          "name": "test",
          "description": "test",
          "parameters": {
            "type": "object",
            "properties": {
              "test_string": { "type": "string" }
            }
          }
        }
      ]
    }

In this example, the API reports prompt_tokens usage of 35:

    {
      "model": "gpt-3.5-turbo-16k-0613",
      "messages": [
        { "role": "assistant", "content": "Hello!" }
      ],
      "functions": [
        {
          "name": "test",
          "description": "test",
          "parameters": {
            "type": "object",
            "properties": {
              "test_string": { "type": "object" }
            }
          }
        }
      ]
    }

The only difference between the two examples is the type of test_string, which changes from "string" to "object". Even so, prompt token usage drops by 8 (from 43 to 35).

The literals "string" and "object" should each encode to a single token, so swapping one for the other should not cause a discrepancy.

Token usage therefore seems to depend on the validity of the schema, not just on its literal text. Without OpenAI disclosing the rules, they are very difficult to infer through experimentation.

pkoukk commented 9 months ago

That said, at least some basic rules are certain.

Based on those, we can provide a method that approximates token usage before invoking the API.