Hello,
While looking around, I found implementations in Java and Node that calculate the tokens used by function / tool calls.
I ported the code to PHP; it may be useful to add it to the project.
Below is a raw PHP implementation of TikTokenUtils:
<?php
namespace App\Core\Text;
use Yethee\Tiktoken\EncoderProvider;
/**
* Text Utils
*
* @author Denis
*/
class TikTokenUtils {
    /**
     * Count the tokens in a text.
     *
     * @param string $text  Text to tokenize
     * @param string $model Model name used to select the encoding
     * @return int
     */
    public static function tokens(string $text, string $model = 'gpt-3.5-turbo'): int {
        $provider = new EncoderProvider();
        $encoder = $provider->getForModel($model);
        $tokens = $encoder->encode($text);
        return count($tokens);
    }
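
    /**
     * Count the tokens used by function / tool definitions.
     *
     * @param array  $tools Tool definitions in the Chat Completions "tools" format
     * @param string $model Model name used to select the encoding
     * @return int
     */
    public static function functionsTokens(array $tools, string $model = 'gpt-3.5-turbo'): int {
        $tokens = 0;
        if (!empty($tools)) {
            // Pass the model through so the matching encoding is used
            $tokens = self::tokens(self::formatFunctionDefinitions($tools), $model);
            $tokens += 9; // Additional tokens for function definition
        }
        return $tokens;
    }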

    /**
     * Format function definitions as TypeScript.
     * OpenAI appears to turn the function definitions into TypeScript type definitions.
     *
     * Migrated from https://github.com/forestwanglin/openai-java/blob/main/jtokkit/src/main/java/xyz/felh/openai/jtokkit/utils/FunctionFormat.java
     *
     * @param array $tools Tool definitions in the Chat Completions "tools" format
     * @return string The tool definitions rendered as TypeScript
     */
    public static function formatFunctionDefinitions(array $tools): string {
        $lines = array();
        $lines[] = "namespace functions {";
        $lines[] = "";
        foreach ($tools as $tool) {
            if (!empty($tool['function']['description'])) {
                $lines[] = sprintf("// %s", $tool['function']['description']);
            }
            if (!empty($tool['function']['parameters']['properties'])) {
                $lines[] = sprintf("type %s = (_: {", $tool['function']['name']);
                // Pass the whole schema so that its "required" list is visible to the formatter
                $lines[] = self::formatObjectProperties($tool['function']['parameters'], 0);
                $lines[] = "}) => any;";
            } else {
                $lines[] = sprintf("type %s = () => any;", $tool['function']['name']);
            }
            $lines[] = "";
        }
        $lines[] = "} // namespace functions";
        return implode("\n", $lines);
    }

    /**
     * Convert the properties of an object schema to TypeScript.
     *
     * @param array $schema Object schema holding "properties" and, optionally, "required"
     * @param int   $indent Number of spaces used to indent each line
     * @return string
     */
    public static function formatObjectProperties(array $schema, int $indent): string {
        if (empty($schema['properties'])) {
            return "";
        }
        $requiredParams = array();
        if (!empty($schema['required'])) {
            $requiredParams = $schema['required'];
        }
        $lines = array();
        foreach ($schema['properties'] as $name => $property) {
            // Descriptions are only emitted for top-level properties
            if (!empty($property["description"]) && $indent < 2) {
                $lines[] = sprintf("// %s", $property["description"]);
            }
            if (in_array($name, $requiredParams)) {
                $lines[] = sprintf("%s: %s,", $name, self::formatType($property, $indent));
            } else {
                // Properties not listed in "required" are marked optional
                $lines[] = sprintf("%s?: %s,", $name, self::formatType($property, $indent));
            }
        }
        return implode("\n", array_map(function ($it) use ($indent) {
            return str_repeat(" ", max(0, $indent)) . $it;
        }, $lines));
    }

    /**
     * Format a single property type as TypeScript.
     *
     * @param array $property Property schema
     * @param int   $indent   Current indentation, forwarded to nested objects
     * @return string
     */
    public static function formatType(array $property, int $indent): string {
        $type = $property["type"] ?? "";
        switch ($type) {
            case "string":
                if (!empty($property["enum"])) {
                    // String enum values become a union of quoted literals
                    return implode(" | ", array_map(function ($it) {
                        return sprintf("\"%s\"", $it);
                    }, $property["enum"]));
                }
                return "string";
            case "array":
                if (!empty($property["items"])) {
                    return sprintf("%s[]", self::formatType($property["items"], $indent));
                }
                return "any[]";
            case "object":
                return sprintf("{\n%s\n}", self::formatObjectProperties($property, $indent + 2));
            case "integer":
            case "number":
                if (!empty($property["enum"])) {
                    // Numeric enum values are emitted unquoted, as in the referenced Node implementation
                    return implode(" | ", $property["enum"]);
                }
                return "number";
            case "boolean":
                return "boolean";
            case "null":
                return "null";
            default:
                return "";
        }
    }
}
Example of how to use:
$tools = [
    [
        'type' => 'function',
        'function' => [
            'name' => 'get_flight_status',
            'description' => 'Get the status of a flight by its flight number. The answer must always provide the coming_from, airline, flight_status, estimated_arrival_time and delayed_arrival_time when not empty.',
            'parameters' => [
                'type' => 'object',
                'properties' => [
                    'flight_number' => [
                        'type' => 'string',
                        'description' => 'The Flight Number, MUST respect this pattern: 2 letters, and 5 numbers and may contain spaces; eg: BA00576',
                    ],
                    'day' => [
                        'type' => 'string',
                        'description' => 'The day of the flight',
                    ],
                ],
                'required' => ['flight_number'],
            ],
        ],
    ],
    [
        'type' => 'function',
        'function' => [
            'name' => 'get_time',
            'description' => 'Get the current time.',
        ],
    ],
];
// Get the number of tokens used by the tools
echo TikTokenUtils::functionsTokens($tools);
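To see what text is actually being tokenized, it can help to print the intermediate TypeScript-style definition for a single tool. The following is only a minimal standalone sketch: `renderToolAsTypeScript` is a hypothetical helper (not part of the class or of any library), it handles only flat parameters, and falling back to `any` for unknown types is an assumption made for this illustration. The descriptions are shortened versions of the ones in the example.

```php
<?php
// Hypothetical helper, for illustration only: renders ONE tool definition
// with flat (non-nested) parameters in the TypeScript-like format.
function renderToolAsTypeScript(array $tool): string
{
    $fn = $tool['function'];
    $params = $fn['parameters'] ?? [];
    $required = $params['required'] ?? [];
    // Assumption: unknown or missing types fall back to "any" in this sketch
    $typeMap = [
        'string' => 'string',
        'integer' => 'number',
        'number' => 'number',
        'boolean' => 'boolean',
    ];

    $lines = [];
    if (!empty($fn['description'])) {
        $lines[] = '// ' . $fn['description'];
    }
    if (empty($params['properties'])) {
        // A tool without parameters becomes a function type with no arguments
        $lines[] = sprintf('type %s = () => any;', $fn['name']);
        return implode("\n", $lines);
    }

    $lines[] = sprintf('type %s = (_: {', $fn['name']);
    foreach ($params['properties'] as $name => $property) {
        if (!empty($property['description'])) {
            $lines[] = '// ' . $property['description'];
        }
        // Properties missing from "required" get the optional marker "?"
        $optional = in_array($name, $required) ? '' : '?';
        $type = $typeMap[$property['type'] ?? ''] ?? 'any';
        $lines[] = sprintf('%s%s: %s,', $name, $optional, $type);
    }
    $lines[] = '}) => any;';
    return implode("\n", $lines);
}

$tool = [
    'type' => 'function',
    'function' => [
        'name' => 'get_flight_status',
        'description' => 'Get the status of a flight.',
        'parameters' => [
            'type' => 'object',
            'properties' => [
                'flight_number' => ['type' => 'string', 'description' => 'The flight number'],
                'day' => ['type' => 'string', 'description' => 'The day of the flight'],
            ],
            'required' => ['flight_number'],
        ],
    ],
];

echo renderToolAsTypeScript($tool), "\n";
```

Here `flight_number` is rendered without a `?` because it is listed under `required`, while `day` is rendered as `day?: string,`.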
I started from this thread:
https://community.openai.com/t/how-to-calculate-the-tokens-when-using-function-call/266573/10
and I took the code from these repos and ported it to plain PHP:
https://github.com/forestwanglin/openai-java/blob/72d7bfc8ffb1bfb810b99518d0f99110e3204227/jtokkit/src/main/java/xyz/felh/openai/jtokkit/utils/TikTokenUtils.java#L392
https://github.com/hmarr/openai-chat-tokens/blob/main/src/functions.ts (see also this blog post: https://hmarr.com/blog/counting-openai-tokens/)