yethee / tiktoken-php

tiktoken-php

This is a PHP port of tiktoken, OpenAI's BPE tokenizer.

Installation

$ composer require yethee/tiktoken

Usage


use Yethee\Tiktoken\EncoderProvider;

$provider = new EncoderProvider();

$encoder = $provider->getForModel('gpt-3.5-turbo-0301');
$tokens = $encoder->encode('Hello world!');
print_r($tokens);
// OUT: [9906, 1917, 0]

$encoder = $provider->get('p50k_base');
$tokens = $encoder->encode('Hello world!');
print_r($tokens);
// OUT: [15496, 995, 0]
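
A common reason to encode text is to count tokens before sending a prompt to a model. The following is a minimal sketch of that, using only the `getForModel()` and `encode()` calls shown above. The autoloader path and the `$maxTokens` budget are assumptions made for illustration and are not part of the library.

```php
<?php

// Composer autoloader; the path is an assumption about your project layout.
require __DIR__ . '/vendor/autoload.php';

use Yethee\Tiktoken\EncoderProvider;

$provider = new EncoderProvider();
$encoder = $provider->getForModel('gpt-3.5-turbo-0301');

$text = 'Hello world!';

// encode() returns an array of token ids, so its length is the token count.
$tokenCount = count($encoder->encode($text));

// Hypothetical budget, chosen only for illustration.
$maxTokens = 4096;

printf(
    "%d tokens, fits budget: %s\n",
    $tokenCount,
    $tokenCount <= $maxTokens ? 'yes' : 'no'
);
```

For the sample string, the count should match the three token ids listed in the output above.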

Cache

The encoder relies on external vocabulary files, so they are cached by default to avoid performance issues.

By default, the system directory for temporary files is used. You can override the cache directory via the TIKTOKEN_CACHE_DIR environment variable or with EncoderProvider::setVocabCache():

use Yethee\Tiktoken\EncoderProvider;

$encProvider = new EncoderProvider();
$encProvider->setVocabCache('/path/to/cache');

// Using the provider
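
The environment variable route can be sketched in a similar way. This example sets `TIKTOKEN_CACHE_DIR` from within PHP using `putenv()`; in practice you would more likely set it in the shell or the server configuration. It assumes that the provider reads the variable when it is constructed, which the text above does not confirm, so the variable is set before the provider is created. The cache location and the autoloader path are hypothetical.

```php
<?php

// Composer autoloader; the path is an assumption about your project layout.
require __DIR__ . '/vendor/autoload.php';

use Yethee\Tiktoken\EncoderProvider;

// Hypothetical cache location under the system temp directory.
$cacheDir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'tiktoken-cache';

// Create the directory up front so it exists regardless of how the
// library handles a missing path.
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0775, true);
}

// Set the variable for the current PHP process only.
putenv('TIKTOKEN_CACHE_DIR=' . $cacheDir);

// Assumption: the provider picks up TIKTOKEN_CACHE_DIR at construction time,
// so the variable has to be set before this line.
$provider = new EncoderProvider();
$encoder = $provider->get('p50k_base');
```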

Disable cache

If you need to disable the cache, pass null to EncoderProvider::setVocabCache():

use Yethee\Tiktoken\EncoderProvider;

$encProvider = new EncoderProvider();
$encProvider->setVocabCache(null); // disable the cache

Limitations

License

MIT