openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License

Add new types to TiktokenModel #148

Closed: phil-willowtree closed this issue 1 year ago

phil-willowtree commented 1 year ago

As of June 13, 2023, new models are available. They need to be added to the TiktokenModel type:

gpt-4-0613
gpt-4-32k-0613
gpt-3.5-turbo-0613
gpt-3.5-turbo-16k

Hoozia commented 1 year ago

When will it be added?

jonathanlal commented 1 year ago

Is the token count that different between models?

hauntsaninja commented 1 year ago

The public API here is encoding_for_model. With tiktoken 0.4 (released more than a month ago):

λ python
Python 3.11.4 (main, Jun  9 2023, 20:01:46) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from tiktoken import encoding_for_model
>>> encoding_for_model("gpt-4-0613")
<Encoding 'cl100k_base'>
>>> encoding_for_model("gpt-4-32k-0613")
<Encoding 'cl100k_base'>
>>> encoding_for_model("gpt-3.5-turbo-0613")
<Encoding 'cl100k_base'>
>>> encoding_for_model("gpt-3.5-turbo-16k")
<Encoding 'cl100k_base'>
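
The same check can be scripted instead of typed into the REPL. The following is a minimal sketch, assuming tiktoken 0.4 or later is installed; group_models_by_encoding and its resolve parameter are hypothetical names chosen for illustration and are not part of tiktoken. It groups model names by the encoding that encoding_for_model returns, which also speaks to the token-count question: models that share an encoding tokenise the same text identically.

```python
from collections import defaultdict


def group_models_by_encoding(models, resolve=None):
    """Group model names by the name of the encoding they resolve to.

    `resolve` is a hypothetical hook (not part of tiktoken) that maps a
    model name to an object with a `.name` attribute. When omitted, it
    defaults to tiktoken.encoding_for_model, imported lazily so the
    helper can be exercised without tiktoken installed.
    """
    if resolve is None:
        # Assumption: tiktoken >= 0.4 is available in the environment.
        from tiktoken import encoding_for_model as resolve

    groups = defaultdict(list)
    for model in models:
        # Encoding objects expose their name (e.g. "cl100k_base") as `.name`.
        groups[resolve(model).name].append(model)
    return dict(groups)
```

Calling group_models_by_encoding(["gpt-4-0613", "gpt-4-32k-0613", "gpt-3.5-turbo-0613", "gpt-3.5-turbo-16k"]) should, going by the session above, put all four models under cl100k_base. Note that loading an encoding for the first time may require tiktoken to fetch its vocabulary file.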