openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License

No response when more than 32k tokens requested #221

fanpeeps closed this issue 10 months ago

fanpeeps commented 10 months ago

When len(string) > 32k, the following produces no response (num_tokens is None):

    import tiktoken

    encoding = tiktoken.get_encoding("cl100k_base")
    num_tokens = len(encoding.encode(string))  # string: input text with len(string) > 32k

Does the encoder have a limit on the number of tokens?

hauntsaninja commented 10 months ago

This works just fine for me:

    import tiktoken
    s = "a b c" * 32000
    encoding = tiktoken.get_encoding("cl100k_base")
    print("num_tokens", len(encoding.encode(s)))

Feel free to re-open if you have a clear repro.
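
For anyone hitting something similar, a clear repro could look roughly like the sketch below. It is only a minimal sketch under stated assumptions: `count_tokens` and `tiktoken_count` are hypothetical helper names, not part of tiktoken, and the only tiktoken calls used are `get_encoding` and `Encoding.encode`, as in the snippets above. One thing worth noting is that Python's built-in `len()` either returns an int or raises, so a `num_tokens` of None would most likely have to come from the surrounding code (for example, the input itself being None) rather than from the encoder.

```python
from typing import Callable, Sequence


def count_tokens(text: str, encode: Callable[[str], Sequence]) -> int:
    """Count tokens in `text` using the given `encode` callable (hypothetical helper)."""
    if not isinstance(text, str):
        # A None or non-str input fails loudly here instead of silently downstream.
        raise TypeError(f"expected str, got {type(text).__name__}")
    tokens = encode(text)
    # len() returns an int or raises; it never returns None.
    return len(tokens)


def tiktoken_count(text: str, encoding_name: str = "cl100k_base") -> int:
    """Same check using tiktoken (hypothetical wrapper; assumes tiktoken is installed)."""
    import tiktoken  # imported lazily so count_tokens works without it

    encoding = tiktoken.get_encoding(encoding_name)
    return count_tokens(text, encoding.encode)
```

Printing `len(text)` next to `tiktoken_count(text)` for the failing input, together with the tiktoken version, should be enough for a maintainer to reproduce the behaviour.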