openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License

Improve `tiktoken.encoding_for_model` with better option #185

Closed: xhluca closed this 6 months ago

xhluca commented 11 months ago

tiktoken.encoding_for_model is a convenient function for automatically selecting the encoding that matches a given model name. However, the following calls will result in an error:

tiktoken.encoding_for_model("gpt-2")
tiktoken.encoding_for_model("gpt-3.5")

This is because the closest names currently accepted for these models are:

tiktoken.encoding_for_model("gpt2")  # Notice the lack of dash
tiktoken.encoding_for_model("gpt-3.5-turbo")  # gpt-3.5 is a common shorthand here

This PR will allow the use of "gpt-2" and "gpt-3.5" as model names.
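
For illustration, here is a minimal sketch of one way such aliases could be resolved before the encoding lookup. It is not the code from this PR or tiktoken's internals: `MODEL_ALIASES`, `MODEL_TO_ENCODING_NAME`, and `resolve_encoding_name` are hypothetical names, and the table covers only the two models mentioned above.

```python
# Hypothetical alias table (an assumption for this sketch, not tiktoken's internals):
# maps common shorthands to the model names that are already accepted.
MODEL_ALIASES = {
    "gpt-2": "gpt2",
    "gpt-3.5": "gpt-3.5-turbo",
}

# Model name -> encoding name, limited to the two models discussed here.
MODEL_TO_ENCODING_NAME = {
    "gpt2": "gpt2",
    "gpt-3.5-turbo": "cl100k_base",
}


def resolve_encoding_name(model_name: str) -> str:
    """Return the encoding name for a model, accepting the shorthands above."""
    # Fall back to the name itself when it is not a known alias.
    canonical = MODEL_ALIASES.get(model_name, model_name)
    try:
        return MODEL_TO_ENCODING_NAME[canonical]
    except KeyError:
        raise KeyError(f"No encoding known for model {model_name!r}") from None
```

With tiktoken installed, the returned name could then be passed to `tiktoken.get_encoding`, for example `tiktoken.get_encoding(resolve_encoding_name("gpt-3.5"))`.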

hauntsaninja commented 6 months ago

Thank you!