zurawiki / tiktoken-rs

Ready-made tokenizer library for working with GPT and tiktoken
MIT License
240 stars, 47 forks

Add tokenizer prefixes for fine-tuned models #53

Closed. jbgriesner closed this 8 months ago

jbgriesner commented 8 months ago

The Python tiktoken API supports prefixes for fine-tuned models (e.g. here), whereas tiktoken-rs doesn't. As a result, every method that needs a tokenizer instance for a fine-tuned model name fails. This PR adds support for the existing fine-tuned model name prefixes. A simple unit test is included as well.
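To illustrate the idea, here is a minimal sketch of prefix-based lookup. It is not the code from this PR: the table name `FINE_TUNED_PREFIXES`, the function `encoding_for_model`, and the example model names are hypothetical. The prefix-to-encoding pairs are assumed to mirror the fine-tuned entries in Python tiktoken's prefix mapping.

```rust
// Hypothetical prefix table for illustration only; the entries are assumed
// to mirror the fine-tuned prefixes known to Python tiktoken.
const FINE_TUNED_PREFIXES: &[(&str, &str)] = &[
    ("ft:gpt-4", "cl100k_base"),
    ("ft:gpt-3.5-turbo", "cl100k_base"),
    ("ft:davinci-002", "cl100k_base"),
    ("ft:babbage-002", "cl100k_base"),
];

/// Returns the encoding name for a fine-tuned model name by matching it
/// against the known prefixes, or `None` if no prefix matches.
fn encoding_for_model(model: &str) -> Option<&'static str> {
    FINE_TUNED_PREFIXES
        .iter()
        .find(|(prefix, _)| model.starts_with(*prefix))
        .map(|(_, encoding)| *encoding)
}
```

With a table like this, a made-up name such as `ft:gpt-3.5-turbo:my-org::abc123` resolves to `cl100k_base` even though the full name is not listed anywhere, while a name with no known prefix yields `None`.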

zurawiki commented 8 months ago

PR looks great! Thanks @jbgriesner