Closed bzwheeler closed 1 year ago
For gpt-4 (cl100k_base), the string "1 2 3 4 5" produces 5 tokens in OpenAI's online tokenizer (https://platform.openai.com/tokenizer): [16, 362, 513, 604, 642]. tiktoken, however, produces 9 tokens for the same string: [16, 220, 17, 220, 18, 220, 19, 220, 20].
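https://platform.openai.com/tokenizer doesn't offer the cl100k_base tokeniser, so the IDs it shows come from a different encoding and won't match tiktoken's cl100k_base output.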
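One way to check this locally is to encode the same string with more than one tiktoken encoding and compare the results. The sketch below is only illustrative: the helper names `token_report` and `compare_tiktoken_encodings` are made up for this example, and the choice of `r50k_base` as the older encoding to compare against is an assumption about what the online page was using, not something confirmed in this thread. It needs `pip install tiktoken`, and tiktoken may download the encoding files on first use.

```python
def token_report(text, encoders):
    """Return {name: token_ids} for each named encode function.

    `encoders` maps a label to any callable that turns a string into a
    list of token IDs, so this helper does not depend on tiktoken itself.
    """
    return {name: list(encode(text)) for name, encode in encoders.items()}


def compare_tiktoken_encodings(text, names=("r50k_base", "cl100k_base")):
    """Print the token count and IDs that each tiktoken encoding gives for `text`.

    The default `names` are an assumption: r50k_base stands in for the older
    GPT-3 era encoding, cl100k_base is the encoding named in the issue.
    """
    import tiktoken  # third-party; imported here so token_report works without it

    encoders = {name: tiktoken.get_encoding(name).encode for name in names}
    report = token_report(text, encoders)
    for name, ids in report.items():
        print(f"{name}: {len(ids)} tokens {ids}")
    return report
```

Calling `compare_tiktoken_encodings("1 2 3 4 5")` prints one line per encoding. If the explanation above is right, the `cl100k_base` line should match the 9-token list from the issue, and a shorter list from an older encoding would account for what the web page displayed.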